Image reconstruction method, system, device and computer readable storage medium

Document No.: 1159343    Publication date: 2020-09-15

Reading note: This technology, "Image reconstruction method, system, device and computer readable storage medium", was designed and created by 盛骁杰 (Sheng Xiaojie) on 2019-03-07. Its main content includes: an image reconstruction method, system, device and computer-readable storage medium, the method comprising: acquiring an image combination of a multi-angle free visual angle, parameter data of the image combination and virtual viewpoint position information based on user interaction, wherein the image combination comprises a plurality of groups of angle-synchronized texture maps and depth maps with corresponding relations; selecting, according to the virtual viewpoint position information and the parameter data of the image combination and a preset rule, the texture maps and depth maps of corresponding groups in the image combination at the moment of user interaction; and performing combined rendering on the selected texture maps and depth maps of the corresponding groups in the image combination at the user interaction moment, based on the virtual viewpoint position information and the parameter data corresponding to those texture maps and depth maps, to obtain a reconstructed image corresponding to the virtual viewpoint position at the user interaction moment. By adopting this scheme, the amount of data computation in the image reconstruction process can be reduced.

1. An image reconstruction method, comprising:

acquiring an image combination of a multi-angle free visual angle, parameter data of the image combination and virtual viewpoint position information based on user interaction, wherein the image combination comprises a plurality of angle-synchronous texture maps and depth maps with corresponding relations;

selecting a texture map and a depth map of a corresponding group in the image combination at the moment of user interaction according to the virtual viewpoint position information and the parameter data of the image combination and a preset rule;

and performing combined rendering on the texture map and the depth map of the corresponding group in the selected user interaction time image combination based on the virtual viewpoint position information and the parameter data corresponding to the texture map and the depth map of the corresponding group in the user interaction time image combination to obtain a reconstructed image corresponding to the virtual viewpoint position at the user interaction time.

2. The image reconstruction method according to claim 1, wherein selecting a corresponding set of texture map data and depth map data in the image combination at the time of user interaction according to a preset rule based on the virtual viewpoint position information and the parameter data of the image combination comprises:

and selecting a texture map and a depth map of a corresponding group which satisfies a preset position relation and/or a quantity relation with the virtual viewpoint position in the image combination at the user interaction moment according to the virtual viewpoint position information and the parameter data of the image combination.

3. The image reconstruction method according to claim 2, wherein the selecting, according to the virtual viewpoint position information and the parameter data of the image combination, the texture map and the depth map of the corresponding group of the image combination whose virtual viewpoint position at the time of the user interaction satisfies a preset positional relationship and/or a number relationship includes:

and selecting a preset number of corresponding groups of texture maps and depth maps closest to the virtual viewpoint position in the image combination at the user interaction moment according to the virtual viewpoint position information and the parameter data of the image combination.

4. The image reconstruction method according to claim 3, wherein the selecting a preset number of texture maps and depth maps of a corresponding group closest to the virtual viewpoint position in the image combination at the time of the user interaction according to the virtual viewpoint position information and the parameter data of the image combination comprises:

and selecting texture maps and depth maps corresponding to 2 to N acquisition devices closest to the virtual viewpoint position according to the virtual viewpoint position information and parameter data corresponding to texture maps and depth maps of corresponding groups in the image combination at the user interaction time, wherein N is the number of all the acquisition devices for acquiring the image combination.

5. The image reconstruction method according to claim 1, wherein the combining and rendering the texture map data and the depth map data of the corresponding group in the selected user interaction time image combination based on the virtual viewpoint position information and the parameter data corresponding to the texture map and the depth map of the corresponding group in the user interaction time image combination to obtain the reconstructed image corresponding to the virtual viewpoint position at the user interaction time comprises:

respectively carrying out forward mapping on the depth maps of corresponding groups in the selected user interaction time image combination, and mapping to the virtual position of the user interaction time;

respectively carrying out post-processing on the depth maps after the forward mapping;

respectively carrying out reverse mapping on texture maps of corresponding groups in the selected user interaction moment image combination;

and fusing the virtual texture maps generated after the reverse mapping.

6. The image reconstruction method according to claim 5, further comprising, after fusing the virtual texture maps generated by the inverse mapping:

and filling the hole in the fused texture map to obtain a reconstructed image corresponding to the virtual viewpoint position at the user interaction moment.

7. The image reconstruction method according to claim 5, wherein the post-processing of the forward-mapped depth maps comprises at least one of:

respectively carrying out foreground edge protection processing on the depth map after forward mapping;

and respectively carrying out pixel-level filtering processing on the depth map subjected to the forward mapping.

8. The image reconstruction method according to claim 5, wherein the fusing the virtual texture maps generated by inverse mapping comprises:

and according to the virtual viewpoint position information and the parameter data corresponding to the texture map and the depth map of the corresponding group in the image combination at the user interaction time, fusing all the virtual texture maps generated after reverse mapping by adopting the global weight determined by the distance between the position of the virtual viewpoint and the position of the acquisition equipment for acquiring the corresponding texture map in the image combination.

9. The image reconstruction method according to claim 1, wherein the combining and rendering the texture map data and the depth map data of the corresponding group in the selected user interaction time image combination based on the virtual viewpoint position information and the parameter data corresponding to the texture map and the depth map of the corresponding group in the user interaction time image combination to obtain the reconstructed image corresponding to the virtual viewpoint position at the user interaction time comprises:

mapping the depth maps of corresponding groups in the user interaction moment image combination to virtual viewpoint positions of user interaction moments according to a space geometric relationship to form a virtual viewpoint position depth map, copying pixel points in texture maps of corresponding groups in the user interaction moment image combination to a virtual texture map corresponding to the generated virtual viewpoint positions according to the mapped depth map, and forming a virtual texture map corresponding to the corresponding groups in the user interaction moment image combination;

and fusing the corresponding virtual texture maps of the corresponding groups in the user interaction moment image combination to obtain a reconstructed image of the user interaction moment virtual viewpoint position.

10. The image reconstruction method according to claim 9, wherein the fusing the virtual texture maps corresponding to the respective groups in the image combination at the user interaction time to obtain the reconstructed image at the virtual viewpoint position at the user interaction time includes:

weighting pixels at corresponding positions in the virtual texture maps corresponding to each corresponding group in the user interaction moment image combination to obtain pixel values at corresponding positions in the reconstructed image at the virtual viewpoint position at the user interaction moment;

and for the position with the zero pixel value in the reconstructed image of the virtual viewpoint position at the user interaction moment, filling up the hole by using the pixels around the pixel in the reconstructed image to obtain the reconstructed image of the virtual viewpoint position at the user interaction moment.

11. The image reconstruction method according to claim 9, wherein the fusing the virtual texture maps corresponding to the respective groups in the image combination at the user interaction time to obtain the reconstructed image of the virtual viewpoint position at the user interaction time includes:

for the position where the pixel value in the virtual texture image corresponding to each corresponding group in the image combination at the user interaction moment is zero, respectively filling up the hole by using the surrounding pixel values;

and weighting the pixel values of the corresponding positions in the virtual texture maps corresponding to the corresponding groups after the holes are filled to obtain a reconstructed image of the virtual viewpoint position at the user interaction time.

12. The image reconstruction method according to claim 1, wherein said obtaining a combination of images from a plurality of angles, and parameter data of the combination of images, comprises:

and decoding the acquired image compression data of the multi-angle free visual angle to obtain an image combination of the multi-angle free visual angle, wherein the image combination corresponds to the parameter data.

13. An image reconstruction system, comprising:

the system comprises an acquisition unit, a processing unit and a processing unit, wherein the acquisition unit is suitable for acquiring an image combination of a multi-angle free visual angle, parameter data of the image combination and virtual viewpoint position information based on user interaction, wherein the image combination comprises a plurality of angle synchronous texture maps and depth maps with corresponding relations;

the selection unit is suitable for selecting a texture map and a depth map of a corresponding group in the image combination at the moment of user interaction according to the virtual viewpoint position information and the parameter data of the image combination and a preset rule;

and the image reconstruction unit is suitable for performing combined rendering on the texture map and the depth map of the corresponding group in the selected user interaction time image combination based on the virtual viewpoint position information and the parameter data corresponding to the texture map and the depth map of the corresponding group in the user interaction time image combination to obtain a reconstructed image corresponding to the virtual viewpoint position at the user interaction time.

14. An image reconstruction device comprising a memory and a processor, the memory having stored thereon computer instructions executable on the processor, wherein the processor, when executing the computer instructions, performs the steps of the method of any one of claims 1 to 12.

15. The image reconstruction apparatus of claim 14, wherein the image reconstruction apparatus comprises at least one of: terminal equipment, edge node.

16. A computer readable storage medium having computer instructions stored thereon, wherein the computer instructions when executed perform the steps of the method of any one of claims 1 to 12.

Technical Field

Embodiments of the present invention relate to the field of image processing technologies, and in particular, to an image reconstruction method, system, device, and computer-readable storage medium.

Background

With the continuous development of Internet technology, more and more video platforms improve users' visual experience by providing videos with higher definition or smoothness for viewing.

However, for videos that convey a strong sense of being on site, such as footage of a basketball game, a user can often only watch the game from a single viewpoint position during viewing, and cannot freely switch the viewpoint position to watch the game scenes or the course of the game from different viewpoint positions.

The 6 Degrees of Freedom (6DoF) technology provides a viewing experience with a high degree of freedom: a user can adjust the viewing angle through interactive means during viewing, so as to view the image from a desired free viewpoint, thereby greatly improving the viewing experience. With 6DoF images, at any exciting moment during viewing, the user can choose to pause the video and stay at that moment for a 6DoF free-view experience.

To realize 6DoF images, Free-D playback technology and light field rendering technology currently exist. The Free-D playback technology expresses the 6DoF image through a point cloud, which expresses and stores the three-dimensional positions and pixel information of all points in space. The light field rendering technology, without requiring depth information or image correspondences, establishes a light field database of a scene from a group of scene photos taken in advance; then, for any given new viewpoint, the view at that viewpoint is obtained through resampling and bilinear interpolation, thereby realizing roaming of the whole scene.

However, both Free-D playback and light field rendering require a very large amount of data computation. In addition, point cloud compression currently lacks mature standards and industrial software and hardware support, which is not conducive to popularization.

Disclosure of Invention

In view of this, embodiments of the present invention provide an image reconstruction method, system, device and computer readable storage medium to reduce the amount of data operations in the multi-degree-of-freedom image reconstruction process.

In one aspect, an embodiment of the present invention provides an image reconstruction method, where the method includes: acquiring an image combination of a multi-angle free visual angle, parameter data of the image combination and virtual viewpoint position information based on user interaction, wherein the image combination comprises a plurality of angle-synchronous texture maps and depth maps with corresponding relations; selecting a texture map and a depth map of a corresponding group in the image combination at the moment of user interaction according to a preset rule according to the virtual viewpoint position information and the parameter data of the image combination; and performing combined rendering on the texture map and the depth map of the corresponding group in the selected user interaction moment image combination based on the virtual viewpoint position information and the parameter data corresponding to the texture map and the depth map of the corresponding group in the user interaction moment image combination to obtain a reconstructed image corresponding to the virtual viewpoint position at the user interaction moment.

Optionally, the selecting, according to the virtual viewpoint position information and the parameter data of the image combination, texture map data and depth map data of a corresponding group in the image combination at the user interaction time according to a preset rule includes: and selecting a texture map and a depth map of a corresponding group which satisfies a preset position relation and/or a quantity relation with the virtual viewpoint position in the image combination at the user interaction moment according to the virtual viewpoint position information and the parameter data of the image combination.

Optionally, the selecting, according to the virtual viewpoint position information and the parameter data of the image combination, a texture map and a depth map of a corresponding group of the image combination whose virtual viewpoint position satisfies a preset position relationship and/or a quantity relationship with a user interaction time includes: and selecting a preset number of texture maps and depth maps of corresponding groups closest to the virtual viewpoint position in the image combination at the user interaction moment according to the virtual viewpoint position information and the parameter data of the image combination.

Optionally, the combining and rendering the texture map data and the depth map data of the corresponding group in the selected user interaction time image combination based on the virtual viewpoint position information and the parameter data corresponding to the texture map and the depth map of the corresponding group in the user interaction time image combination to obtain the reconstructed image corresponding to the user interaction time virtual viewpoint position includes: respectively mapping the depth maps of corresponding groups in the user interaction moment image combination to virtual viewpoint positions of user interaction moments according to a space geometric relationship to form a virtual viewpoint position depth map, copying pixel points in texture maps of corresponding groups in the user interaction moment image combination to virtual texture maps corresponding to the generated virtual viewpoint positions according to the mapped depth map, and forming virtual texture maps corresponding to the corresponding groups in the user interaction moment image combination; and fusing the virtual texture maps corresponding to the corresponding groups in the image combination at the user interaction time.

Optionally, the fusing the virtual texture maps corresponding to the corresponding groups in the user interaction time image combination to obtain a reconstructed image of the user interaction time virtual viewpoint position includes: weighting pixels at corresponding positions in the virtual texture maps corresponding to each corresponding group in the user interaction moment image combination to obtain pixel values at corresponding positions in the reconstructed image of the virtual viewpoint position at the user interaction moment; and for the position with the pixel value being zero in the reconstructed image of the virtual viewpoint position at the user interaction moment, filling up the hole by using the pixels around the pixel in the reconstructed image to obtain the reconstructed image of the virtual viewpoint position at the user interaction moment.

Optionally, the fusing the virtual texture maps corresponding to the corresponding groups in the user interaction time image combination to obtain the reconstructed image of the virtual viewpoint position at the user interaction time includes: filling holes by using surrounding pixel values respectively for positions with zero pixel values in the virtual texture image corresponding to each corresponding group in the user interaction time image combination; and weighting the pixel values of the corresponding positions in the virtual texture maps corresponding to the corresponding groups after the holes are filled to obtain a reconstructed image of the virtual viewpoint position at the user interaction time.

In another aspect, an embodiment of the present invention provides an image reconstruction system, where the system includes: the system comprises an acquisition unit, a processing unit and a processing unit, wherein the acquisition unit is suitable for acquiring an image combination of a multi-angle free visual angle, parameter data of the image combination and virtual viewpoint position information based on user interaction, wherein the image combination comprises a plurality of angle-synchronous groups of texture maps and depth maps with corresponding relations; the selection unit is suitable for selecting a texture map and a depth map of a corresponding group in the image combination at the moment of user interaction according to the virtual viewpoint position information and the parameter data of the image combination and a preset rule; and the image reconstruction unit is suitable for performing combined rendering on the texture map and the depth map of the corresponding group in the selected user interaction time image combination based on the virtual viewpoint position information and the parameter data corresponding to the texture map and the depth map of the corresponding group in the user interaction time image combination to obtain a reconstructed image corresponding to the virtual viewpoint position at the user interaction time.

The embodiment of the present invention further provides an image reconstruction apparatus, which includes a memory and a processor, where the memory stores computer instructions executable on the processor, and the processor executes the computer instructions to perform the steps of the image reconstruction method according to any one of the above embodiments.

Optionally, the image reconstruction device comprises at least one of: terminal equipment, edge node.

The embodiment of the present invention further provides a computer-readable storage medium, on which computer instructions are stored, and when the computer instructions are executed, the steps of the image reconstruction method according to any of the above embodiments are performed.

By adopting the image reconstruction method in the embodiment of the invention, an image combination of multi-angle free visual angles, the parameter data of the image combination and virtual viewpoint position information based on user interaction are acquired, wherein the image combination comprises a plurality of groups of texture maps and depth maps which have corresponding relations and are synchronized across a plurality of angles; the texture maps and depth maps of corresponding groups in the image combination at the user interaction time are selected according to the virtual viewpoint position information, the parameter data of the image combination and a preset rule; and then the selected texture maps and depth maps of the corresponding groups are combined and rendered only based on the virtual viewpoint position information and the parameter data corresponding to those texture maps and depth maps, without reconstructing the image based on the texture maps and depth maps of all groups in the image combination. Thus, the amount of data computation in the image reconstruction process can be reduced.

Further, according to the virtual viewpoint position information and the parameter data of the image combination, the texture maps and depth maps of corresponding groups that satisfy a preset positional relation and/or quantity relation with the virtual viewpoint position in the image combination at the user interaction moment are selected. This provides high selection freedom and flexibility while reducing the amount of data computation and guaranteeing the quality of the reconstructed image; in addition, it lowers the installation requirements on the shooting equipment used to acquire images, making it easier to adapt to different site requirements and installation constraints.

Furthermore, according to the virtual viewpoint position information and the parameter data of the image combination, a preset number of corresponding texture maps and depth maps closest to the virtual viewpoint position in the image combination at the user interaction time are selected, so that the data computation amount can be reduced, and the quality of a reconstructed image can be ensured.

Further, according to the virtual viewpoint position information and parameter data corresponding to texture maps and depth maps of corresponding groups in the image combination at the user interaction time, texture maps and corresponding depth maps corresponding to 2-N acquisition devices closest to the virtual viewpoint position are selected, wherein N is the number of all acquisition devices for acquiring the image combination, so that the texture maps and the corresponding depth maps acquired by a plurality of acquisition devices closest to the virtual viewpoint position can be selected as required, reconstructed images meeting the definition requirement can be obtained by using as little data as possible, and transmission resources can be saved.

Further, forward mapping is performed on the depth maps of the corresponding groups selected from the image combination at the user interaction time, mapping them to the virtual viewpoint position at the user interaction time; post-processing is performed on the forward-mapped depth maps; then the texture maps of the corresponding groups selected from the image combination at the user interaction time are reversely mapped, and the virtual texture maps generated by the reverse mapping are fused. After this processing, a reconstructed image corresponding to the virtual viewpoint position at the user interaction time is obtained.

Further, the fused texture map is subjected to hole filling to obtain a reconstructed image corresponding to the virtual viewpoint position at the user interaction time, so that the quality of the reconstructed image can be improved.

Further, in the process of performing post-processing on the depth map after the forward mapping, the quality of the reconstructed depth map can be improved by performing foreground edge protection processing on the depth map after the forward mapping respectively and/or performing pixel-level filtering processing and the like on the depth map after the forward mapping respectively.

Furthermore, according to the virtual viewpoint position information and parameter data corresponding to the texture map and the depth map of the corresponding group in the image combination at the user interaction time, the virtual texture maps generated after reverse mapping are fused by adopting the global weight determined by the distance between the position of the virtual viewpoint and the position of the acquisition equipment for acquiring the corresponding texture map in the image combination, so that the reconstructed image is more real, and the quality of the reconstructed image is further improved.

Furthermore, by reconstructing images after decoding the compressed image data of the multi-angle free visual angle with the method of the embodiment of the invention, network transmission resources can be further saved; moreover, general-purpose compression schemes and compression software and hardware can be adopted, which facilitates popularization.

Furthermore, the image reconstruction scheme in the embodiment of the invention is applied to equipment such as terminal equipment and edge node equipment, can adapt to equipment with limited computing capability such as the terminal equipment and the edge node equipment, meets the requirement of a user on watching an image based on a virtual viewpoint, and improves the visual experience of the user.

Drawings

FIG. 1 is a schematic diagram of a region to be viewed according to an embodiment of the present invention;

FIG. 2 is a schematic diagram of an arrangement of a collecting apparatus according to an embodiment of the present invention;

FIG. 3 is a schematic diagram of a multi-angle free-view display system according to an embodiment of the present invention;

FIG. 4 is a schematic illustration of a display of an apparatus in an embodiment of the invention;

FIG. 5 is a schematic diagram of an apparatus according to an embodiment of the present invention;

FIG. 6 is a schematic illustration of another manipulation of the apparatus in an embodiment of the present invention;

FIG. 7 is a schematic diagram of an arrangement of a collecting apparatus according to an embodiment of the present invention;

FIG. 8 is a schematic illustration of another manipulation of the apparatus in an embodiment of the present invention;

FIG. 9 is a schematic illustration of a display of another apparatus in an embodiment of the invention;

FIG. 10 is a flow chart of a method for setting up a collection device according to an embodiment of the present invention;

FIG. 11 is a diagram illustrating a multi-angle free viewing range according to an embodiment of the present invention;

FIG. 12 is a diagram illustrating another multi-angle free viewing range in an embodiment of the present invention;

FIG. 13 is a diagram illustrating another multi-angle free viewing range in an embodiment of the present invention;

FIG. 14 is a diagram illustrating another multi-angle free view range according to an embodiment of the present invention;

FIG. 15 is a diagram illustrating another multi-angle free viewing range in an embodiment of the present invention;

FIG. 16 is a schematic diagram of another arrangement of the acquisition equipment in the embodiment of the invention;

FIG. 17 is a schematic diagram of another arrangement of the collecting apparatus in the embodiment of the present invention;

FIG. 18 is a schematic diagram of another arrangement of the acquisition equipment in the embodiment of the invention;

FIG. 19 is a flowchart of a multi-angle freeview data generating method according to an embodiment of the present invention;

FIG. 20 is a diagram illustrating distribution positions of pixel data and depth data of a single image according to an embodiment of the present invention;

FIG. 21 is a diagram illustrating distribution positions of pixel data and depth data of another single image according to an embodiment of the present invention;

FIG. 22 is a diagram illustrating distribution positions of pixel data and depth data of an image according to an embodiment of the present invention;

FIG. 23 is a diagram illustrating distribution positions of pixel data and depth data of another image according to an embodiment of the present invention;

FIG. 24 is a diagram illustrating distribution positions of pixel data and depth data of another image according to an embodiment of the present invention;

FIG. 25 is a diagram illustrating distribution positions of pixel data and depth data of another image according to an embodiment of the present invention;

FIG. 26 is a schematic illustration of image region stitching according to an embodiment of the present invention;

FIG. 27 is a schematic structural diagram of a stitched image in an embodiment of the present invention;

FIG. 28 is a schematic structural diagram of another stitched image in an embodiment of the present invention;

FIG. 29 is a schematic structural diagram of another stitched image in an embodiment of the present invention;

FIG. 30 is a schematic structural diagram of another stitched image in an embodiment of the present invention;

FIG. 31 is a schematic structural diagram of another stitched image in an embodiment of the present invention;

FIG. 32 is a schematic structural diagram of another stitched image in an embodiment of the present invention;

FIG. 33 is a diagram illustrating a pixel data distribution of an image according to an embodiment of the present invention;

FIG. 34 is a schematic diagram of a pixel data distribution of another image in an embodiment of the invention;

FIG. 35 is a diagram illustrating data storage in a stitched image, in accordance with an embodiment of the present invention;

FIG. 36 is a schematic illustration of data storage in another stitched image in an embodiment of the present invention;

FIG. 37 is a flow chart of a method of image reconstruction in an embodiment of the present invention;

FIG. 38 is a schematic diagram of an image reconstruction system according to an embodiment of the present invention;

FIG. 39 is a diagram illustrating a multi-angle freeview data generating process according to an embodiment of the present invention;

FIG. 40 is a schematic diagram of a multi-camera 6DoF acquisition system in an embodiment of the present invention;

FIG. 41 is a diagram illustrating the generation and processing of 6DoF video data according to an embodiment of the present invention;

FIG. 42 is a diagram illustrating a structure of a header file according to an embodiment of the present invention;

FIG. 43 is a diagram illustrating a user-side processing of 6DoF video data according to an embodiment of the present invention;

FIG. 44 is a schematic input and output diagram of an image reconstruction system in an embodiment of the present invention;

fig. 45 is a schematic diagram of an implementation architecture of an image reconstruction method according to an embodiment of the present invention.

Detailed Description

As described above, a large amount of data computation is required to realize a multi-degree-of-freedom image. For example, in a method that expresses a multi-degree-of-freedom image with a point cloud, since the point cloud expresses and stores the three-dimensional positions and pixel information of all points in space, a very large amount of storage is required, and accordingly, a very large amount of data computation is required in the image reconstruction process. If such image reconstruction is performed in the cloud, it places very heavy processing pressure on the cloud reconstruction device; if it is performed at a terminal, the terminal's limited processing capability makes such a large data volume difficult to handle. In addition, point cloud compression currently lacks mature standards and industrial software and hardware support, which is not conducive to popularization.

In view of the above technical problems, embodiments of the present invention provide an image reconstruction method in which an image combination of a multi-angle free visual angle, parameter data of the image combination, and virtual viewpoint position information based on user interaction are acquired, the image combination comprising a plurality of groups of texture maps and depth maps with corresponding relations that are synchronized across multiple angles; the texture maps and depth maps of corresponding groups in the image combination at the user interaction time are then selected according to the virtual viewpoint position information, the parameter data of the image combination and a preset rule; and, based on the virtual viewpoint position and the parameter data corresponding to the texture maps and depth maps of the corresponding groups in the image combination at the user interaction time, combined rendering is performed on the selected texture maps and depth maps of the corresponding groups, so as to obtain a reconstructed image corresponding to the virtual viewpoint position at the user interaction time. In the whole image reconstruction process, combined rendering is performed only on the texture maps and depth maps of the selected corresponding groups, based on the virtual viewpoint position and the parameter data corresponding to those texture maps and depth maps, and image reconstruction does not need to be performed based on the texture maps and depth maps of all groups in the image combination at the user interaction time, so that the amount of data computation in the image reconstruction process can be reduced.

In order to make the aforementioned objects, features and advantages of the embodiments of the present invention more comprehensible, specific embodiments of the present invention are described in detail below with reference to the accompanying drawings.

In the embodiment of the present invention, the video compression data or the image data may be acquired by the acquisition device. In order that those skilled in the art may better understand and implement the embodiments of the present invention, the following description is presented in terms of a specific application scenario.

An exemplary embodiment of the present invention may include the following steps. The first step is acquisition and depth map calculation, which includes three main sub-steps: Multi-Camera Video Capturing, Camera Parameter Estimation (calculation of the intrinsic and extrinsic camera parameters), and Depth Map Calculation. For multi-camera acquisition, it is desirable that the videos acquired by the various cameras be frame-level aligned. Referring to FIG. 39 in combination, Texture Images, i.e., the synchronized images described later, may be obtained by video capture with multiple cameras; the Camera Parameters, i.e., the parameter data described later, including the intrinsic parameter data and the extrinsic parameter data, can be obtained by calculating the intrinsic and extrinsic parameters of the cameras; and a Depth Map can be obtained through the depth map calculation.

In this scheme, no special camera, such as a light field camera, is required for video acquisition. Likewise, complicated camera calibration prior to acquisition is not required. Multiple cameras can be laid out and arranged to better capture objects or scenes to be photographed. Referring to fig. 40 in combination, a plurality of capturing devices, for example, cameras 1 to N, may be provided in the area to be viewed.

After the above three steps, the texture maps acquired from the multiple cameras, all the camera parameters, and the depth map of each camera are obtained. These three portions of data may be referred to as the data files of the multi-angle free view video data, and may also be referred to as 6DoF video data. With these data, the user end can generate a virtual viewpoint according to a virtual 6 Degrees of Freedom (6DoF) position, thereby providing a 6DoF video experience.

With reference to fig. 41, the 6DoF video data and the indicative data may be compressed and transmitted to the user side; the user side may obtain a 6DoF expression from the received data, that is, the 6DoF video data and the metadata, and then perform 6DoF rendering on the user side. The indicative data may also be referred to as Metadata.

Referring to fig. 42 in combination, the metadata may be used to describe the data schema of the 6DoF video data, and may specifically include: Stitching Pattern metadata, indicating the storage rules of the pixel data and depth data of the multiple images in the stitched image; Edge protection metadata (Padding pattern metadata), which may be used to indicate the way edge protection is performed in the stitched image; and Other metadata. The metadata may be stored in a header file, and the specific order of storage may be as shown in FIG. 44, or in other orders.
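Purely as an illustration, the metadata fields described above could be carried in a container like the sketch below; the field names and types are assumptions, and the authoritative layout is the header-file structure referred to above.

```python
# Sketch of a container for the header metadata described above. Field names
# and types are illustrative assumptions only.
from dataclasses import dataclass, field
from typing import Any, Dict

@dataclass
class HeaderMetadata:
    # Stitching pattern metadata: storage rules of the pixel (texture) data
    # and depth data of the multiple images inside the stitched image.
    stitching_pattern: Dict[str, Any]
    # Edge-protection (padding) metadata: how edge protection is performed
    # in the stitched image.
    padding_pattern: Dict[str, Any]
    # Any other metadata carried in the header file.
    other: Dict[str, Any] = field(default_factory=dict)
```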

Referring to fig. 43, the client obtains the 6DoF video data, which includes the camera parameters, the texture maps and depth maps, and descriptive metadata (i.e., the metadata mentioned above), as well as the interactive behavior data of the client. With these data, the user side may perform 6DoF rendering in a Depth Image-Based Rendering (DIBR) manner, so as to generate an image of a virtual viewpoint at a specific 6DoF position determined according to user behavior; that is, according to a user instruction, the virtual viewpoint at the 6DoF position corresponding to the instruction is determined.

In one embodiment implemented at test time, each test case contained 20 seconds of video data, 30 frames/second, 1920 x 1080 resolution. For any of the 30 cameras, there are a total of 600 frames. The main folder contains a texture map folder and a depth map folder. Under the texture map folder, secondary directories from 0-599 can be found, which represent 600 frames of content for 20 seconds of video, respectively. Under each secondary directory, 30 camera-acquired texture maps were included, named from 0.yuv to 29.yuv in the format of yuv 420. Correspondingly, under the folder of the depth maps, each secondary directory contains 30 depth maps calculated by the depth estimation algorithm. Each depth map corresponds to a texture map by the same name. The texture maps and corresponding depth maps of multiple cameras are all of a certain frame instant in the 20 second video.
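For illustration, the directory layout described above can be traversed as in the sketch below. The folder names "texture" and "depth" are assumptions (the text only names their roles); the frame-numbered secondary directories and the 0.yuv to 29.yuv naming follow the description.

```python
# Sketch only: walk the described test-case layout (600 frame directories,
# 30 yuv420 texture/depth files per frame). "texture" and "depth" folder
# names are assumed for illustration.
from pathlib import Path

def frame_files(root, frame_index, num_cameras=30):
    """Return (texture_path, depth_path) pairs for one frame instant."""
    texture_dir = Path(root) / "texture" / str(frame_index)
    depth_dir = Path(root) / "depth" / str(frame_index)
    pairs = []
    for cam in range(num_cameras):
        name = f"{cam}.yuv"  # files are named 0.yuv .. 29.yuv
        pairs.append((texture_dir / name, depth_dir / name))
    return pairs

# Example: list the texture/depth pairs of frame 0 of a test case.
# for tex, dep in frame_files("test_case_root", 0):
#     print(tex, dep)
```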

All the depth maps in the test case are generated by a preset depth estimation algorithm. In tests, these depth maps may provide good virtual viewpoint reconstruction quality over a virtual 6DoF position. In one case, the reconstructed image of the virtual viewpoint can be generated directly from the given depth map. Alternatively, the depth map may be generated or refined by a depth calculation algorithm from the original texture map.

In addition to the depth maps and texture maps, the test case contains an sfm file, which holds the parameters describing all 30 cameras. The data of this file is written in binary format, and the specific data format is described below. In consideration of adaptability to different video cameras, a fisheye camera model with distortion parameters was used in the test. Reference can be made to the DIBR reference software we provide to see how to read and use the camera parameter data from this file. The camera parameter data contains the following fields:

(1) krt_R is the rotation matrix of the camera;

(2) krt_cc is the optical center position of the camera;

(3) krt_world_position is the three-dimensional spatial coordinates of the camera;

(4) krt_kc is the distortion coefficient of the camera;

(5) src_width is the width of the calibration image;

(6) src_height is the height of the calibration image;

(7) fisheye_radius and lens_fov are parameters of the fisheye camera.

In the technical solution of the present invention, the user can refer to the preset parameter reading function (the set_sfm_parameters function) for the details of how the corresponding parameters are read from the sfm file.
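For illustration, the fields listed above could be gathered into a simple container such as the sketch below; the exact binary layout of the sfm file is defined only by the file itself and the provided set_sfm_parameters function, so this class merely names and documents the fields.

```python
# Sketch of a per-camera parameter record holding the sfm fields listed above.
# Array shapes are assumptions for illustration; the authoritative layout is
# whatever set_sfm_parameters reads from the sfm file.
from dataclasses import dataclass
import numpy as np

@dataclass
class CameraParameters:
    krt_R: np.ndarray               # rotation matrix of the camera (3x3)
    krt_cc: np.ndarray              # optical center position of the camera
    krt_world_position: np.ndarray  # three-dimensional spatial coordinates of the camera
    krt_kc: np.ndarray              # distortion coefficients of the camera
    src_width: int                  # width of the calibration image
    src_height: int                 # height of the calibration image
    fisheye_radius: float           # fisheye camera parameter
    lens_fov: float                 # fisheye camera parameter
```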

In the video reconstruction system or DIBR software used in the embodiment of the present invention, the camera parameters, the texture map, the depth map, and the 6DoF position of the virtual camera may be received as inputs, and the generated texture map and the depth map at the virtual 6DoF position may be output at the same time. The 6DoF position of the virtual camera is the aforementioned 6DoF position determined according to the user behavior. The DIBR software may be software implementing virtual viewpoint-based image reconstruction in embodiments of the present invention.

Referring to fig. 44 in combination, in a DIBR software used in the embodiment of the present invention, camera parameters, a texture map, a depth map, and a 6DoF position of a virtual camera may be received as input, and a generated texture map and a generated depth map at the virtual 6DoF position may be output at the same time.

With reference to fig. 45 in combination, the image reconstruction method implemented in the embodiment of the present invention, or the software implementing it, may include some or all of the following processing steps: Camera Selection, Forward Mapping of the Depth Map, Depth Map Post-processing, Backward Mapping of the Texture Map, fusion of the texture maps mapped from multiple cameras (Texture Fusion), and hole filling of the image (Inpainting).
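As a rough illustration of how these steps fit together, the sketch below strings them into one function. The individual step implementations are not part of this sketch, so they are passed in as callables; all names used here are hypothetical.

```python
# Control-flow sketch of the reconstruction pipeline named above. The concrete
# step implementations (projection maths, filtering, fusion, inpainting, ...)
# are supplied by the caller through the `steps` dictionary.
def reconstruct_virtual_view(virtual_pose, cameras, texture_maps, depth_maps, steps):
    """steps: dict mapping each stage name to a callable implementing it."""
    # 1. Camera selection: pick the reference cameras for this virtual pose.
    selected = steps["select_cameras"](virtual_pose, cameras)

    virtual_textures = []
    for cam in selected:
        # 2. Forward mapping: warp this camera's depth map to the virtual pose.
        vdepth = steps["forward_map_depth"](depth_maps[cam], cameras[cam], virtual_pose)
        # 3. Depth map post-processing (e.g. foreground edge protection, filtering).
        vdepth = steps["postprocess_depth"](vdepth)
        # 4. Backward mapping: fetch texture for the virtual view via the warped depth.
        vtex = steps["backward_map_texture"](texture_maps[cam], vdepth,
                                             cameras[cam], virtual_pose)
        virtual_textures.append(vtex)

    # 5. Fuse the per-camera virtual texture maps.
    fused = steps["fuse_textures"](virtual_textures,
                                   [cameras[c] for c in selected], virtual_pose)
    # 6. Fill any remaining holes in the fused image.
    return steps["inpaint_holes"](fused)
```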

In the DIBR software described above, the two cameras closest to the virtual 6DoF position may be selected by default for virtual viewpoint generation.
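A minimal sketch of such a nearest-camera selection, assuming the camera world positions are available as an (N, 3) array, might look as follows; it returns the indices of the k cameras closest to the virtual viewpoint (k = 2 by default, matching the behaviour described above).

```python
# Sketch: pick the k acquisition devices whose positions are closest to the
# virtual viewpoint position.
import numpy as np

def select_cameras(virtual_position, camera_positions, k=2):
    """camera_positions: (N, 3) array of camera world coordinates."""
    diffs = np.asarray(camera_positions, dtype=float) - np.asarray(virtual_position, dtype=float)
    distances = np.linalg.norm(diffs, axis=1)
    return np.argsort(distances)[:k]  # indices of the k nearest cameras
```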

In the step of post-processing of the depth map, the quality of the depth map may be improved by various methods, such as foreground edge protection, filtering at the pixel level, etc.
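As one illustrative example of pixel-level filtering (the text does not prescribe a particular filter), a simple median filter can be applied to the forward-mapped depth map:

```python
# Sketch: median filtering of a forward-mapped depth map to suppress isolated
# outliers while keeping depth edges comparatively sharp. The choice of a
# median filter is an assumption for illustration.
import cv2
import numpy as np

def postprocess_depth(depth_map, ksize=3):
    # cv2.medianBlur accepts 32-bit float input for ksize 3 or 5.
    return cv2.medianBlur(depth_map.astype(np.float32), ksize)
```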

For the generated output image, a method of fusing the texture maps mapped from the two cameras is used. The fusion weight is a global weight determined by the distance between the position of the virtual viewpoint and the reference camera position. Where a pixel of the output virtual viewpoint image is mapped by only one camera, that mapped pixel may be directly adopted as the value of the output pixel.
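A sketch of such a fusion is given below. The inverse-distance weighting is an assumption used for illustration, since the text only states that the global weight is determined by the distance between the virtual viewpoint and the reference camera position; pixels mapped by a single camera keep that camera's value directly, because the normalization cancels the weight.

```python
# Sketch: fuse per-camera virtual texture maps with one global weight per
# camera derived from its distance to the virtual viewpoint. Unmapped pixels
# are assumed to be zero and are left as holes.
import numpy as np

def fuse_textures(virtual_textures, camera_positions, virtual_position):
    """virtual_textures: list of (H, W, 3) arrays, zero where unmapped."""
    vpos = np.asarray(virtual_position, dtype=float)
    distances = np.array([np.linalg.norm(np.asarray(p, dtype=float) - vpos)
                          for p in camera_positions])
    weights = 1.0 / (distances + 1e-6)     # closer camera -> larger global weight
    weights /= weights.sum()

    fused = np.zeros_like(virtual_textures[0], dtype=np.float64)
    weight_sum = np.zeros(virtual_textures[0].shape[:2], dtype=np.float64)
    for tex, w in zip(virtual_textures, weights):
        mapped = np.any(tex > 0, axis=2)   # pixels this camera actually mapped
        fused[mapped] += w * tex[mapped]
        weight_sum[mapped] += w

    valid = weight_sum > 0
    fused[valid] /= weight_sum[valid][:, None]  # single-camera pixels pass through
    return fused                                # weight_sum == 0 pixels stay zero (holes)
```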

After the fusion step, any pixels that remain unmapped (holes) can be filled using an image inpainting method.
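As one possible realization of this filling step (the text only requires an image filling method), OpenCV's inpainting can be applied to the pixels that no camera mapped:

```python
# Sketch: fill the remaining holes (pixels left at zero after fusion) with
# OpenCV's Telea inpainting. Assumes the fused image is in the 0..255 range.
import cv2
import numpy as np

def inpaint_holes(fused):
    img = np.clip(fused, 0, 255).astype(np.uint8)
    hole_mask = np.all(img == 0, axis=2).astype(np.uint8) * 255  # 255 marks holes
    return cv2.inpaint(img, hole_mask, 3, cv2.INPAINT_TELEA)     # inpaint radius 3
```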

For the output depth map, for convenience of error analysis, a depth map mapped from one of the cameras to the virtual viewpoint position may be used as the output.

It should be understood that the foregoing examples are merely illustrative and not restrictive of specific embodiments, and the technical solutions in the examples of the present invention will be further described below.

Referring to the schematic diagram of the area to be watched shown in fig. 1, the area to be watched may be a basketball court, and a plurality of acquisition devices may be provided to acquire data of the area to be watched.

For example, with combined reference to FIG. 2, several acquisition devices may be arranged along a certain path at a height HLK above the basket; for example, 6 acquisition devices may be arranged along an arc, namely acquisition devices CJ1 to CJ6. It is understood that the arrangement positions, number and supporting manner of the acquisition devices can be various, and are not limited herein.

The acquisition devices may be cameras or video cameras capable of synchronized shooting, for example, may be cameras or video cameras capable of synchronized shooting through a hardware synchronization line. Data acquisition is carried out on the region to be watched through a plurality of acquisition devices, and a plurality of synchronous images or video streams can be obtained. According to the video streams collected by the plurality of collecting devices, a plurality of synchronous frame images can be obtained as a plurality of synchronous images. It will be appreciated that synchronisation is ideally intended to correspond to the same time instant, but that errors and deviations may also be tolerated.

With reference to fig. 3 in combination, in the embodiment of the present invention, data acquisition may be performed on an area to be viewed by an acquisition system 31 including a plurality of acquisition devices; the acquired synchronized images may be processed by the acquisition system 31 or by the server 32 to generate multi-angle free view data that can support the display device 33 to perform virtual viewpoint switching. The displaying device 33 may display a reconstructed image generated based on the multi-angle free view data, the reconstructed image corresponding to a virtual viewpoint, display a reconstructed image corresponding to a different virtual viewpoint according to an instruction of a user, and switch a viewing position and a viewing angle.

In a specific implementation, the process of reconstructing a video or an image to obtain a reconstructed image may be implemented by the device 33 performing display, or may be implemented by a device located in a Content Delivery Network (CDN) in an edge computing manner. It is to be understood that fig. 3 is an example only and is not limiting of the acquisition system, the server, the device performing the display, and the specific implementation.

The process of video reconstruction based on multi-angle freeview data will be described in detail later with reference to fig. 38 and 39, and will not be described herein again.

With reference to fig. 4, following the previous example, the user may watch the to-be-watched area through the display device, in this embodiment, the to-be-watched area is a basketball court. As described above, the viewing position and the viewing angle are switchable.

For example, the user may slide on the screen surface to switch the virtual viewpoint. In an embodiment of the present invention, with reference to fig. 5, when the user's finger slides to the right along the screen surface, the virtual viewpoint for viewing can be switched. With continued reference to FIG. 2, the virtual viewpoint before sliding may be VP1, and after the virtual viewpoint is switched by sliding along the screen surface, the virtual viewpoint may be VP2. Referring collectively to fig. 6, after sliding along the screen surface, the reconstructed image presented on the screen may be as shown in fig. 6. The reconstructed image can be obtained by image reconstruction based on multi-angle free view data generated from data acquired by a plurality of acquisition devices in an actual acquisition situation.

It is to be understood that the image viewed before switching may be a reconstructed image. The reconstructed image may be a frame image in a video stream. In addition, the manner of switching the virtual viewpoint according to the user instruction may be various, and is not limited herein.

In a specific implementation, the virtual viewpoint may be represented by 6 Degrees of Freedom (6DoF) coordinates, wherein the spatial position of the virtual viewpoint may be represented as (x, y, z), and the viewing angle may be represented as three rotational directions (rotation angles about three axes).
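For illustration only, such a 6DoF virtual viewpoint could be represented by a structure like the sketch below; the names of the three rotation angles are assumptions, since the text only states that the viewing angle has three rotational directions.

```python
# Sketch of a 6DoF virtual viewpoint: a spatial position plus three rotation
# angles. The angle names (pitch, yaw, roll) are illustrative assumptions.
from dataclasses import dataclass

@dataclass
class VirtualViewpoint6DoF:
    x: float
    y: float
    z: float
    pitch: float  # rotation about the lateral axis
    yaw: float    # rotation about the vertical axis
    roll: float   # rotation about the viewing axis
```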

The virtual viewpoint is a three-dimensional concept, and three-dimensional information is required for generating a reconstructed image. In a specific implementation, the multi-angle freeview data may include depth data for providing third-dimensional information outside the planar image (texture map). The amount of data for depth data is small compared to other implementations, such as providing three-dimensional information through point cloud data. A specific implementation of generating the multi-angle freeview data will be described in detail later with reference to fig. 19 to 37, and will not be described in detail here.

In the embodiment of the invention, the switching of the virtual viewpoint can be performed within a certain range, namely a multi-angle free visual angle range. That is, in the multi-angle free view range, the virtual view position and the view angle can be switched at will.

The multi-angle free visual angle range is related to the arrangement of the collecting equipment, and the wider the shooting coverage range of the collecting equipment is, the larger the multi-angle free visual angle range is. The quality of the picture displayed by the display equipment is related to the number of the acquisition equipment, and generally, the larger the number of the acquisition equipment is, the smaller the hollow area in the displayed picture is.

Referring to fig. 7, if two rows of acquisition devices at different heights are installed in the basketball court, namely an upper row of acquisition devices CJ1 to CJ6 and a lower row of acquisition devices CJ1' to CJ6', then, compared with the arrangement of only one row of acquisition devices, the multi-angle free visual angle range is larger.

Referring to fig. 8 in combination, the user's finger can slide upward, switching the virtual viewpoint from which to view. Referring collectively to fig. 9, after sliding up the screen surface, the image presented by the screen may be as shown in fig. 9.

In a specific implementation, if only one row of acquisition devices is arranged, a certain degree of freedom in the up-down direction can still be obtained in the process of obtaining a reconstructed image through image reconstruction, but the multi-angle free visual angle range is smaller in the up-down direction than that of the arrangement with two rows of acquisition devices.

It can be understood by those skilled in the art that the foregoing embodiments and the corresponding drawings are only illustrative, and are not limited to the setting of the capturing device and the association relationship between the multi-angle free viewing angle ranges, nor to the operation manner and the obtained display effect of the device for displaying.

The following further elaborations are made in particular with regard to the method of setting up the acquisition device.

Fig. 10 is a flowchart illustrating a method for setting an acquisition device according to an embodiment of the present invention. In the embodiment of the present invention, the method specifically includes the following steps:

step S101, determining a multi-angle free visual angle range, and supporting the switching and watching of virtual viewpoints of an area to be watched in the multi-angle free visual angle range.

And S102, determining a setting position of acquisition equipment at least according to the multi-angle free visual angle range, wherein the setting position is suitable for setting the acquisition equipment and carrying out data acquisition on the area to be watched.

It will be understood by those skilled in the art that a fully free viewing angle may refer to a 6-degree-of-freedom viewing angle, i.e. both the spatial position and the viewing angle of the virtual viewpoint can be freely switched by the user at the device performing the display. The spatial position of the virtual viewpoint can be represented as (x, y, z), and the viewing angle can be represented as three rotational directions (rotation angles about three axes). There are 6 degrees of freedom in total, so it is called a 6-degree-of-freedom viewing angle.

As described above, in the embodiment of the present invention, the switching of the virtual viewpoint may be performed within a certain range, which is a multi-angle free view range. That is, within the multi-angle free view range, the virtual viewpoint position and the view can be arbitrarily switched.

The multi-angle free visual angle range can be determined according to the requirements of application scenes. For example, in some scenarios, the area to be viewed may have a core viewpoint, such as the center of a stage, or the center point of a basketball court, or the basket of a basketball court, etc. In these scenes, the multi-angle freeview range may include a plane or stereoscopic region that includes the core viewpoint. It is understood that the region to be viewed may be a point, a plane or a stereoscopic region, and is not limited thereto.

As mentioned above, the multi-angle free viewing angle range may be a variety of regions, which will be further exemplified below with reference to fig. 11 to 15.

Referring to FIG. 11, the core viewpoint is represented by point O, and the multi-angle free view range may be a sector area located on the same plane as the core viewpoint with the core viewpoint as the center of the circle, such as sector area A1OA2 or sector area B1OB2, or a circular surface centered on point O.

Taking the sector area A1OA2 as the multi-angle free visual angle range as an example, the position of the virtual viewpoint may be continuously switched within the area, e.g., continuously switched from A1 to A2 along the arc segment A1A2, or switched along the arc segment L1L2, or otherwise switched within the multi-angle free view range. Accordingly, the view angle of the virtual viewpoint may also be changed within the region.

With further reference to fig. 12, the core viewpoint may be the central point E of the basketball court, and the multi-angle free view range may be a sector area located on the same plane as the central point E, with the central point E as the center, such as sector area F121EF122. The central point E of the basketball court may be located on the court floor, or may be at a certain height above the ground. The arc end point F121 and the arc end point F122 of the sector area may be at the same height, for example height H121 in the figure.

Referring to FIG. 13, where the core viewpoint is represented by point O, the multi-angle free view range may be a portion of a sphere centered at the core viewpoint. For example, region C1C2C3C4 indicates a partial region of a sphere, and the multi-angle free view range may be the three-dimensional range formed by region C1C2C3C4 and point O. Any point within the range can be used as the position of the virtual viewpoint.

With further reference to FIG. 14, the core viewpoint may be the center point E of the basketball court, and the multi-angle free view range may be a portion of a sphere centered at center point E. For example, area F131F132F133F134 indicates a partial area of a sphere, and the multi-angle free view range may be the three-dimensional range formed by area F131F132F133F134 and the center point E.

In a scene with a core viewpoint, the position of the core viewpoint may be various, and the multi-angle free viewing angle range may also be various, which is not listed here. It is to be understood that the above embodiments are only examples and are not limiting on the multi-angle free view range, and the shapes shown therein are not limiting on actual scenes and applications.

In specific implementation, the core viewpoint may be determined according to a scene, in one shooting scene, there may also be multiple core viewpoints, and the multi-angle free view range may be a superposition of multiple sub-ranges.

In other application scenarios, the multi-angle free view range may also have no core viewpoint. For example, in some application scenarios, it is desirable to provide multi-angle free-view viewing of historic buildings, or of paintings and exhibitions. Accordingly, the multi-angle free view range can be determined according to the needs of such scenes.

It is understood that the shape of the free view range may be arbitrary, and any point within the multi-angle free view range may be used as a position of the virtual viewpoint.

Referring to FIG. 15, the multi-angle free view range may be a cube D1D2D3D4D5D6D7D8, and the region to be viewed may be the surface D1D2D3D4. Any point in the cube D1D2D3D4D5D6D7D8 can then be used as the position of the virtual viewpoint, and the viewing angle of the virtual viewpoint can be various. For example, a position E6 may be selected on the surface D5D6D7D8, and viewing may be performed along the direction E6D1, or along the direction E6D9, where D9 is a point selected from the area to be viewed.

In a specific implementation, after the multi-angle free view range is determined, the position of the acquisition equipment can be determined according to the multi-angle free view range.

Specifically, the setting position of the capturing device may be selected within the multi-angle free view range; for example, the setting position of the capturing device may be determined at a boundary point of the multi-angle free view range.

Referring to fig. 16, the core viewpoint may be the central point E of the basketball court, and the multi-angle free view range may be a sector area located on the same plane as the central point E, with the central point E as the center, such as sector area F61EF62. The acquisition devices may be arranged within the multi-angle free view range, for example along the arc F65F66. In the area not covered by the acquisition devices, image reconstruction can be performed by an algorithm. In a specific implementation, acquisition devices may also be arranged along the arc F61F62, with acquisition devices placed at the end points of the arc, to improve the quality of the reconstructed image. Each acquisition device may be oriented toward the central point E of the basketball court. The position of an acquisition device may be represented by spatial position coordinates, and its orientation by three rotational directions.

In a specific implementation, the number of setting positions may be 2 or more, and correspondingly, 2 or more acquisition devices may be set. The number of acquisition devices can be determined according to the quality requirement of the reconstructed image or video. In scenes with higher picture quality requirements for the reconstructed image or video, the number of capture devices may be greater, while in scenes with lower picture quality requirements, the number of capture devices may be smaller.

With continued reference to fig. 16, it can be appreciated that, to reduce voids in the reconstructed picture and pursue higher reconstructed image or video quality, a greater number of acquisition devices may be arranged along the arc F61F62; for example, 40 cameras may be provided.

Referring to FIG. 17, the core viewpoint may be the center point E of the basketball court, and the multi-angle free view range may be a portion of a sphere centered at the center point E. For example, region F61F62F63F64 indicates a partial area of a sphere, and the multi-angle free view range may be the three-dimensional range formed by region F61F62F63F64 and the center point E. The acquisition devices may be arranged within the multi-angle free view range, for example along the arc F65F66 and the arc F67F68. Similar to the previous example, image reconstruction may be performed using an algorithm in the area not covered by the acquisition devices. In a specific implementation, acquisition devices may also be arranged along the arc F61F62 and the arc F63F64, with acquisition devices placed at the end points of the arcs, to improve the quality of the reconstructed image. Each acquisition device may be oriented toward the center point E of the basketball court. It will be appreciated that, although not shown in the figures, the number of acquisition devices along the arc F61F62 and the arc F63F64 may be greater.

As mentioned before, in some application scenarios, the region to be viewed may comprise a core viewpoint, and correspondingly, the multi-angle free view range comprises a region in which the viewing angles point to the core viewpoint. In this application scenario, the position of the capture device may be selected within an arc-shaped region whose viewing direction points towards the core viewpoint.

When the region to be viewed comprises the core viewpoint and the setting positions are selected within an arc-shaped region whose viewing direction points to the core viewpoint, the acquisition devices are arranged in an arc. Since the region to be viewed comprises the core viewpoint and the viewing angles point to the core viewpoint, in this scene the arc-shaped arrangement allows a larger multi-angle free view range to be covered with fewer acquisition devices.

In a specific implementation, the setting position of the acquisition device can be determined by combining the view angle range and the boundary shape of the region to be watched. For example, the setting positions of the capturing devices may be determined at preset intervals along the boundary of the region to be viewed within the viewing angle range.

Referring to FIG. 18, the multi-angle free view range may have no core viewpoint; for example, the virtual viewpoint position may be selected within the hexahedron F81F82F83F84F85F86F87F88, and the region to be viewed is viewed from the virtual viewpoint position. The boundary of the region to be viewed may be the ground boundary line of the court. The acquisition devices may be arranged along the intersection line B89B90 between the ground boundary line and the region to be viewed; for example, 6 acquisition devices may be provided from position B89 to position B94. The degree of freedom in the up-down direction may be realized by an algorithm, or another row of acquisition devices whose horizontal projection positions fall on the intersection line B89B90 may be additionally arranged.

In a specific implementation, the multi-angle free view range may also support viewing of the region to be viewed from its upper side, that is, from a direction away from the horizontal plane.

Correspondingly, the acquisition device may be carried by an unmanned aerial vehicle so as to be arranged on the upper side of the region to be viewed, or the acquisition device may be arranged at the top of the building in which the region to be viewed is located, the top being the part of the building structure in the direction away from the horizontal plane.

For example, the acquisition device may be arranged at the top of a basketball venue, or carried by an unmanned aerial vehicle hovering above the basketball court. For a stage, the acquisition device may be arranged at the top of the venue where the stage is located, or likewise carried by an unmanned aerial vehicle.

By arranging the acquisition device on the upper side of the region to be viewed, the multi-angle free view range can include viewing angles above the region to be viewed.

In a specific implementation, the capturing device may be a camera or a video camera, and the captured data may be picture or video data.

It is understood that the manner of mounting the acquisition device at the setting position may be various; for example, the acquisition device may be supported at the setting position by a support frame, or other mounting manners may be used.

In addition, it is to be understood that the above embodiments are only for illustration and are not limiting on the setting mode of the acquisition device. In various application scenes, the specific implementation modes of determining the setting position of the acquisition equipment and setting the acquisition equipment for acquisition according to the multi-angle free visual angle range are all within the protection scope of the invention.

The following is further described with particular reference to a method of generating multi-angle freeview data.

As mentioned above, with continued reference to fig. 3, the acquired synchronized multiple two-dimensional images may be processed by the acquisition system 31 or by the server 32 to generate multi-angle free-view data capable of supporting virtual viewpoint switching by the displaying device 33, where the multi-angle free-view data may indicate, through depth data, three-dimensional information beyond what the two-dimensional images express. The depth data may reflect the relative distance of the photographed object from the camera or video camera. Based on the synchronized plurality of two-dimensional images and the corresponding depth data, multi-angle free-view data that can support virtual viewpoint switching by the displaying device 33 can be generated.

In a specific implementation, referring to fig. 19 in combination, generating the multi-angle freeview data may include the steps of:

Step S191, acquiring a plurality of synchronized two-dimensional images, the plurality of two-dimensional images being captured at different angles.

Step S192 determines depth data of each two-dimensional image based on the plurality of two-dimensional images.

Step S193, for each of the two-dimensional images, storing pixel data of each two-dimensional image in a first field, and storing the depth data in at least one second field associated with the first field.

The synchronized plurality of two-dimensional images may be images captured by a camera or frame images in video data captured by a video camera. In generating the multi-angle freeview data, depth data of each two-dimensional image may be determined based on the plurality of two-dimensional images.

Wherein the depth data may comprise depth values corresponding to pixels of the two-dimensional image. The distances of the acquisition device to the various points in the area to be viewed may be used as the above-mentioned depth values, which may directly reflect the geometry of the visible surface in the area to be viewed. For example, the depth value may be the distance of each point in the area to be viewed along the camera optical axis to the optical center, and the origin of the camera coordinate system may be the optical center. It will be appreciated by those skilled in the art that the distance may be a relative value, as long as the same reference is used for multiple images.

Further, the depth data may include depth values corresponding one-to-one to the pixels of the two-dimensional image, or may be partial values selected from a set of depth values corresponding one-to-one to the pixels of the two-dimensional image.

As will be understood by those skilled in the art, the two-dimensional image is also referred to as a texture map, and the set of depth values may be stored in the form of a depth map. In a specific implementation, the depth data may be obtained by down-sampling an original depth map, where the set of depth values corresponding one-to-one to the pixels of the two-dimensional image (texture map) is stored as an image arranged according to the pixel layout of the two-dimensional image (texture map).
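
A minimal sketch of such down-sampling, assuming a single-channel depth map held as a NumPy array and a simple keep-every-other-sample scheme (one of several possible down-sampling strategies):

```python
import numpy as np

def downsample_depth(depth_map: np.ndarray, factor: int = 2) -> np.ndarray:
    """Keep every `factor`-th row and column; factor=2 gives quarter down-sampling."""
    return depth_map[::factor, ::factor]

# Example: a 1080p single-channel depth map reduced to 960x540.
original = np.random.randint(0, 256, size=(1080, 1920), dtype=np.uint8)
reduced = downsample_depth(original, factor=2)
print(reduced.shape)  # (540, 960)
```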

In a specific implementation, the pixel data of the two-dimensional image stored in the first field may be original two-dimensional image data, such as data acquired from an acquisition device, or may be data obtained by reducing the resolution of the original two-dimensional image data. Further, the pixel data of the two-dimensional image may be the original pixel data of the image, or the pixel data after the resolution is reduced. The pixel data of the two-dimensional image may be YUV data or RGB data, or may be other data capable of expressing the two-dimensional image.

In a specific implementation, the depth data stored in the second field may be the same as or different from the number of pixels corresponding to the pixel data of the two-dimensional image stored in the first field. The number may be determined according to a bandwidth limit of data transmission performed by the device side that processes the multi-angle free-view image data, and if the bandwidth is small, the data amount may be reduced by the above-described down-sampling or resolution reduction.

In a specific implementation, for each of the two-dimensional images (texture maps), the pixel data of the two-dimensional image (texture map) may be sequentially stored in a plurality of fields according to a preset sequence, and the fields may be consecutive or may be spaced apart from the second field. A field storing pixel data of a two-dimensional image (texture map) may be used as the first field. The following examples are given.

For the sake of convenience of description, the images shown in fig. 20 to 25 and fig. 33 to 36 are two-dimensional images (texture maps) unless otherwise specified below.

Referring to fig. 20, pixel data of a two-dimensional image, illustrated as pixels 1 to 6 in the figure, and other pixels not shown, may be stored in a predetermined order into a plurality of consecutive fields, which may be used as a first field; the depth data corresponding to the two-dimensional image, which is indicated by depth values 1 to 6 in the image and other depth values not shown, may be stored in a plurality of consecutive fields in a predetermined order, and these consecutive fields may serve as the second field. The preset sequence may be sequentially stored line by line according to the distribution positions of the two-dimensional image pixels, or may be other sequences.

Referring to fig. 21, pixel data of a two-dimensional image and corresponding depth values may be alternately stored in a plurality of fields. A plurality of fields storing pixel data may be used as a first field, a plurality of fields storing depth values may be used as a second field.

In a specific implementation, the depth data may be stored in the same order as the pixel data of the two-dimensional image, so that each field in the first field can be associated with the corresponding field in the second field, and the depth value corresponding to each pixel can thereby be identified.

In particular implementations, pixel data for multiple two-dimensional images and corresponding depth data may be stored in a variety of ways. The following examples are further described below.

Referring collectively to FIG. 22, the individual pixels of texture map 1, illustrated as image 1 pixel 1, image 1 pixel 2, and other pixels not shown, may be stored in a continuous field, which may serve as the first field. The corresponding depth data of texture map 1, illustrated as image 1 depth value 1, image 1 depth value 2 shown in the figure, and other depth data not shown, may be stored in fields adjacent to the first field, which may serve as the second field. Similarly, for the pixel data of texture map 2, it may be stored in a first field, and the depth data corresponding to texture map 2 may be stored in an adjacent second field.

It is understood that each image in the image stream continuously captured by one of the synchronized capturing devices, or each frame image in its video stream, can be used as texture map 1. Similarly, among the synchronized multiple acquisition devices, a two-dimensional image acquired synchronously with texture map 1 may be referred to as texture map 2. The acquisition device may be an acquisition device as in fig. 2, or an acquisition device in other scenarios.

Referring to fig. 23 in combination, the pixel data of texture map 1 and the pixel data of texture map 2 may be stored in a plurality of adjacent first fields, and the depth data corresponding to texture map 1 and the depth data corresponding to texture map 2 may be stored in a plurality of adjacent second fields.

Referring to fig. 24 in combination, the pixel data of each of the plurality of images may be stored in a plurality of fields, respectively, which may be referred to as first fields. The fields storing the pixel data may be arranged alternately with the fields storing the depth values.

Referring to fig. 25 in conjunction, the pixel data and corresponding depth values of different texture maps may also be interleaved; for example, image 1 pixel 1, image 1 depth value 1, image 2 pixel 1, and image 2 depth value 1 … may be stored in sequence until the pixel data and depth value corresponding to the first pixel of each image are stored, and the adjacent fields then store image 1 pixel 2, image 1 depth value 2, image 2 pixel 2, and image 2 depth value 2 …, until the storage of the pixel data and depth data of every image is completed.

In summary, the field storing the pixel data of each two-dimensional image may be used as the first field, and the field storing the depth data corresponding to that two-dimensional image may be used as the second field. The generated multi-angle free-view data may include the first fields and the second fields associated with the first fields.

It will be appreciated by those skilled in the art that the various embodiments described above are merely examples and are not specific limitations on the types, sizes, and arrangements of fields.

Referring to fig. 3 in combination, the multi-angle freeview data including the first field and the second field may be stored in the server 32 at the cloud, and transmitted to the CDN or the display device 33 for image reconstruction.

In a specific implementation, the first field and the second field may both be pixel fields in a stitched image, and the stitched image is used to store pixel data of the plurality of images and the depth data. By adopting the image format for data storage, the data volume can be reduced, the data transmission duration can be reduced, and the resource occupation can be reduced.

The stitched image may be an image in a variety of formats, such as BMP format, JPEG format, PNG format, and the like. These image formats may be compressed formats or may be uncompressed formats. It will be appreciated by those skilled in the art that two-dimensional images of various formats may include fields, referred to as pixel fields, corresponding to individual pixels. The size of the stitched image, that is, parameters such as the number of pixels and the aspect ratio included in the stitched image, may be determined as needed, and specifically may be determined according to the number of the synchronized multiple two-dimensional images, the data amount to be stored in each two-dimensional image, the data amount of the depth data to be stored in each two-dimensional image, and other factors.

In a specific implementation, in the synchronized two-dimensional images, the depth data corresponding to the pixels of each two-dimensional image and the number of bits of the pixel data may be associated with the format of the stitched image.

For example, when the format of the stitched image is the BMP format, the depth value may be 8-bit data in the range 0-255 and may be stored as a gray value in the stitched image; alternatively, the depth value may be 16-bit data, stored as gray values at two pixel positions in the stitched image, or stored in two channels of one pixel position in the stitched image.

When the format of the stitched image is the PNG format, the depth value may also be 8-bit or 16-bit data; in the PNG format, a 16-bit depth value may be stored as the gray value of one pixel position in the stitched image.
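
As a hedged illustration of the "two channels of one pixel position" option mentioned above, a 16-bit depth value can be split into a high byte and a low byte; the byte order chosen here is an assumption:

```python
import numpy as np

def pack_depth16(depth: np.ndarray) -> np.ndarray:
    """Split a 16-bit depth map into two 8-bit channels (high byte, low byte)."""
    high = (depth >> 8).astype(np.uint8)
    low = (depth & 0xFF).astype(np.uint8)
    return np.stack([high, low], axis=-1)

def unpack_depth16(packed: np.ndarray) -> np.ndarray:
    """Recombine the two 8-bit channels into the original 16-bit depth map."""
    return (packed[..., 0].astype(np.uint16) << 8) | packed[..., 1].astype(np.uint16)

depth = np.random.randint(0, 65536, size=(540, 960), dtype=np.uint16)
assert np.array_equal(unpack_depth16(pack_depth16(depth)), depth)
```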

It is understood that the above embodiments are not limited to the storage manner or the number of data bits, and other data storage manners that can be realized by those skilled in the art fall within the scope of the present invention.

In a specific implementation, the stitched image may be divided into a texture map region and a depth map region, a pixel field of the texture map region stores pixel data of the plurality of two-dimensional images, and a pixel field of the depth map region stores depth data of the plurality of images; the texture map region stores a pixel field of pixel data of each two-dimensional image as the first field, and the depth map region stores a pixel field of depth data of each image as the second field.

In a specific implementation, the texture map region may be a continuous region, and the depth map region may also be a continuous region.

Further, in a specific implementation, the stitched image may be divided equally, with the two halves used as the texture map region and the depth map region respectively. Alternatively, the stitched image may be divided unequally according to the amount of pixel data and the amount of depth data of the two-dimensional images to be stored.

For example, referring to fig. 26, each minimum square indicates one pixel, the texture map region may be a region 1 within a dashed line frame, that is, an upper half region obtained by dividing the stitched image into upper and lower halves, and a lower half region of the stitched image may be used as a depth map region.

It is to be understood that fig. 26 is merely illustrative, and the minimum number of squares is not a limitation on the number of pixels in the mosaic image. In addition, the way of dividing equally may be to divide the stitched image equally left and right.

In a specific implementation, the texture map region may include a plurality of texture map sub-regions, each texture map sub-region for storing one of the plurality of images, and a pixel field of each texture map sub-region may be used as the first field; accordingly, the depth map region may include a plurality of depth map sub-regions, each of the depth map sub-regions being for storing depth data of one of the plurality of depth maps, and a pixel field of each of the depth map sub-regions may serve as the second field.

The number of texture map sub-regions and the number of depth map sub-regions may be equal, and are both equal to the number of the plurality of images that are synchronized. In other words, it may be equal to the number of cameras described above.

The stitched image is further described with reference to fig. 27, taking as an example a stitched image bisected into upper and lower halves. The upper half of the stitched image in fig. 27 is the texture map region, which is divided into 8 texture map sub-regions that respectively store the pixel data of the 8 synchronized texture maps; the shooting angle of each image is different, that is, the viewing angles are different. The lower half of the stitched image is the depth map region, which is divided into 8 depth map sub-regions that respectively store the depth maps of the 8 images.

In combination with the foregoing, the pixel data of the 8 synchronized texture maps, that is, the texture maps of view 1 to view 8, may be original images acquired from the cameras, or images obtained by reducing the resolution of the original images. The depth data is stored in a partial region of the stitched image and may also be referred to as a depth map.
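
A minimal sketch of assembling such a stitched image, assuming 8 synchronized 960x540 texture maps arranged in a 4x2 grid in the upper half and their single-channel depth maps, written as gray values, in a 4x2 grid in the lower half; the grid layout and resolutions are assumptions for illustration:

```python
import numpy as np

H, W = 540, 960    # resolution of each texture map / depth map
COLS, ROWS = 4, 2  # 8 views arranged in a 4x2 grid

def build_stitched_image(textures, depths):
    """Place 8 texture maps (H, W, 3) in the upper half and 8 depth maps (H, W),
    written into all three channels as gray values, in the lower half."""
    canvas = np.zeros((ROWS * H * 2, COLS * W, 3), dtype=np.uint8)
    for i, (tex, dep) in enumerate(zip(textures, depths)):
        r, c = divmod(i, COLS)
        # texture map sub-region (upper half)
        canvas[r * H:(r + 1) * H, c * W:(c + 1) * W] = tex
        # depth map sub-region (lower half), offset by the height of the texture region
        off = ROWS * H
        canvas[off + r * H:off + (r + 1) * H, c * W:(c + 1) * W] = dep[..., None]
    return canvas

textures = [np.random.randint(0, 256, (H, W, 3), dtype=np.uint8) for _ in range(8)]
depths = [np.random.randint(0, 256, (H, W), dtype=np.uint8) for _ in range(8)]
stitched = build_stitched_image(textures, depths)
print(stitched.shape)  # (2160, 3840, 3)
```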

As described above, in an implementation, the stitched image may also be divided unequally. For example, referring to fig. 28, the depth data may occupy fewer pixels than the pixel data of the texture maps, so the texture map region and the depth map region may be of different sizes. For instance, the depth data may be obtained by quarter down-sampling the depth map, in which case a division such as that shown in fig. 28 may be adopted. Conversely, the number of pixels occupied by the depth maps may also be greater than the number occupied by the pixel data of the texture maps.

It is to be understood that fig. 28 is not limited to dividing the stitched image in a non-uniform manner, and in a specific implementation, the amount of pixels and the aspect ratio of the stitched image may be various, and the dividing manner may also be various.

In a specific implementation, the texture map region or the depth map region may also include a plurality of regions. For example, as shown in fig. 29, the texture map region may be one continuous region, and the depth map region may include two continuous regions.

Alternatively, referring to fig. 30 and 31, the texture map region may include two continuous regions, and the depth map region may also include two continuous regions. The texture map region and the depth region may be arranged at intervals.

Still alternatively, referring to fig. 32, the texture map sub-regions included in the texture map region may be arranged at intervals from the depth map sub-regions included in the depth map region. The texture map region may include a number of contiguous regions equal to the number of texture map sub-regions, and the depth map region may include a number of contiguous regions equal to the number of depth map sub-regions.

In a specific implementation, for the pixel data of each texture map, the pixel data may be stored in the sub-region of the texture map according to the order of pixel arrangement. For the depth data corresponding to each texture map, the depth data can also be stored in the depth map sub-area according to the arrangement sequence of the pixel points.

With combined reference to figs. 33-35, texture map 1 is illustrated with 9 pixels in fig. 33, texture map 2 is illustrated with 9 pixels in fig. 34, and texture map 1 and texture map 2 are synchronized two-dimensional images from different angles. From texture map 1 and texture map 2, the depth data corresponding to texture map 1, including image 1 depth value 1 to image 1 depth value 9, can be obtained, and the depth data corresponding to texture map 2, including image 2 depth value 1 to image 2 depth value 9, can also be obtained.

Referring to fig. 35, when texture map 1 is stored in its sub-region, it may be stored in the upper-left texture map sub-region according to the order of pixel arrangement; that is, within the texture map sub-region, the arrangement of the pixels may be the same as in texture map 1. Texture map 2 is stored in the upper-right texture map sub-region in the same way.

Similarly, the depth data corresponding to the texture map 1 may be stored to the depth map sub-region in a similar manner, and in the case that the depth values correspond to the pixel values of the texture map one to one, may be stored as shown in fig. 35. If the depth value is obtained by downsampling the original depth map, the depth value can be stored in the sub-region of the depth map according to the sequence of pixel point arrangement of the depth map obtained by downsampling.

As will be understood by those skilled in the art, the compression rate achievable for an image is related to the correlation between its pixels: the stronger the correlation, the higher the compression rate. Because a captured image corresponds to the real world, the correlation between its pixels is strong. By storing the pixel data and the depth data of the image in the order of pixel arrangement, a higher compression rate can be obtained when compressing the stitched image; that is, for the same amount of data before compression, the amount of data after compression can be made smaller.

By dividing the stitched image into a texture map region and a depth map region, with multiple texture map sub-regions adjacent within the texture map region and multiple depth map sub-regions adjacent within the depth map region, a higher compression rate can be obtained when compressing the stitched image, because the texture map sub-regions all store data obtained from images, or frame images of videos, shot of the region to be viewed from different angles, and the depth map region stores only depth maps.

In a specific implementation, all or part of the texture map sub-regions and the depth map sub-regions may be edge-protected. The form of edge protection may be various. For example, taking the view-1 depth map in fig. 31 as an example, redundant pixels may be added around the periphery of the original view-1 depth map; or the number of pixels of the view-1 depth map region may be kept unchanged, redundant pixels that do not store actual pixel data may be reserved at its periphery, and the original view-1 depth map may be reduced and stored in the remaining pixels; or other ways may be used, as long as redundant pixels are finally left between the view-1 depth map and the other images around it.
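
A rough sketch of the first option (surrounding a sub-region with a band of redundant pixels by replicating its border before placing it into the stitched image); the padding width and replication strategy are assumptions:

```python
import numpy as np

def edge_protect(sub_image: np.ndarray, pad: int = 8) -> np.ndarray:
    """Add `pad` redundant border pixels around a texture/depth sub-region so that
    neighboring sub-regions bleed less into it when the stitched image is compressed."""
    pad_width = ((pad, pad), (pad, pad)) + ((0, 0),) * (sub_image.ndim - 2)
    return np.pad(sub_image, pad_width, mode="edge")

depth_view1 = np.random.randint(0, 256, (540, 960), dtype=np.uint8)
protected = edge_protect(depth_view1, pad=8)
print(protected.shape)  # (556, 976)
```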

Because the stitched image comprises a plurality of texture maps and depth maps, the correlation across the adjacent boundaries of the different maps is poor; by performing edge protection, the quality loss of the texture maps and depth maps in the stitched image can be reduced when the stitched image is compressed.

In a specific implementation, the pixel field of the texture map sub-region may store three channels of data and the pixel field of the depth map sub-region may store single channel data. The pixel field of the texture map sub-region is used to store pixel data of any one of the plurality of synchronized two-dimensional images, the pixel data typically being three-channel data, such as RGB data or YUV data.

The depth map sub-region is used for storing the depth data of an image. If the depth value is 8-bit binary data, a single channel of the pixel field can be used for storage; if the depth value is 16-bit binary data, two channels of the pixel field can be used. Alternatively, the depth values may be stored using a larger pixel area. For example, if the synchronized multiple images are all 1920 × 1080 images and the depth value is 16-bit binary data, the depth values may be stored, as a single channel, in an area twice the size of one 1920 × 1080 image area. The stitched image may also be divided in accordance with the specific storage manner.

If each channel of each pixel occupies 8 bits, the uncompressed data volume of the stitched image can be calculated according to the following formula: number of synchronized two-dimensional images × (data amount of the pixel data of one two-dimensional image + data amount of one depth map).

If the original image has a resolution of 1080P, i.e. 1920 × 1080 pixels with progressive scanning, the original depth map may also occupy 1920 × 1080 pixels as a single channel. The pixel data amount of one original image is 1920 × 1080 × 8 × 3 bits, and the data volume of one original depth map is 1920 × 1080 × 8 bits. If the number of cameras is 30, the data volume of the stitched image is 30 × (1920 × 1080 × 8 × 3 + 1920 × 1080 × 8) bits, which is about 237 MB. If the image is not compressed, considerable system resources are occupied and the delay is large. In particular, when the bandwidth is small, for example about 1 MB/s, transmitting an uncompressed stitched image takes about 237 s; the real-time performance is poor and the user experience suffers.
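
The arithmetic above can be checked with a short calculation; the 1 MB/s figure is the assumed bandwidth, and megabytes are counted as 1024 x 1024 bytes:

```python
# Uncompressed data volume of the stitched image for 30 synchronized 1080p views.
width, height = 1920, 1080
texture_bits = width * height * 8 * 3   # 3 channels, 8 bits each
depth_bits = width * height * 8         # single channel, 8 bits
cameras = 30

total_bits = cameras * (texture_bits + depth_bits)
total_megabytes = total_bits / 8 / (1024 * 1024)
print(round(total_megabytes))           # ~237 MB

bandwidth_mb_per_s = 1.0                # assumed 1 MB/s link
print(round(total_megabytes / bandwidth_mb_per_s))  # ~237 s to transmit uncompressed
```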

The data volume of the stitched image can be reduced by one or more of the following: storing the data regularly so as to obtain a higher compression rate, reducing the resolution of the original images and using the reduced-resolution pixel data as the pixel data of the two-dimensional images, or down-sampling one or more of the original depth maps.

For example, if the original two-dimensional image has a resolution of 4K, i.e., a pixel resolution of 4096 × 2160, and is down-sampled to a resolution of 540P, i.e., a pixel resolution of 960 × 540, the number of pixels of the stitched image is about one sixteenth of the number before down-sampling. The amount of data may be made smaller in combination with any one or more of the other ways of reducing the amount of data described above.

It can be understood that if the bandwidth is supported and the decoding capability of the device performing data processing can support a stitched image with a higher resolution, a stitched image with a higher resolution can also be generated to improve the image quality.

It will be appreciated by those skilled in the art that the pixel data of the synchronized multiple two-dimensional images and the corresponding depth data may also be stored in other ways in different application scenarios, for example, stored into the stitched image in units of pixel points. Referring to figs. 33, 34 and 36, image 1 and image 2 shown in figs. 33 and 34 may be stored into the stitched image in the manner of fig. 36.

In summary, the pixel data of the two-dimensional image and the corresponding depth data may be stored in the stitched image, and the stitched image may be divided into the texture map region and the depth map region in various ways, or may not be divided, and the pixel data and the depth data of the texture map are stored in a preset order. In a specific implementation, the synchronized two-dimensional images may also be synchronized frame images obtained by decoding a plurality of videos. The video may be acquired by a plurality of video cameras, the settings of which may be the same as or similar to those of the cameras previously acquiring two-dimensional images.

In a specific implementation, the generating of the multi-angle free-view image data may further include generating an association relation field, which may indicate the association relation between the first field and at least one second field. The first field stores the pixel data of one two-dimensional image among the synchronized plurality of two-dimensional images, and the second field stores the depth data corresponding to that two-dimensional image; the two correspond to the same shooting angle, that is, the same viewing angle. The association relationship between them can be described by the association relation field.

Taking fig. 27 as an example, the area storing the texture map from view 1 to view 8 in fig. 27 is 8 first fields, the area storing the depth map from view 1 to view 8 is 8 second fields, and for the first field storing the texture map from view 1, there is an association relationship with the second field storing the depth map from view 1, and similarly, there is an association relationship between the field storing the texture map from view 2 and the field storing the depth map from view 2.

The association relation field may indicate an association relation between the first field and the second field of each of the synchronized multiple two-dimensional images in multiple ways, and specifically may be a content storage rule of the pixel data and the depth data of the synchronized multiple two-dimensional images, that is, by indicating the storage way described above, the association relation between the first field and the second field is indicated.

In a specific implementation, the association relation field may contain only a mode number, and the device performing data processing may determine the storage manner of the pixel data and the depth data in the acquired multi-angle free-view image data according to the mode number in the field and data already stored in the device. For example, if the received mode number is 1, the storage manner is interpreted as follows: the stitched image is equally divided into an upper region and a lower region, the upper half is the texture map region, the lower half is the depth map region, and the texture map at a certain position in the upper half is associated with the depth map stored at the corresponding position in the lower half.
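
An illustrative sketch of such mode-number handling; the mode table and function names are assumptions, and only mode 1 (the upper/lower split described above) is filled in:

```python
import numpy as np

# Assumed mapping from mode number to a layout description known a priori
# by the device performing data processing.
LAYOUTS = {
    1: {"split": "top_texture_bottom_depth"},
}

def parse_stitched_image(stitched: np.ndarray, mode: int):
    """Return (texture_region, depth_region) according to the mode number."""
    layout = LAYOUTS[mode]
    if layout["split"] == "top_texture_bottom_depth":
        half = stitched.shape[0] // 2
        return stitched[:half], stitched[half:]
    raise ValueError(f"unknown mode {mode}")

stitched = np.zeros((2160, 3840, 3), dtype=np.uint8)
texture_region, depth_region = parse_stitched_image(stitched, mode=1)
print(texture_region.shape, depth_region.shape)
```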

It can be understood that the storage modes for storing the stitched images in the foregoing embodiments, such as the storage modes illustrated in fig. 27 to fig. 36, may be described by corresponding association relationship fields, so that the device for processing data may obtain the associated two-dimensional image and depth data according to the association relationship fields.

As described above, the picture format of the stitched image may be any one of two-dimensional image formats such as BMP, PNG, JPEG, and Webp, or may be other image formats. The storage mode of the pixel data and the depth data in the multi-angle free visual angle image data is not limited to the mode of splicing images. The storage can be performed in various ways, and there can also be corresponding association field descriptions.

Similarly, the storage mode may be indicated by a mode number. For example, for the storage manner shown in fig. 23, the association relation field may store mode number 2; after the device performing data processing reads this mode number, it can determine that the pixel data of the synchronized two-dimensional images are stored sequentially, determine the lengths of the first fields and the second fields, and determine that, after the first fields end, the depth data of each image are stored in the same order as the two-dimensional images. The device performing data processing may thus determine the association between the pixel data of each two-dimensional image and its depth data according to the association relation field.

It is understood that the storage manner of the pixel data and the depth data of the plurality of synchronized two-dimensional images may be various, and the expression manner of the association relation field may also be various. The mode number may be used for indication, or the content may be directly indicated. The device performing data processing may determine the association between the pixel data of the two-dimensional image and the depth data according to the content of the association field, in combination with the stored data or other a priori knowledge, for example, the content corresponding to each mode number, or the specific number of the synchronized multiple images.

In a specific implementation, the generating of the multi-angle freeview image data may further include: based on the synchronized plurality of two-dimensional images, parameter data of each two-dimensional image is calculated and stored, the parameter data including shooting position and shooting angle data of the two-dimensional image.

By combining the shooting position and shooting angle of each of the synchronized two-dimensional images, the device performing data processing can determine, according to the needs of the user, a virtual viewpoint in the same coordinate system, reconstruct the image based on the multi-angle free-view image data, and present the expected viewing position and viewing angle to the user.

In a specific implementation, the parameter data may also include internal parameter data including attribute data of a photographing apparatus of the image. The aforementioned shooting position and shooting angle data of the image may also be referred to as external parameter data, and the internal parameter data and the external parameter data may be referred to as attitude data. By combining the internal parameter data and the external parameter data, the factors indicated by the internal parameter data such as lens distortion and the like can be considered during image reconstruction, and the image of the virtual viewpoint can be reconstructed more accurately.
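
A hedged sketch of how the parameter data of one view might be organized; the field names and the pinhole-camera (intrinsic matrix) convention are assumptions, not a format mandated by the text:

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class CameraParameters:
    """Parameter data for one capturing device / view."""
    rotation: np.ndarray     # 3x3 external rotation (shooting angle)
    translation: np.ndarray  # 3-vector external translation (shooting position)
    intrinsics: np.ndarray   # 3x3 internal matrix (focal length, optical center)
    distortion: np.ndarray   # radial and tangential distortion coefficients

params_view1 = CameraParameters(
    rotation=np.eye(3),
    translation=np.zeros(3),
    intrinsics=np.array([[1000.0, 0.0, 960.0],
                         [0.0, 1000.0, 540.0],
                         [0.0, 0.0, 1.0]]),
    distortion=np.zeros(5),
)
```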

In a specific implementation, the generating of the multi-angle freeview image data may further include: generating a parameter data storage address field, wherein the parameter data storage address field is used for indicating a storage address of the parameter data. The apparatus performing data processing may acquire the parameter data from the storage address of the parameter data.

In a specific implementation, the generating of the multi-angle free-view image data may further include: generating a data combination storage address field for indicating the storage address of the data combination, that is, the storage addresses of the first field and the second field of each of the synchronized plurality of images. The device performing data processing may acquire the pixel data of the synchronized two-dimensional images and the corresponding depth data from the storage space corresponding to the storage address of the data combination, where the data combination includes the pixel data and the depth data of the synchronized plurality of two-dimensional images.

It is understood that the multi-angle free-view image data may include specific data such as the pixel data of the two-dimensional images, the corresponding depth data, and the parameter data, as well as other indicative data, such as the aforementioned association relation field, parameter data storage address field, and data combination storage address field. These indicative data may be stored in a header file to indicate to the device performing the data processing where to obtain the data combination, the parameter data, and the like.

For the explanations of terms, the specific implementations and the advantageous effects referred to in the embodiments of generating multi-angle free-view data, reference may be made to the other embodiments.

Referring to fig. 3, the generated multi-angle freeview image combination data may be transmitted to a device 33 for display via a network for virtual viewpoint switching. The displaying device 33 may present a reconstructed image generated through image reconstruction based on the multi-angle freeview data.

For a better understanding and implementation by those skilled in the art, the image reconstruction method is further elaborated below.

Referring to a flowchart of an image reconstruction method in an embodiment of the present invention shown in fig. 37, in a specific implementation, the following method may be adopted for image reconstruction.

S371, acquiring an image combination of a multi-angle free view, parameter data of the image combination and virtual viewpoint position information based on user interaction, wherein the image combination comprises a plurality of angle-synchronous groups of texture maps and depth maps with corresponding relations.

In particular implementations, as previously described, multiple angles of image capture may be performed for a scene by multiple cameras, video cameras, and the like.

The images in the multi-angle free-view image combination may be images of complete free views. In an implementation, the viewing angle can have 6 Degrees of Freedom (DoF), that is, both the spatial position of the viewpoint and the viewing angle can be freely switched. As previously mentioned, the spatial position of the viewpoint may be represented as coordinates (x, y, z), and the viewing angle may be expressed as three rotational directions (θ, φ, ψ); it may therefore be referred to as 6DoF.

In the process of image reconstruction, an image combination of multi-angle free visual angles and parameter data of the image combination can be acquired firstly.

In a specific implementation, as described above, the multiple sets of texture maps and depth maps having a corresponding relationship in the image combination may be stitched together to form one frame of stitched image; reference may be made to the stitched image structures shown in figs. 27 to 32. Each frame of stitched image in the foregoing embodiments can be used as one image combination. Referring to fig. 27, which shows a structural diagram of an image combination in an embodiment of the present invention, the image combination includes the texture maps of 8 synchronized different viewing angles and the depth maps at the corresponding viewing angles, stitched together in sequence.

Referring to fig. 27 to 32, multiple sets of texture maps and depth maps in the image combination may be spliced and combined according to a predetermined relationship. Specifically, the texture map and the depth map of the image combination may be divided into a texture map region and a depth map region according to a position relationship, the texture map region stores pixel values of each texture map, and the depth map region stores depth values corresponding to each texture map according to a preset position relationship. The texture map region and the depth map region may be continuous or spaced apart. In the embodiment of the invention, no limitation is made on the position relationship between the texture map and the depth map in the image combination.

For a specific relationship between multiple sets of texture maps and depth maps in an image combination, reference may be made to the description of the foregoing embodiments, and details are not described here.

In a specific implementation, the texture map and the depth map in the image combination are in one-to-one correspondence. The texture map may adopt any type of two-dimensional image format, for example, any one of BMP, PNG, JPEG, webp format, and the like. The depth map may represent the distance of points in the scene relative to the capture device, i.e. each pixel value in the depth map represents the distance between a point in the scene and the capture device.

As previously mentioned, to conserve transmission bandwidth and storage resources, the image combination may be transmitted or stored in a compressed format. For two-dimensional images obtained in a compressed format, decoding can be performed first, and then the multiple synchronized groups of two-dimensional images in the corresponding image combination can be obtained. In a specific implementation, the decompression may be performed by software capable of recognizing the compression format, by decompression hardware, or by a combination of hardware and software.

In a specific implementation, the parameter data of the images in the combination can be obtained from the attribute information of the images.

As mentioned above, the parameter data may include external parameter data and may also include internal parameter data. The external parameter data is used for describing space coordinates, postures and the like of the shooting device, and the internal parameter data is used for expressing attribute information of the shooting device, such as an optical center, a focal length and the like of the shooting device. The internal parameter data may also include distortion parameter data. The distortion parameter data includes radial distortion parameter data and tangential distortion parameter data. Radial distortion occurs during the transformation of the coordinate system of the photographing apparatus to the physical coordinate system of the image. And the tangential distortion is generated in the manufacturing process of the shooting equipment because the plane of the photosensitive element is not parallel to the lens. Information such as a photographing position, a photographing angle of the image can be determined based on the external parameter data. The combination of the internal parameter data including the distortion parameter data during the image reconstruction process can make the determined spatial mapping relationship more accurate.

In a specific implementation, the image combination in the foregoing embodiments may be used as a data file in the embodiments of the present invention. In application scenarios with limited bandwidth, the image combination can be divided into several parts and transmitted over multiple transfers.

In a specific implementation, the virtual viewpoint position information based on user interaction may be expressed as 6-degree-of-freedom coordinates (x, y, z, θ, φ, ψ). The virtual viewpoint position information may be determined through one or more preset user interaction modes. For example, the coordinates may be manipulated by the user, such as through a manual click or a gesture path, or a virtual position may be determined by voice input, or the user may be offered predefined virtual viewpoints to choose from (for example, the user may select a position or perspective in the scene, such as under the basket, courtside, the referee's perspective, or the coach's perspective). The virtual viewpoint may also be based on a particular object (for example, a player on the court, an actor, a guest or a host in the picture; after the user clicks the corresponding object, the view may switch to that object's perspective). It is to be understood that the embodiment of the present invention does not limit the specific user interaction manner, as long as the virtual viewpoint position information based on user interaction can be acquired.

And S372, selecting a texture map and a depth map of a corresponding group in the image combination at the user interaction time according to the virtual viewpoint position information and the parameter data of the image combination and a preset rule.

In a specific implementation, the texture maps and depth maps of the corresponding groups in the image combination at the user interaction time that satisfy a preset positional relationship and/or quantity relationship with the virtual viewpoint position may be selected according to the virtual viewpoint position information and the parameter data of the image combination. For example, for a virtual viewpoint position in an area with high camera density, only the texture maps and corresponding depth maps captured by the 2 cameras closest to the virtual viewpoint may be selected, while for a virtual viewpoint position in an area with low camera density, the texture maps and corresponding depth maps captured by the 3 or 4 cameras closest to the virtual viewpoint may be selected.

In an embodiment of the present invention, the texture maps and depth maps corresponding to the 2 to N acquisition devices closest to the virtual viewpoint position are selected according to the virtual viewpoint position information and the parameter data corresponding to the texture maps and depth maps of the corresponding groups in the image combination at the user interaction time, where N is the total number of acquisition devices used to acquire the image combination. For example, the texture maps and depth maps corresponding to the 2 acquisition devices closest to the virtual viewpoint position may be selected by default. In a specific implementation, the user may also set the number of nearest acquisition devices to select, as long as it does not exceed the number of acquisition devices corresponding to the image combination in the video frame.
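
A minimal sketch of this nearest-camera selection, assuming the external parameter data yields a 3D position for each acquisition device and that plain Euclidean distance is the closeness criterion:

```python
import numpy as np

def select_nearest_views(virtual_pos, camera_positions, count=2):
    """Return the indices of the `count` acquisition devices closest to the virtual
    viewpoint position; their texture/depth groups are then used for rendering."""
    virtual_pos = np.asarray(virtual_pos, dtype=float)
    cams = np.asarray(camera_positions, dtype=float)
    distances = np.linalg.norm(cams - virtual_pos, axis=1)
    return np.argsort(distances)[:count].tolist()

camera_positions = [(0, 0, 3), (2, 0, 3), (4, 0, 3), (6, 0, 3)]
print(select_nearest_views((1.2, 0.5, 1.8), camera_positions, count=2))  # [1, 0]
```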

With this approach, there is no special requirement on the spatial distribution of the multiple capturing devices that acquire the images (for example, they may be arranged linearly, in an arc array, or in any irregular arrangement). The actual distribution of the capturing devices is determined from the acquired virtual viewpoint position information and the parameter data corresponding to the image combination, and an adaptive strategy is then used to select the texture maps and depth maps of the corresponding groups in the image combination at the user interaction time. This provides a higher degree of freedom and flexibility of selection while reducing the amount of data computation and ensuring the quality of the reconstructed image; in addition, it lowers the installation requirements on the capturing devices, making it easier to adapt to different site conditions and installation constraints.

In an embodiment of the present invention, a preset number of texture maps and depth maps corresponding to the virtual viewpoint position are selected from the image combination at the user interaction time according to the virtual viewpoint position information and the parameter data of the image combination.

It will be appreciated that, in specific embodiments, other preset rules may be used to select the corresponding groups of texture maps and depth maps from the image combination; for example, the selection may be based on the definition requirement of the reconstructed image (e.g. standard definition, high definition, ultra-high definition), on the processing capability of the image reconstruction device, or on the user's requirement on reconstruction speed.

And S373, based on the virtual viewpoint position information and the parameter data corresponding to the texture map and the depth map of the corresponding group in the user interaction time image combination, performing combined rendering on the texture map and the depth map of the corresponding group in the selected user interaction time image combination to obtain a reconstructed image corresponding to the user interaction time virtual viewpoint position.

In specific implementation, a texture map and a depth map of a corresponding group in an image combination at a user interaction time can be combined and rendered in multiple ways to obtain a reconstructed image corresponding to a virtual viewpoint position at the user interaction time.

In an embodiment of the present invention, according to the depth maps of the corresponding groups in the image combination at the user interaction time, the pixel points in the texture maps of the corresponding groups are directly copied into the generated virtual texture map, so as to obtain the reconstructed image corresponding to the virtual viewpoint position at the user interaction time.

In another embodiment of the invention, the image is reconstructed as follows: respectively forward-mapping the depth maps of the corresponding groups in the selected image combination at the user interaction time to the virtual viewpoint position at the user interaction time; respectively post-processing the forward-mapped depth maps; respectively reverse-mapping the texture maps of the corresponding groups in the selected image combination; and fusing the virtual texture maps generated by the reverse mapping.

In a specific implementation, the merged texture map may be output as a reconstructed image corresponding to the virtual viewpoint position at the time of user interaction.

In a specific implementation, the reconstructed image may include a corresponding depth map in addition to the texture map, and the corresponding depth map may be obtained in various ways. For example, the corresponding depth map may randomly select one of the depth maps obtained after the post-processing as the depth map of the reconstructed image. For another example, a depth map closest to the virtual viewpoint position at the time of user interaction may be selected from depth maps obtained after post-processing as a depth map of a reconstructed image, and if there is more than one depth map closest to the virtual viewpoint position, any one of the depth maps may be selected. For another example, the depth map after post-processing may be fused to obtain a reconstructed depth map.

In a specific implementation, after the virtual texture maps generated after the reverse mapping are fused, the fused texture maps can be subjected to void filling, so that a reconstructed image corresponding to the virtual viewpoint position at the user interaction time is obtained.

In a specific implementation, the depth maps after the forward mapping may be respectively post-processed by a plurality of methods. For example, the depth maps after forward mapping may be subjected to foreground edge protection processing, or the depth maps after forward mapping may be subjected to pixel level filtering processing. A certain post-processing action may be performed alone or multiple post-processing actions may be employed simultaneously.

In an embodiment of the present invention, the virtual texture maps generated after the reverse mapping are fused in the following manner: according to the virtual viewpoint position information and the parameter data corresponding to the texture maps and depth maps of the corresponding groups in the image combination at the user interaction time, all the virtual texture maps generated after the reverse mapping are fused using global weights determined by the distance between the virtual viewpoint position and the positions of the acquisition devices that captured the corresponding texture maps in the image combination.

In another embodiment of the present invention, forward mapping may be performed first, projecting the texture maps of the corresponding groups in the image combination into three-dimensional Euclidean space using the depth information, that is: the depth maps of the corresponding groups are each mapped, according to the spatial geometric relationship, to the virtual viewpoint position at the user interaction time to form virtual viewpoint position depth maps. Reverse mapping is then performed, projecting the three-dimensional space points onto the imaging plane of the virtual camera, that is: according to the mapped depth maps, the pixel points in the texture maps of the corresponding groups are copied into the virtual texture map corresponding to the generated virtual viewpoint position to form the virtual texture maps corresponding to the respective groups. The virtual texture maps of the corresponding groups are then fused to obtain the reconstructed image of the virtual viewpoint position at the user interaction time. Reconstructing the image in this manner can improve the sampling precision of the reconstructed image.
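
By way of illustration only, a minimal Python sketch of such a forward depth mapping is given below; it assumes a pinhole camera model with x_cam = R * x_world + t, metric depth values and known intrinsics, and the function name and array conventions are choices made for the example rather than elements of this disclosure.

```python
import numpy as np

def forward_map_depth(depth, K_src, R_src, t_src, K_vir, R_vir, t_vir):
    """Warp a source-view depth map to the virtual viewpoint (forward mapping).

    depth         : (H, W) array of metric depth values of the source view.
    K_*, R_*, t_* : intrinsics / rotation / translation of the source and
                    virtual cameras taken from the parameter data.
    Returns an (H, W) depth map seen from the virtual viewpoint; pixels that
    receive no projection remain 0 and are treated as holes later on.
    """
    H, W = depth.shape
    u, v = np.meshgrid(np.arange(W), np.arange(H))
    pix = np.stack([u, v, np.ones_like(u)], axis=-1).reshape(-1, 3).T.astype(float)
    depth_flat = depth.reshape(-1)
    src_valid = depth_flat > 0
    pix = pix[:, src_valid]

    # Back-project every valid source pixel to a 3-D point in world coordinates.
    cam_pts = np.linalg.inv(K_src) @ pix * depth_flat[src_valid]
    world_pts = R_src.T @ (cam_pts - t_src.reshape(3, 1))

    # Re-project the 3-D points onto the imaging plane of the virtual camera.
    vir_pts = K_vir @ (R_vir @ world_pts + t_vir.reshape(3, 1))
    z = vir_pts[2]
    in_front = z > 0
    uu = np.round(vir_pts[0, in_front] / z[in_front]).astype(int)
    vv = np.round(vir_pts[1, in_front] / z[in_front]).astype(int)
    zz = z[in_front]

    warped = np.zeros((H, W), dtype=float)
    inside = (uu >= 0) & (uu < W) & (vv >= 0) & (vv < H)
    # Simple z-buffer: keep the nearest surface when several points hit a pixel.
    for x, y, d in zip(uu[inside], vv[inside], zz[inside]):
        if warped[y, x] == 0 or d < warped[y, x]:
            warped[y, x] = d
    return warped
```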

Preprocessing may be performed before the forward mapping. Specifically, the depth values used for the forward mapping and the homography matrices used for the texture reverse mapping may be calculated from the parameter data of the corresponding groups in the image combination. In a particular implementation, the depth levels may be converted into depth values using a Z conversion.
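
As one possible form of such a Z conversion (an assumption for illustration; the disclosure does not fix the quantization scheme), the quantized depth level can be mapped to a metric depth value with the commonly used inverse-depth quantization:

```python
def depth_level_to_value(level, z_near, z_far, levels=255):
    """Convert a quantized depth level (0..levels) to a metric depth value,
    assuming the usual inverse-depth (1/z) quantization where the largest
    level corresponds to the near plane."""
    inv_z = (float(level) / levels) * (1.0 / z_near - 1.0 / z_far) + 1.0 / z_far
    return 1.0 / inv_z
```

With z_near = 1 m and z_far = 100 m, for example, level 255 maps to a depth of 1 m and level 0 maps to 100 m.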

In the depth map forward mapping process, a depth map of a corresponding group may be mapped to the depth map of the virtual viewpoint position using a formula, and the depth value is then copied to the corresponding position. In addition, the depth maps of the corresponding groups may contain noise, and some samples may be lost during the mapping process, so the generated depth map of the virtual viewpoint position may contain small noise holes. Median filtering can be used to remove this noise.
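
A minimal sketch of such a filtering step is given below, assuming the SciPy median filter and treating zero-valued pixels as the small holes to be repaired; the kernel size and the decision to touch only hole pixels are illustrative choices.

```python
import numpy as np
from scipy.ndimage import median_filter

def remove_noise_holes(warped_depth, kernel_size=3):
    """Suppress small noise holes left by the forward mapping by replacing
    zero-valued pixels with the median of their neighbourhood, while keeping
    the depths of properly mapped pixels unchanged."""
    filtered = median_filter(warped_depth, size=kernel_size)
    return np.where(warped_depth == 0, filtered, warped_depth)
```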

In a specific implementation, the depth map of the virtual viewpoint position obtained after the forward mapping may be further post-processed as required, so as to further improve the quality of the generated reconstructed image. In an embodiment of the present invention, before the reverse mapping is performed, the virtual viewpoint position depth map obtained by forward mapping is processed with respect to the foreground and background occlusion relationship, so that the generated depth map more faithfully reflects the positional relationships of the objects in the scene as seen from the virtual viewpoint position.

For the reverse mapping, specifically, the position of each pixel of the corresponding texture map in the virtual texture map may be calculated from the depth map of the virtual viewpoint position obtained by the forward mapping, and the texture value of the corresponding pixel position is then copied; holes in the depth map may be marked as 0, or marked as having no texture value, in the virtual texture map. Hole dilation can be performed on the regions marked as holes to avoid synthesis artifacts.
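
Purely for illustration, the following Python sketch performs such a reverse mapping under the same camera-model assumptions as the forward-mapping sketch above; the hole-dilation step uses SciPy's binary dilation, and all names and conventions are choices made for the example.

```python
import numpy as np
from scipy.ndimage import binary_dilation

def reverse_map_texture(texture, warped_depth,
                        K_src, R_src, t_src, K_vir, R_vir, t_vir):
    """Build the virtual texture map: for every virtual pixel with a valid
    forward-mapped depth, look up the corresponding pixel of the source
    texture; pixels with no depth (value 0) are marked as holes."""
    H, W = warped_depth.shape
    virtual_texture = np.zeros_like(texture)
    hole_mask = warped_depth == 0

    v_idx, u_idx = np.nonzero(~hole_mask)
    d = warped_depth[v_idx, u_idx]
    pix = np.stack([u_idx, v_idx, np.ones_like(u_idx)], axis=0).astype(float)

    # Back-project virtual pixels to 3-D, then project them into the source camera.
    cam_pts = np.linalg.inv(K_vir) @ pix * d
    world_pts = R_vir.T @ (cam_pts - t_vir.reshape(3, 1))
    src_pts = K_src @ (R_src @ world_pts + t_src.reshape(3, 1))

    z = src_pts[2]
    positive = z > 0
    su = np.zeros(z.shape, dtype=int)
    sv = np.zeros(z.shape, dtype=int)
    su[positive] = np.round(src_pts[0, positive] / z[positive]).astype(int)
    sv[positive] = np.round(src_pts[1, positive] / z[positive]).astype(int)
    inside = positive & (su >= 0) & (su < texture.shape[1]) \
                      & (sv >= 0) & (sv < texture.shape[0])

    # Copy the texture values of the corresponding source pixel positions.
    virtual_texture[v_idx[inside], u_idx[inside]] = texture[sv[inside], su[inside]]
    hole_mask[v_idx[~inside], u_idx[~inside]] = True

    # Slightly dilate the hole regions to avoid synthesis artifacts at edges.
    hole_mask = binary_dilation(hole_mask, iterations=1)
    return virtual_texture, hole_mask
```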

The generated virtual texture maps of the corresponding groups are then fused to obtain the reconstructed image of the virtual viewpoint position at the user interaction time. In a specific implementation, the fusion may also be carried out in a variety of ways, two of which are illustrated by the following embodiments.

In an embodiment of the present invention, the weighting is performed first and the hole filling afterwards. Specifically, the pixels at corresponding positions in the virtual texture maps corresponding to the respective groups in the image combination at the user interaction time are weighted to obtain the pixel values at the corresponding positions in the reconstructed image of the virtual viewpoint position at the user interaction time. Then, for the positions whose pixel value is zero in that reconstructed image, the holes are filled using the surrounding pixels in the reconstructed image, so as to obtain the reconstructed image of the virtual viewpoint position at the user interaction time.

In another embodiment of the present invention, the hole filling is performed first and the weighting afterwards. Specifically, for the positions whose pixel value is zero in the virtual texture map corresponding to each group in the image combination at the user interaction time, the holes are first filled using the surrounding pixel values; the pixel values at the corresponding positions in the hole-filled virtual texture maps are then weighted to obtain the reconstructed image of the virtual viewpoint position at the user interaction time.

The weighting in the above embodiments may specifically use a weighted average, and different weighting coefficients may also be used according to the camera parameters or the positional relationship between the cameras and the virtual viewpoint. In an embodiment of the present invention, the weights are taken as the reciprocal of the distance between the virtual viewpoint position and each camera position, that is: the closer a camera is to the virtual viewpoint position, the greater its weight.
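
A minimal Python sketch of such a reciprocal-distance weighted fusion, which also illustrates the global-weight fusion described earlier, is given below; the data layout, the hole masks returned by the reverse-mapping sketch and the small epsilon guarding against division by zero are assumptions made for the example.

```python
import numpy as np

def fuse_virtual_textures(virtual_textures, hole_masks,
                          camera_positions, virtual_viewpoint, eps=1e-6):
    """Fuse per-group virtual texture maps with global weights proportional to
    the reciprocal of the distance between each capture position and the
    virtual viewpoint (closer cameras contribute more)."""
    cams = np.asarray(camera_positions, dtype=float)
    vvp = np.asarray(virtual_viewpoint, dtype=float)
    weights = 1.0 / (np.linalg.norm(cams - vvp, axis=1) + eps)

    acc = np.zeros_like(virtual_textures[0], dtype=float)
    weight_sum = np.zeros(acc.shape[:2], dtype=float)
    for tex, holes, w in zip(virtual_textures, hole_masks, weights):
        valid = ~holes
        acc[valid] += w * tex[valid]
        weight_sum[valid] += w

    fused = np.zeros_like(acc)
    covered = weight_sum > 0
    if acc.ndim == 3:
        fused[covered] = acc[covered] / weight_sum[covered][:, None]
    else:
        fused[covered] = acc[covered] / weight_sum[covered]
    # Pixels covered by no group remain holes and are filled afterwards.
    return fused, ~covered
```

Depending on which of the two embodiments above is adopted, the hole filling may be applied either to the fused result or to each virtual texture map before this weighting.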

In specific implementation, a preset hole filling algorithm may be adopted to fill the hole according to needs, which is not described herein again.
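
As one possible choice of such a hole filling algorithm (an assumption; the disclosure leaves the algorithm open), the remaining holes can be filled from their surrounding pixels, for example with OpenCV's Telea inpainting:

```python
import cv2
import numpy as np

def fill_holes(fused_texture, hole_mask, radius=3):
    """Fill the remaining hole pixels from their surrounding pixels using
    OpenCV's Telea inpainting; assumes 8-bit texture values."""
    img = np.clip(fused_texture, 0, 255).astype(np.uint8)
    mask = hole_mask.astype(np.uint8) * 255  # non-zero marks pixels to inpaint
    return cv2.inpaint(img, mask, radius, cv2.INPAINT_TELEA)
```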

The above illustrates how to perform combined rendering on the texture map and the depth map of the corresponding group in the image combination based on the position of the virtual viewpoint and the parameter data of the corresponding group in the image combination at the time of user interaction. It is understood that, in a specific implementation, other Depth Image Based Rendering (DIBR) algorithms may also be adopted according to needs, and are not described in detail.

An embodiment of the present invention further provides an image reconstruction system. Referring to fig. 38, which shows a schematic structural diagram of an image reconstruction system according to an embodiment of the present invention, the image reconstruction system 380 includes: an obtaining unit 381, a selecting unit 382 and an image reconstruction unit 383, wherein:

an obtaining unit 381, adapted to obtain an image combination of a multi-angle free visual angle, parameter data of the image combination, and virtual viewpoint position information based on user interaction, wherein the image combination includes a plurality of angle-synchronized groups of texture maps and depth maps having a correspondence relationship;

a selecting unit 382, adapted to select, according to the virtual viewpoint position information and the parameter data of the image combination, a texture map and a depth map of a corresponding group in the image combination at a user interaction time according to a preset rule;

and an image reconstruction unit 383, adapted to perform combined rendering on the texture map and the depth map of the corresponding group in the selected user interaction time image combination based on the virtual viewpoint position information and parameter data corresponding to the texture map and the depth map of the corresponding group in the user interaction time image combination, so as to obtain a reconstructed image corresponding to the virtual viewpoint position at the user interaction time.

By adopting the above image reconstruction system, the texture maps and depth maps of the corresponding groups in the image combination at the user interaction time are selected according to a preset rule based on the virtual viewpoint position information and the acquired parameter data of the image combination, and the combined rendering is performed only on the texture maps and depth maps of the corresponding groups, based on the virtual viewpoint position and the parameter data corresponding to those texture maps and depth maps, rather than reconstructing the image from the texture maps and depth maps of all the groups in the image combination; the amount of data computation in the image reconstruction process can therefore be reduced.

In a specific implementation, the selecting unit 382 may, according to the virtual viewpoint position information and the parameter data of the image combination, select the texture maps and depth maps of the corresponding groups in the image combination at the user interaction time that satisfy a preset positional relationship with the virtual viewpoint position, or that satisfy a preset number relationship, or that satisfy both the preset positional relationship and the preset number relationship with the virtual viewpoint position.

In an embodiment of the present invention, the selecting unit 382 may select a preset number of texture maps and depth maps of the corresponding groups closest to the virtual viewpoint position in the image combination at the user interaction time according to the virtual viewpoint position information and the parameter data of the image combination.

In a specific implementation, referring to fig. 38, the image reconstruction unit 383 may include: a forward mapping subunit 3831, a reverse mapping subunit 3832, a fusion subunit 3833, wherein:

the forward mapping subunit 3831 is adapted to map the depth maps of the corresponding groups to the virtual viewpoint positions at the user interaction time according to a spatial geometric relationship, respectively, to form virtual viewpoint position depth maps;

the inverse mapping subunit 3832 is adapted to copy, according to the mapped depth map, pixel points in the texture map of the corresponding group into the virtual texture map corresponding to the generated virtual viewpoint position, so as to form a virtual texture map corresponding to the corresponding group;

a fusion subunit 3833, adapted to fuse the corresponding sets of virtual texture maps to obtain a reconstructed image of the virtual viewpoint position at the user interaction time.

In an embodiment of the present invention, the fusion subunit 3833 is adapted to perform weighting processing on pixels at corresponding positions in the virtual texture map corresponding to the corresponding group in the user interaction time image combination, so as to obtain pixel values at corresponding positions in the reconstructed image at the virtual viewpoint position at the user interaction time; and for the position with the zero pixel value in the reconstructed image of the virtual viewpoint position at the user interaction time, filling up the hole by using the pixels around the pixel in the reconstructed image to obtain the reconstructed image of the virtual viewpoint position at the user interaction time.

In another embodiment of the present invention, the blending subunit 3833 is adapted to, for a position where a pixel value in the virtual texture image corresponding to each corresponding group in the image combination is zero at the moment of user interaction, respectively perform hole filling by using surrounding pixel values; and the method is suitable for weighting the pixel values of the corresponding positions in the virtual texture map corresponding to each corresponding group after the hole filling, so as to obtain the reconstructed image of the virtual viewpoint position at the user interaction time.

In another embodiment of the present invention, in the image reconstructing unit 383, the forward mapping subunit 3831 is adapted to forward map the depth maps of the corresponding groups in the selected user interaction time image combination, respectively, to the virtual position at the user interaction time; the inverse mapping subunit 3832 is adapted to perform inverse mapping on texture maps of corresponding groups in the selected user interaction time image combination, respectively; the fusion subunit 3833 is adapted to fuse the virtual texture maps generated after the inverse mapping.

In a specific implementation, the merged texture map may be output as a reconstructed image corresponding to the virtual viewpoint position at the time of user interaction.

In a specific implementation, the reconstructed image may include a corresponding depth map in addition to the texture map, and the corresponding depth map may be obtained in various ways. For example, the corresponding depth map may randomly select one of the depth maps obtained after the post-processing as the depth map of the reconstructed image. For another example, a depth map closest to the virtual viewpoint position at the time of user interaction may be selected from depth maps obtained after post-processing as a depth map of a reconstructed image, and if there is more than one depth map closest to the virtual viewpoint position, any one of the depth maps may be selected. For another example, the depth map after post-processing may be fused to obtain a reconstructed depth map.

In a specific implementation, the image reconstruction unit 383 may further comprise a post-processing sub-unit (not shown) adapted to post-process the depth maps after the forward mapping, respectively. For example, the post-processing sub-unit may perform at least one of foreground edge protection processing, pixel level filtering processing, and the like on the depth map after the forward mapping.

In a specific implementation, the obtaining unit 381 may include a decoding subunit (not shown), adapted to decode the obtained compressed image data of the multi-angle free visual angle to obtain the image combination of the multi-angle free visual angle and the parameter data corresponding to the image combination.

The image reconstruction device may include a memory and a processor, where the memory stores computer instructions executable on the processor, and the processor may execute the steps of the image reconstruction method according to any of the embodiments when executing the computer instructions.

In a specific implementation, the image reconstruction device may include a terminal device; after the terminal device completes the image reconstruction according to the above embodiments, the image can be output and displayed through a display interface for the user to watch. The terminal device may be a handheld terminal such as a mobile phone or a tablet computer, a set-top box, or the like.

In a specific implementation, the image reconstruction may be performed by an edge node; after the image reconstruction is completed, the edge node may output the result to a display device in communication with the edge node for viewing by the user. The edge computing node may be a node that communicates at close range with the display device that displays the reconstructed image and maintains a high-bandwidth, low-latency connection, for example through WiFi, a 5G network, or the like. In a specific implementation, the edge node may be any one of a base station, a router, a home gateway, a vehicle-mounted device, and the like. Referring to fig. 3, the edge node may be a device located at the CDN.

In a specific implementation, a specific terminal device or edge node device in the network may be selected to perform the image reconstruction process of the embodiments of the present invention according to the processing capabilities of the terminal devices and edge nodes, according to the user's selection, or according to the operator's configuration; for details, reference may be made to the specific methods described in the embodiments of the present invention, which are not repeated here.

The embodiment of the present invention further provides a computer-readable storage medium, on which computer instructions are stored, and when the computer instructions are executed, the method may perform the steps of the image reconstruction method according to any one of the above embodiments of the present invention. The computer readable storage medium may be various suitable readable storage media such as an optical disc, a mechanical hard disc, a solid state hard disc, and the like. The image reconstruction method executed by the instruction stored in the computer-readable storage medium may specifically refer to the foregoing embodiments of the image reconstruction method, and is not described in detail again.

Although the present invention is disclosed above, the present invention is not limited thereto. Various changes and modifications may be effected therein by one skilled in the art without departing from the spirit and scope of the invention as defined in the appended claims.
