Generating representations of objects from depth information determined in parallel from images captured by multiple cameras

Document No.: 653508    Publication date: 2021-04-23

Note: This technology, "Generating representations of objects from depth information determined in parallel from images captured by multiple cameras," was created by 吴城磊 and 余守壹 on 2019-09-12. Its main content is summarized in the following abstract.

Abstract: A plurality of cameras having different orientations capture images of an object located at a target position relative to the cameras. The images from each camera are processed in parallel to determine depth information from correspondences between different regions within the images captured in parallel by each image capture device. The depth information of the images is modified in parallel based on shading information from each camera's images and stereo information from the images. In various embodiments, the depth information is refined by minimizing a total energy that combines the intensity of portions of the images having a common depth with the intensity of portions of the images determined from shading information of the images captured by the multiple cameras. The modified depth information from the multiple images is combined to generate a reconstruction of the object located at the target position.

1. A system, comprising:

a plurality of cameras, each camera having a particular position relative to each other and configured to capture an image of an object located at a target position; and

a console coupled to each camera of the plurality of cameras, the console configured to:

receive one or more images of the object from each of the plurality of cameras;

determine depth information for the images received from each of the plurality of cameras in parallel;

modify in parallel the depth information determined for the images received from each of the plurality of cameras at a common time based on the intensity of the portion of each image having the common depth and the intensity of the portion of each image determined from the shading information from each image; and

generate a reconstruction of the object by combining the modified depth information of the images received from each of the plurality of cameras.

2. The system of claim 1, wherein modifying in parallel the depth information determined for images received from each of the plurality of cameras at a common time based on the intensity of the portion of each image having the common depth and the intensity of the portion of each image determined from the shading information from each image comprises:

identifying images received from a plurality of cameras at the common time;

determining a global intensity of portions of the identified images having common depth information;

determining intensities of portions of the identified images having different depths based on shading information of the identified images;

generating an energy of the identified images by combining the global intensity of portions of the identified images having various common depths with the intensity based on the shading information determined for the portions of the identified images; and

modifying the depth of portions of the identified images to minimize the energy of the identified images.

3. The system of claim 2, wherein generating the energy of the identified images by combining the global intensity of portions of the identified images having various common depths with the intensity based on the shading information determined for the portions of the identified images further comprises:

combining regularization values with depth estimates for portions of the identified images having various common depths and depth estimates for one or more corresponding adjacent portions of the identified images.

4. The system of claim 1, claim 2, or claim 3, wherein the console comprises a plurality of processors, each processor configured to receive images from a camera at the common time, determine depth information for the images received from the camera, and modify in parallel the depth information determined for the images received from the camera at the common time based on the intensity of the portions of each image having the common depth and the intensity of the portions of each image determined from the shading information from each image;

and/or wherein each processor may optionally include a graphics processing unit;

and/or wherein the console may optionally be further configured to store the generated reconstruction of the object.

5. A method, comprising:

capturing images of an object located at a target location relative to a plurality of cameras, each camera capturing at least one image of the object;

determining depth information for images captured by each of the plurality of cameras in parallel;

modifying in parallel the depth information determined for images received from each of the plurality of cameras at a common time based on the intensity of the portion of each image having the common depth and the intensity of the portion of each image determined from the shading information from each image; and

generating a reconstruction of the object by combining the modified depth information of the images received from each of the plurality of cameras.

6. The method of claim 5, wherein modifying in parallel the depth information determined for the images received from each of the plurality of cameras at the common time based on the intensity of the portion of each image having the common depth and the intensity of the portion of each image determined from the shading information from each image comprises:

identifying images received from a plurality of cameras at the common time;

determining a global intensity of portions of the identified images having a common depth;

determining intensities of portions of the identified images having different depths based on shading information of the identified images;

generating an energy of the identified images by combining the global intensity of portions of the identified images having various common depths with the intensity based on the shading information determined for the portions of the identified images; and

modifying the depth of portions of the identified images to minimize the energy of the identified images.

7. The method of claim 6, wherein generating the energy of the identified images by combining the global intensity of portions of the identified images having various common depths with the intensity based on the shading information determined for the portions of the identified images further comprises:

combining regularization values with depth estimates for portions of the identified images having various common depths and depth estimates for one or more corresponding adjacent portions of the identified images.

8. The method of claim 5, claim 6, or claim 7, wherein determining depth information for images captured by each of the plurality of cameras in parallel comprises:

determining, in parallel, depth information for images captured by different cameras at the common time, wherein different processors determine the depth information for images captured by different cameras at the common time.

9. The method of any of claims 5 to 8, wherein modifying in parallel the depth information determined for the images received from each of the plurality of cameras at the common time based on the intensity of the portion of each image having the common depth and the intensity of the portion of each image determined from the shading information from each image comprises:

modifying, in parallel, depth information determined for images captured by different cameras at the common time, wherein different processors modify the depth information determined for images captured by different cameras at the common time.

10. The method of any one of claims 5 to 9, further comprising one or more selected from the group consisting of:

storing the generated reconstruction of the object;

presenting the generated reconstruction of the object via a display device;

transmitting the generated reconstruction of the object to a client device.

11. A computer program product comprising a computer readable storage medium having instructions encoded thereon, which when executed by a processor, cause the processor to:

obtain images of an object located at a target location relative to a plurality of cameras, each camera capturing at least one image of the object;

determine depth information for images captured by each of the plurality of cameras in parallel;

modify in parallel the depth information determined for images received from each of the plurality of cameras at a common time based on the intensity of the portion of each image having the common depth and the intensity of the portion of each image determined from the shading information from each image; and

generate a reconstruction of the object by combining the modified depth information of the images received from each of the plurality of cameras.

12. The computer program product of claim 11, wherein modifying in parallel the depth information determined for images received from each of the plurality of cameras at a common time based on the intensity of the portion of each image having the common depth and the intensity of the portion of each image determined from the shading information from each image comprises:

identifying images received from a plurality of cameras at the common time;

determining a global intensity of portions of the identified images having common depth information;

determining intensities of portions of the identified images having different depths based on shading information of the identified images;

generating an energy of the identified images by combining the global intensity of portions of the identified images having various common depths with the intensity based on the shading information determined for the portions of the identified images; and

modifying the depth of portions of the identified images to minimize the energy of the identified images; and

wherein generating the energy of the identified images by combining the global intensity of portions of the identified images having various common depths with the intensity based on the shading information determined for the portions of the identified images may optionally comprise:

combining regularization values with the global intensity of portions of the identified images having various common depths and the intensity determined for portions of the identified images having corresponding common depths.

13. The computer program product of claim 11 or claim 12, wherein determining depth information for images captured by each of the plurality of cameras in parallel comprises:

determining, in parallel, depth information for images captured by different cameras at the common time, wherein different processors determine the depth information for images captured by different cameras at the common time.

14. The computer program product of any of claims 11 to 13, wherein modifying in parallel the depth information determined for the images received from each of the plurality of cameras at the common time based on the intensity of the portions of each image having the common depth and the intensity of the portions of each image determined from the shading information from each image comprises:

modifying, in parallel, depth information determined for images captured by different cameras at the common time, wherein different processors modify the depth information determined for images captured by different cameras at the common time.

15. The computer program product of any of claims 11 to 14, wherein the computer-readable storage medium further has instructions encoded thereon that, when executed by the processor, cause the processor to:

store the generated reconstruction of the object.

Background

The present disclosure relates generally to configurations including multiple cameras directed at a target location, and more particularly to generating a three-dimensional reconstruction of an object located at the target location from images captured by the multiple cameras.

Increases in camera resolution have made three-dimensional vision systems increasingly popular. In a three-dimensional vision system, images of an object are captured by one or more cameras. The captured images are provided to a computing device, which analyzes the images to generate a three-dimensional graphical reconstruction of the object.

However, combining images of different views of an object to generate a three-dimensional graphical reconstruction of the object is computationally intensive. This use of computing resources increases the time to generate a graphical reconstruction of the object. The increased time to generate the graphical reconstruction of the object limits the potential use of the three-dimensional vision system to implementations that can tolerate the delay from image capture to generating the graphical reconstruction of the object.

SUMMARY

To capture image or video data of an object to be rendered in a Virtual Reality (VR) environment or an Augmented Reality (AR) environment, a plurality of cameras are positioned to focus on a target location at which the object is located. The cameras have a specific positioning relative to each other and relative to the target position. This allows each camera to capture a different image of the object at the target location. Each camera is coupled to a console that receives images of objects located at the target location from each camera. The console processes the received images to generate a graphical representation of the object at the target location.

To generate a graphical representation of the object at the target location, the console processes the images from each camera in parallel to determine depth information. For example, different processors apply a patch matching process on a Graphics Processing Unit (GPU) to images captured by different cameras at a common time. To improve convergence and further reduce computation time, a coarse-to-fine patch matching process is applied to the images captured by the different cameras at the common time; in various embodiments, the patch matching process is first applied to coarse-resolution versions of the images captured at the common time, and the result is used to initialize the patch matching process on the fine-resolution images captured at the common time, which then runs with fewer iterations.

To improve the accuracy of the determined depth information, the console modifies the depth information determined for the images received from each camera. In various embodiments, the console modifies the depth information determined for the various images by optimizing an energy function based on the intensity from the stereo information and the intensity from the shading information of the various images. For example, for images received from multiple cameras (e.g., from each camera) at a common time, the console determines a global intensity of a portion of the images corresponding to a common depth based on the depth information determined for the different images; in one example, the console determines the global intensity of portions of images received from different cameras as the average intensity of portions of images received from different cameras at the common time that have common depth information, or depth information within a threshold amount of each other. Further, the console determines the intensity of different portions of the images received from the cameras at the common time based on shading information calculated from the images. The console generates a total energy for the images received from the cameras at the common time by combining the global intensity of the different portions of the images received from the different cameras at the common time with the intensity determined from the shading information of those portions; in some embodiments, the console sums the global intensity of the different portions with the intensity determined from the shading information, while in other embodiments the console sums the differences between the global intensity of the different portions and the intensity calculated from the shading information for the respective images. The console may also combine a regularization value with depth estimates of portions of an image received from a camera and depth estimates of corresponding neighboring portions of the image; the regularization value accounts for the similarity between depth estimates of mutually adjacent portions of the image.
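
As a hedged illustration only (the squared-error form, the weight λ, and the symbols below are assumed for exposition and are not notation from the source), the refinement objective for the depth map D_c of one camera c might be written as:

```latex
E(D_c) = \sum_{p} \Big( \bar{I}\big(p, D_c(p)\big) - S_c\big(p, D_c(p)\big) \Big)^2
       + \lambda \sum_{p} \sum_{q \in \mathcal{N}(p)} \big( D_c(p) - D_c(q) \big)^2
```

Here \bar{I}(p, D_c(p)) is the global intensity, e.g., the average intensity across cameras of the portions sharing the depth estimated at portion p; S_c(p, D_c(p)) is the intensity predicted from the shading information at the current depth estimate; and \mathcal{N}(p) denotes the portions adjacent to p, so the second term plays the role of the regularization value described above.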

For each image received from a different camera at a common time, the console modifies the depth information determined for the image by minimizing the total energy of each image received at the common time. The console uses any suitable process or processes to minimize, in parallel, the total energy of each image received at the common time. For example, the console applies a Gauss-Newton method, using a Graphics Processing Unit (GPU), to the images received from each camera at the common time to minimize the total energy of each image received at the common time, which modifies the depth information determined for the images received from the multiple cameras. The console combines the modified depth information from the multiple images to generate a reconstruction of the object located at the target location.

According to a first aspect, there is provided a system comprising: a plurality of cameras, each camera having a particular position relative to each other and configured to capture an image of an object located at a target position; and a console coupled to each of the plurality of cameras, the console configured to: receive one or more images of the object from each of the plurality of cameras; determine depth information for the images received from each of the plurality of cameras in parallel; modify in parallel the depth information determined for images received from each of the plurality of cameras at a common time based on the intensity of the portion of each image having the common depth and the intensity of the portion of each image determined from the shading information from each image; and generate a reconstruction of the object by combining the modified depth information of the images received from each of the plurality of cameras.

In some embodiments, "modifying in parallel the depth information determined for images received from each of the plurality of cameras at a common time based on the intensity of the portion of each image having the common depth and the intensity of the portion of each image determined from the shadow information from each image" may include: identifying images received from a plurality of cameras at a common time; determining a global strength of the identified portions of the image having common depth information; determining intensities of portions of the identified image having different depths based on shadow information of the identified image; generating an energy of the identified image by combining a global intensity of the identified portions of the image having the various common depths with an intensity based on the shadow information determined for the identified portions of the image; and modifying the depth of the identified portion of the image to minimize the energy of the identified image.

In some embodiments, "generating the energy of the identified image by combining the global intensity of the identified portions of the image having various common depths with the intensity based on the shadow information determined for the identified portions of the image" may further include: the regularization values are combined with depth estimates for portions of the identified images having various common depths and depth estimates for one or more corresponding adjacent portions of the identified images.

In some embodiments, the console may include a plurality of processors, each processor configured to receive images from the cameras at a common time, determine depth information for the images received from the cameras, and modify in parallel the depth information determined for the images received from the cameras at the common time based on the intensity of the portions of each image having the common depth and the intensity of the portions of the image determined from the shading information for each image.

Each processor may include a graphics processing unit.

The console may also be configured to store the generated reconstruction of the object.

According to a second aspect, there is provided a method comprising: capturing images of an object located at a target location relative to a plurality of cameras, each camera capturing at least one image of the object; determining depth information for images captured by each of the plurality of cameras in parallel; modifying in parallel the depth information determined for images received from each of the plurality of cameras at a common time based on the intensity of the portion of each image having the common depth and the intensity of the portion of each image determined from the shading information from each image; and generating a reconstruction of the object by combining the modified depth information of the images received from each of the plurality of cameras.

In some embodiments, "modifying in parallel the depth information determined for images received from each of the plurality of cameras at a common time based on the intensity of the portion of each image having the common depth and the intensity of the portion of each image determined from the shading information from each image" may include: identifying images received from a plurality of cameras at a common time; determining a global intensity of portions of the identified images having a common depth; determining intensities of portions of the identified image having different depths based on shadow information of the identified image; generating an energy of the identified image by combining a global intensity of portions of the identified image having various common depths with an intensity based on shadow information determined for the portions of the identified image; and modifying the depth of the identified portion of the image to minimize the energy of the identified image.

In some embodiments, "generating the energy of the identified image by combining the global intensity of the portions of the identified image having the various common depths with the intensity based on the shadow information determined for the portions of the identified image" may further comprise: the regularization values are combined with depth estimates for portions of the identified image having various common depths with depth estimates for one or more corresponding adjacent portions of the identified image.

In some embodiments, "determining depth information for images captured by each of a plurality of cameras in parallel" may include: depth information for images captured by different cameras at a common time is determined in parallel, wherein different processors determine depth information for images captured by different cameras at a common time.

In some embodiments, "modifying in parallel the depth information determined for images received from each of the plurality of cameras at a common time based on the intensity of the portion of each image having the common depth and the intensity of the portion of each image determined from the shading information from each image" may include: modifying, in parallel, depth information determined for images captured by different cameras at a common time, wherein different processors modify the determined depth information determined for images captured by different cameras at the common time.

In some embodiments, the method may further comprise storing the generated reconstruction of the object.

The method may further comprise presenting the generated reconstruction of the object via a display device.

The method may further comprise transmitting the generated reconstruction of the object to a client device.

It will be appreciated that, in the context of the present invention, any feature of the system of the first aspect may be combined with and may include features of the method of the second aspect, and vice versa, mutatis mutandis.

According to a third aspect, there is provided a computer program product comprising a computer readable storage medium having instructions encoded thereon, which when executed by a processor, cause the processor to: obtain images of an object located at a target location relative to a plurality of cameras, each camera capturing at least one image of the object; determine depth information for images captured by each of the plurality of cameras in parallel; modify in parallel the depth information determined for images received from each of the plurality of cameras at a common time based on the intensity of the portion of each image having the common depth and the intensity of the portion of each image determined from the shading information from each image; and generate a reconstruction of the object by combining the modified depth information of the images received from each of the plurality of cameras.

In some embodiments of the computer program product, "modifying depth information determined for images received from each of the plurality of cameras at a common time in parallel based on the intensity of the portion of each image having the common depth and the intensity of the portion of each image determined from the shadow information from each image" may include: identifying images received from a plurality of cameras at a common time; determining a global strength of the identified portions of the image having common depth information; determining intensities of portions of the identified image having different depths based on shadow information of the identified image; generating an energy of the identified image by combining a global intensity of portions of the identified image having various common depths with an intensity based on shadow information determined for the portions of the identified image; and modifying the depth of the identified portion of the image to minimize the energy of the identified image.

In some embodiments of the computer program product, "generating the energy of the identified image by combining the global intensity of the identified portions of the image having the various common depths with the intensity based on the shadow information determined for the identified portions of the image" may include: the regularization values, the global strength of the identified portions of the image having various common depths, and the strength determined for the identified portions of the image having corresponding common depths are combined.

In some embodiments of the computer program product, "determining depth information for images captured by each of the plurality of cameras in parallel" may include: depth information for images captured by different cameras at a common time is determined in parallel, wherein different processors determine depth information for images captured by different cameras at a common time.

In some embodiments of the computer program product, "modifying depth information determined for images received from each of the plurality of cameras at a common time in parallel based on the intensity of the portion of each image having the common depth and the intensity of the portion of each image determined from the shadow information from each image" may include: modifying, in parallel, depth information determined for images captured by different cameras at a common time, wherein different processors modify the determined depth information determined for images captured by different cameras at the common time.

The computer-readable storage medium of the computer program product may also have instructions encoded thereon that, when executed by the processor, cause the processor to store the generated reconstruction of the object.

It will be appreciated that, in the context of the present invention, any feature of the computer program product of the third aspect may be combined with and may include features of the method of the second aspect and/or the system of the first aspect, mutatis mutandis.

Brief Description of Drawings

FIG. 1 is a block diagram of a system environment including a plurality of cameras configured to capture images of an object at a target location, according to one embodiment.

FIG. 2 is a flow diagram of a method for generating a representation of an object from images of the object captured by a plurality of cameras positioned differently relative to one another, according to one embodiment.

FIG. 3 is a process flow diagram for generating a representation of an object from images of the object captured by a plurality of cameras positioned differently relative to one another, according to one embodiment.

The figures depict embodiments of the present disclosure for purposes of illustration only. One skilled in the art will readily recognize from the following description that alternative embodiments of the structures and methods illustrated herein may be employed without departing from the principles or advantages of the disclosure described herein.

Detailed Description

Overview of the System

FIG. 1 is a block diagram of a system environment 100, the system environment 100 including a plurality of cameras 110A-F configured to capture images of an object located at a target location 120. Each camera 110A-F is coupled to a console 130. In other embodiments, any suitable number of cameras is included in the system environment 100. Moreover, in other embodiments, different and/or additional components may be included in the system environment 100.

Each camera 110A-F captures an image of an object located at the target location 120. Thus, each camera 110A-F is configured to focus on the target location 120 or on a point within the target location 120. In addition, each camera 110A-F has a particular position relative to each other camera 110A-F in the system environment 100. For example, camera 110A has a particular position relative to camera 110B and a particular position relative to camera 110D. The different positions of the cameras 110A-F relative to each other cause each camera 110A-F to capture images of a different portion of the object located at the target location 120. In various embodiments, the cameras 110A-F are positioned relative to each other such that the cameras 110A-F have overlapping fields of view that include different portions of an object located at the target location 120.

Each camera 110A-F captures images from light reflected by an object located at the target location 120. In some embodiments, each camera 110A-F captures images of light within a common wavelength range reflected by objects located at the target location 120. For example, each camera 110A-F captures infrared light reflected by an object located at the target location 120. As another example, each camera 110A-F captures visible light reflected by an object located at the target location 120. Alternatively, different cameras 110A-F capture light having different wavelength ranges reflected by objects located at the target location 120. For example, the cameras 110A, 110C, 110E capture infrared light reflected by an object located at the target location 120, while the cameras 110B, 110D, 110F capture visible light reflected by an object located at the target location 120. Each camera 110A-F has parameters such as focal length, focus, frame rate, ISO, sensor temperature, shutter speed, aperture, and resolution; one or more parameters of the cameras 110A-F may be modified. In some embodiments, the cameras 110A-F have a high frame rate and high resolution. In various embodiments, the cameras 110A-F capture two-dimensional images of objects located at the target location 120.

In some embodiments, one or more illumination sources are positioned relative to the cameras 110A-F and the target location 120. The one or more illumination sources are positioned to illuminate the target location 120, and thus an object located at the target location 120. The illumination sources may be located at discrete positions relative to one or more of the cameras 110A-F. Alternatively, an illumination source is coupled to one or more of the cameras 110A-F. Example illumination sources include Light Emitting Diodes (LEDs) that emit light in the visible band (i.e., approximately 380 nm to 750 nm), the infrared band (i.e., approximately 750 nm to 1 mm), the ultraviolet band (i.e., 10 nm to 380 nm), some other portion of the electromagnetic spectrum, or some combination thereof. In some embodiments, different illumination sources have different characteristics. For example, different illumination sources emit light with different wavelengths or different temporal coherence, which describes the correlation between light waves at different points in time. Furthermore, the light emitted by different illumination sources may be modulated at different frequencies or amplitudes (i.e., varying intensities), or multiplexed in the time or frequency domain.

One or more illumination sources are coupled to the console 130, and the console 130 provides control signals to the one or more illumination sources. For example, the console 130 provides control signals to the illumination source that modify the intensity of the light emitted by the illumination source. As another example, the console 130 provides control signals to the illumination source that modify the direction in which the illumination source emits light or modify the focus of the light emitted by the illumination source.

The console 130 is a computing device coupled to each camera 110A-F and is configured to receive images captured by one or more of the cameras 110A-F. In addition, the console 130 is configured to send control signals to one or more cameras 110A-F that modify one or more parameters of the cameras. For example, a control signal provided to the camera 110A from the console 130 modifies the focus of the camera 110A or modifies the zoom of the camera 110A.

Further, the console 130 receives images captured by the plurality of cameras 110A-F and generates a reconstruction of an object located at the target location 120 and included in the images received from the plurality of cameras 110A-F. When generating the reconstruction of the object, the console 130 processes images received from the multiple cameras 110A-F in parallel. As described further below in conjunction with FIGS. 2 and 3, the console 130 determines depth information from the various images in parallel based on correspondences between regions in the images captured by the different cameras 110A-F, and determines shading information in the images captured by the different cameras 110A-F in parallel. Using the shading information, the console 130 refines the depth information determined from the correspondences in parallel. When refining the depth information determined from the images, the console 130 optimizes a total energy of the images based on the intensity of the portions of the images captured by the plurality of cameras 110A-F having a common depth and the intensity of the portions of the images determined from the shading information. In various embodiments, the console 130 refines the depth information of an image by minimizing the total energy of the image based on the intensities of the portions of the images captured by the plurality of cameras 110A-F having a common depth and the intensities of the portions of the images determined from the shading information, as further described below in conjunction with FIGS. 2 and 3. The console 130 combines the refined depth information from the multiple images to generate a reconstruction of the object located at the target location 120.

Generating a combined image of an object from images captured by multiple cameras

FIG. 2 is a flow diagram of one embodiment of a method for generating a representation of an object from images of the object captured by a plurality of cameras 110A-F that are differently positioned relative to each other. In various embodiments, the method may include steps different from or additional to those described in conjunction with fig. 2. Additionally, in various embodiments, the method may perform the steps in a different order than described in connection with fig. 2.

The plurality of cameras 110A-F are located at positions relative to each other and are positioned to capture images of objects located at the target location 120. This allows different cameras 110A-F to capture images of different portions of the object located at the target location 120. As further described in conjunction with FIG. 1, each camera 110A-F has one or more parameters that affect the image capture of the cameras 110A-F.

Each of the plurality of cameras 110A-F captures 205 at least one image of an object located at the target location 120. As described further above in connection with FIG. 1, the positioning of the different cameras 110A-F relative to each other results in the different cameras 110A-F capturing 205 images of different portions of an object located at the target location 120. In various embodiments, each of the plurality of cameras 110A-F captures 205 one or more images of an object located at the target location 120.

The console 130 receives images of an object located at the target location 120 from the plurality of cameras 110A-F and determines 210 depth information for portions of each received image. The console 130 determines 210 in parallel the depth information for the images received from each camera 110A-F. For example, the console 130 includes multiple processors, such as graphics processing units, and each processor determines 210 depth information for images received from the cameras 110A-F. For example, each processor included in the console 130 corresponds to a camera 110A-F, so the processor determines 210 the depth information for the image received from the camera 110A-F corresponding to that processor. In various embodiments, the console 130 receives images from the multiple cameras 110A-F at a common time, and the console 130 determines 210 depth information for the images received from the multiple cameras 110A-F at the common time in parallel. Determining depth information for images received from multiple cameras 110A-F in parallel allows the console 130 to determine depth information for multiple images more quickly. In various embodiments, the console 130 initializes a random depth for each pixel in the images received from each camera 110A-F at a common time and defines a nearest neighbor field of pixels that are within a threshold distance of a particular pixel in the images received from the cameras 110A-F. Depths of pixels in the images received from the cameras 110A-F that are determined to have at least a threshold accuracy are then propagated to neighboring pixels. Candidate depths are subsequently evaluated, and a pixel's depth is replaced with a candidate depth when doing so improves the accuracy. In various embodiments, the foregoing steps are iteratively applied to the images received from the cameras 110A-F. For example, the console 130 iteratively performs the preceding steps a set number of times (e.g., 4 times) for each image. In various embodiments, the console 130 applies a coarse-to-fine patch matching process to the images received from the multiple cameras 110A-F to determine 210 depth information corresponding to different pixels in each image. Applying the patch matching process in parallel to images captured by different cameras 110A-F allows the console 130 to obtain depth information from multiple images more efficiently.
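
The following is a minimal, CPU-oriented sketch of that random-initialization, propagation, and candidate-evaluation loop. It assumes rectified image pairs, a pinhole disparity model (disparity = focal × baseline / depth), and a sum-of-squared-differences patch cost; these choices, and all function and parameter names, are illustrative assumptions rather than details taken from the source.

```python
import numpy as np

def match_cost(ref, src, y, x, depth, focal, baseline, half=3):
    """Photometric cost of assigning `depth` to pixel (y, x): compare a small
    patch in the reference image against the patch at the corresponding
    column of the source image (rectified stereo, disparity = f * B / Z)."""
    d = int(round(focal * baseline / max(depth, 1e-6)))  # disparity in pixels
    h, w = ref.shape
    x2 = x - d
    if (y - half < 0 or y + half >= h or x - half < 0 or x + half >= w
            or x2 - half < 0 or x2 + half >= w):
        return np.inf
    a = ref[y - half:y + half + 1, x - half:x + half + 1]
    b = src[y - half:y + half + 1, x2 - half:x2 + half + 1]
    return float(np.sum((a - b) ** 2))

def patchmatch_depth(ref, src, focal, baseline, z_range=(0.5, 5.0), iters=4, seed=0):
    """Randomly initialize a depth per pixel, then alternate propagation of
    neighboring depths with randomized candidates, keeping whichever lowers
    the matching cost (a readable sketch of the parallel GPU process)."""
    rng = np.random.default_rng(seed)
    h, w = ref.shape
    depth = rng.uniform(z_range[0], z_range[1], size=(h, w))
    cost = np.array([[match_cost(ref, src, y, x, depth[y, x], focal, baseline)
                      for x in range(w)] for y in range(h)])
    for it in range(iters):
        step = 1 if it % 2 == 0 else -1          # alternate scan direction
        ys = range(h) if step == 1 else range(h - 1, -1, -1)
        for y in ys:
            xs = range(w) if step == 1 else range(w - 1, -1, -1)
            for x in xs:
                # Propagation: adopt a previously visited neighbor's depth if it fits better.
                for ny, nx in ((y - step, x), (y, x - step)):
                    if 0 <= ny < h and 0 <= nx < w:
                        c = match_cost(ref, src, y, x, depth[ny, nx], focal, baseline)
                        if c < cost[y, x]:
                            depth[y, x], cost[y, x] = depth[ny, nx], c
                # Random search: test a perturbed candidate depth.
                cand = float(np.clip(depth[y, x] + rng.normal(0.0, 0.1),
                                     z_range[0], z_range[1]))
                c = match_cost(ref, src, y, x, cand, focal, baseline)
                if c < cost[y, x]:
                    depth[y, x], cost[y, x] = cand, c
    return depth
```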

To improve the accuracy of the determined depth information, the console 130 modifies 215 the depth information determined 210 for the images received from each camera 110A-F. In various embodiments, the console 130 modifies 215 the depth information determined 210 for the various images by optimizing an energy function based on the intensity from the stereo information and the intensity from the shading information of the various images. For example, for images received from multiple cameras 110A-F (e.g., from each camera 110A-F) at a common time, the console 130 determines a global intensity of a portion of the images corresponding to a common depth based on the depth information determined 210 for the different images; in one example, the console 130 determines the global intensity of portions of images received from different cameras 110A-F as the average intensity of portions of images received from different cameras 110A-F at the common time that have common depth information, or depth information within a threshold amount of each other. In addition, the console 130 determines the intensity of different portions of the images received from the cameras 110A-F at the common time based on the shading information from the images. The console 130 sums the differences between the global intensities of the different portions of the images received from the different cameras 110A-F at the common time and the intensities calculated from the shading information of the respective images. The console 130 may also combine regularization values with depth estimates of portions of an image received from a camera and depth estimates of corresponding neighboring portions of the image; the regularization values account for the similarity between depth estimates of mutually adjacent portions of the image.
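
A sketch of such a per-image energy is shown below, assuming the global and shading intensities have already been evaluated at the current depth estimate; the array names and the regularization weight lam_reg are illustrative assumptions, not values from the source.

```python
import numpy as np

def image_energy(depth, global_intensity, shading_intensity, lam_reg=0.1):
    """Total energy for one camera's image at a common time.

    depth:             (H, W) current per-pixel depth estimate
    global_intensity:  (H, W) average intensity, across cameras, of portions
                       sharing each pixel's depth (within a threshold)
    shading_intensity: (H, W) intensity predicted from shading information at
                       the current depth estimate
    Both intensity arrays depend on `depth` and must be recomputed whenever
    the depth estimate changes; lam_reg weights the smoothness regularizer on
    neighboring depth estimates.
    """
    data = np.sum((global_intensity - shading_intensity) ** 2)
    reg = (np.sum((depth[:, 1:] - depth[:, :-1]) ** 2)
           + np.sum((depth[1:, :] - depth[:-1, :]) ** 2))
    return float(data + lam_reg * reg)
```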

For each image received from a different camera 110A-F at a common time, the console 130 modifies 215 the depth information determined 210 for the image by minimizing the total energy of each image received at the common time. The console 130 minimizes the total energy of each image received at the common time in parallel, using any suitable process or processes. For example, the console 130 applies a Gauss-Newton method to the images received from each camera 110A-F at the common time via a Graphics Processing Unit (GPU) to minimize the total energy of each image received at the common time, which modifies 215 the depth information determined 210 for the images received from the plurality of cameras 110A-F. However, in other embodiments, the console 130 may apply any suitable process or processes that minimize the total energy of each received image to modify 215 the depth information determined 210 for each image received at the common time.
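
A readable, non-GPU sketch of one way to apply a Gauss-Newton update to a flattened depth vector follows; the finite-difference Jacobian, the damping term, and the iteration count are illustrative choices, and a practical GPU implementation would instead use analytic derivatives and a sparse or per-pixel approximation of J^T J.

```python
import numpy as np

def gauss_newton(residual_fn, depth0, iters=5, eps=1e-4, damping=1e-6):
    """Minimize sum(residual_fn(d) ** 2) over the flattened depth vector d.

    Each iteration linearizes the residuals around the current estimate and
    solves the normal equations (J^T J + damping * I) delta = -J^T r.
    """
    d = np.asarray(depth0, dtype=np.float64).ravel().copy()
    for _ in range(iters):
        r = residual_fn(d)
        J = np.empty((r.size, d.size))
        for j in range(d.size):              # finite-difference Jacobian column
            dp = d.copy()
            dp[j] += eps
            J[:, j] = (residual_fn(dp) - r) / eps
        delta = np.linalg.solve(J.T @ J + damping * np.eye(d.size), -J.T @ r)
        d += delta
    return d
```

The residual function here would stack the differences between global and shading intensities and the neighboring-depth regularization terms from the energy sketched above.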

From the modified depth information of the images received from the multiple cameras 110A-F at the common time, the console 130 generates 220 a representation of the object located at the target location at the common time. In various embodiments, the console 130 applies any suitable method or methods to the modified depth information for each image received at the common time that combine the modified depth information in each image into a single three-dimensional representation of the object at the common time. For example, the console 130 performs Poisson reconstruction from the modified depth information for each image received at the common time to generate 220 a representation of the object at the common time. The console 130 may then render the representation of the object via a display device, store the representation of the object, send the representation of the object to a client device for rendering, or perform any other suitable interaction with the representation of the object. In various embodiments, the console 130 generates 220 multiple representations of the object based on the modified depth information for the images received from the multiple cameras 110A-F at different common times and maintains a series of representations of the object. For example, the series of representations corresponds to the appearance of the object during a time interval, which allows the series of representations to depict changes in the appearance of the object during the time interval or to depict movement of the object during the time interval. In various embodiments, the series of representations of the object may subsequently be displayed via a display device, stored by the console 130, or transmitted from the console 130 to a client device. However, in various embodiments, the console 130 may perform any suitable action with respect to the series of representations of the object.
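
As an illustration of how modified depth maps from multiple calibrated cameras can be combined into a single representation, the sketch below back-projects one camera's refined depth map into a shared world frame using a pinhole model; the intrinsics K and the world-to-camera pose (R, t) are assumed to be known from calibration, and the merged point cloud could then be meshed, for example by the Poisson reconstruction mentioned above.

```python
import numpy as np

def depth_to_world_points(depth, K, R, t):
    """Back-project one camera's refined (H, W) depth map to 3-D world points.

    K is the 3x3 intrinsic matrix; (R, t) map world coordinates to camera
    coordinates, so X_world = R^T (X_cam - t)."""
    h, w = depth.shape
    xs, ys = np.meshgrid(np.arange(w), np.arange(h))
    pix = np.stack([xs, ys, np.ones_like(xs)], axis=-1).reshape(-1, 3).T  # (3, H*W)
    rays = np.linalg.inv(K) @ pix                  # camera-frame rays at unit depth
    cam_pts = rays * depth.reshape(1, -1)          # scale rays by the refined depth
    world_pts = R.T @ (cam_pts - t.reshape(3, 1))  # camera frame -> world frame
    return world_pts.T                             # (H*W, 3) points

# Points from every camera, expressed in the same world frame, can be merged
# before surface reconstruction:
# cloud = np.vstack([depth_to_world_points(d, K_i, R_i, t_i)
#                    for d, (K_i, R_i, t_i) in zip(refined_depths, calibrations)])
```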

FIG. 3 is a process flow diagram for generating a representation of an object from images of the object captured by a plurality of cameras 110A-F that are positioned differently relative to each other. As described further above in conjunction with FIGS. 1 and 2, the plurality of cameras 110A-F capture images 305A-F of an object located at target location 120. Different cameras 110A-F capture images 305A-F of different portions of an object located at the target location 120. The console 130 receives images 305A-F from the multiple cameras 110A-F and determines 310 depth information from each image 305A-F in parallel. As further described above in connection with FIG. 2, the console 130 may perform any suitable method or methods to determine 310 depth information from the images 305A-F. In various embodiments, the console 130 includes multiple processors (e.g., graphics processing units), and each processor determines 310 depth information from different received images in parallel. For example, each image 305A-F includes a timestamp, and the console identifies images 305A-F having a common timestamp; different processors of the console 130 process different images 305A-F having a common timestamp in parallel to determine 310 depth information for the different images having the common timestamp in parallel.

The console 130 also modifies 315 the depth information determined 310 from each image 305A-F in parallel. In various embodiments, the console 130 maintains an energy function that determines the energy of each image 305A-F based on the intensity information of the portions of the image 305A-F having a particular timestamp and the intensity information of the portions of the other images 305A-F having the particular timestamp. An example of modifying 315 the depth information determined 310 from each image 305A-F is further described above in connection with FIG. 2. The different processors included in the console 130 modify 315 in parallel the depth information determined 310 from each image 305A-F; for example, each processor modifies 315 the depth information determined 310 from a different image 305A-F. By determining 310 and modifying 315 depth information from multiple images 305A-F in parallel, the console 130 more quickly and efficiently obtains and refines depth information from the various images 305A-F.
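
A process-level sketch of this per-camera parallelism is shown below; estimate_depth and refine_depth are hypothetical stand-ins for the patch-matching and energy-minimization steps sketched earlier, and on the described system each worker would drive its own graphics processing unit rather than a CPU process.

```python
from concurrent.futures import ProcessPoolExecutor

def estimate_depth(image, neighbor_images, calibration):
    """Hypothetical stand-in for the parallel patch-matching step sketched earlier."""
    raise NotImplementedError

def refine_depth(depth, image, neighbor_images, calibration):
    """Hypothetical stand-in for the energy-minimization refinement sketched earlier."""
    raise NotImplementedError

def process_camera(job):
    # Determine and then refine depth for one camera's image at a common timestamp.
    image, neighbor_images, calibration = job
    depth = estimate_depth(image, neighbor_images, calibration)
    return refine_depth(depth, image, neighbor_images, calibration)

def depths_for_timestamp(jobs):
    """Run one worker per camera so depth maps for images sharing a common
    timestamp are determined and modified in parallel, then returned for
    reconstruction."""
    with ProcessPoolExecutor(max_workers=len(jobs)) as pool:
        return list(pool.map(process_camera, jobs))
```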

Using the modified depth information from the plurality of images 305A-F, the console 130 generates 320 a representation of the object included in the captured images 305A-F. In various embodiments, the representation is a three-dimensional representation of the object included in the images 305A-F that is generated 320 by combining the images 305A-F and the modified depth information of the images 305A-F. As the depth information for the multiple images is determined and modified in parallel, the total time for the console 130 to generate 320 a representation of the object is reduced, while the modification of the depth information for the multiple images increases the accuracy of the representation.

Conclusion

The foregoing description of embodiments of the present disclosure has been presented for purposes of illustration; it is not intended to be exhaustive or to limit the patent claims to the precise form disclosed. One skilled in the relevant art will recognize that many modifications and variations are possible in light of the above disclosure.

Embodiments of the present disclosure may include or be implemented in conjunction with an artificial reality system. Artificial reality is a form of reality that has been adjusted in some way before being presented to the user, and may include, for example, Virtual Reality (VR), Augmented Reality (AR), Mixed Reality (MR), hybrid reality, or some combination and/or derivative thereof. The artificial reality content may include fully generated content or generated content combined with captured (e.g., real world) content. The artificial reality content may include video, audio, haptic feedback, or some combination thereof, and any of which may be presented in a single channel or in multiple channels (e.g., stereoscopic video that produces a three-dimensional effect to a viewer). Further, in some embodiments, the artificial reality may also be associated with an application, product, accessory, service, or some combination thereof, that is used, for example, to create content in the artificial reality and/or otherwise used in the artificial reality (e.g., to perform an activity in the artificial reality). An artificial reality system that provides artificial reality content may be implemented on a variety of platforms, including a Head Mounted Display (HMD) connected to a host computer system, a standalone HMD, a mobile device or computing system, or any other hardware platform capable of providing artificial reality content to one or more viewers.

Some portions of the present description describe embodiments in terms of algorithms and symbolic representations of operations on information. These algorithmic descriptions and representations are commonly used by those skilled in the data processing arts to convey the substance of their work effectively to others skilled in the art. These operations, while described functionally, computationally, or logically, are understood to be implemented by computer programs or equivalent electrical circuits, microcode, or the like. Moreover, it has also proven convenient at times, to refer to these arrangements of operations as modules, without loss of generality. The described operations and their associated modules may be embodied in software, firmware, hardware, or any combination thereof.

Any of the steps, operations, or processes described herein may be performed or implemented using one or more hardware or software modules, alone or in combination with other devices. In one embodiment, the software modules are implemented using a computer program product comprising a computer-readable medium containing computer program code, which may be executed by a computer processor, for performing any or all of the steps, operations, or processes described.

Embodiments may also relate to an apparatus for performing the operations herein. The apparatus may be specially constructed for the required purposes, and/or it may comprise a general purpose computing device selectively activated or reconfigured by a computer program stored in the computer. Such a computer program may be stored in a non-transitory, tangible computer readable storage medium, or any type of medium suitable for storing electronic instructions, which may be coupled to a computer system bus. Moreover, any computing system referred to in the specification may include a single processor, or may be an architecture that employs a multi-processor design to increase computing power.

Embodiments may also relate to products produced by the computing processes described herein. Such products may include information produced by a computing process, where the information is stored on a non-transitory, tangible computer-readable medium and may include any embodiment of a computer program product or other combination of data described herein.

Finally, the language used in the specification has been principally selected for readability and instructional purposes, and it may not have been selected to delineate or circumscribe the patent rights. It is therefore intended that the scope of the patent claims be limited not by this detailed description, but rather by any claims issued on an application based hereon. Accordingly, the disclosure of the embodiments is intended to be illustrative, but not limiting, of the scope of the patent, which is set forth in the following claims.
