System and method for active stereo depth sensing

Document No.: 1525408    Publication date: 2020-02-11

Description: This technology, "System and method for active stereo depth sensing" (用于主动立体深度感测的系统和方法), was created on 2018-07-25 by Adarsh Prakash Murthy Kowdle, Vladimir Tankovich, Danhang Tang, Cem Keskin, and Jonathan Jame. Its main content includes: An electronic device [100] estimates a depth map of an environment based on stereo depth images [410, 415] captured by depth cameras [114, 116] having exposure times that are offset from each other, in conjunction with illuminators [118, 119] that pulse illumination patterns [305, 310] into the environment. The processor [220] of the electronic device matches small patches [430, 432] of the depth images from the cameras to each other and to corresponding patches of one or more depth images immediately preceding them (e.g., spatio-temporal image patch "cubes"). The processor calculates a matching cost for each spatio-temporal image patch cube by converting each spatio-temporal image patch into a binary code and defining the cost function between two stereo image patches as the difference between the binary codes. The processor minimizes the matching costs to generate a disparity map, and optimizes the disparity map by rejecting outliers and refining to sub-pixel accuracy using a decision tree that utilizes learned pixel offsets, to generate a depth map of the environment.

1. A method, comprising:

projecting a first illumination pattern [305] into an environment of an electronic device [100] at an illuminator [119] of the electronic device during a first time, and projecting a second illumination pattern [310] into the environment of the electronic device during a second time;

capturing a first depth image [410] of the environment at a depth camera [114] of the electronic device during the first time and a second depth image [415] of the environment during the second time;

calculating, at a processor [220] of the electronic device, matching costs for corresponding portions of the first depth image, the second depth image, and one or more depth images captured immediately prior to capturing the first depth image, wherein each of the corresponding portions comprises a plurality of pixels of the depth image;

identifying the corresponding portions of the first and second depth images based on the matching costs to generate a disparity map indicating disparities between pixels of the corresponding portions of the first and second depth images; and

estimating a depth map of the environment based on the disparity map.

2. The method of claim 1, wherein the second illumination pattern is different from the first illumination pattern.

3. The method of claim 1, further comprising: rotating the second illumination pattern relative to the first illumination pattern.

4. The method of claim 1, wherein the illuminator comprises a Vertical Cavity Surface Emitting Laser (VCSEL).

5. The method of claim 1, further comprising: temporally varying the first illumination pattern and the second illumination pattern.

6. The method of claim 1, further comprising: refining the disparity in the disparity map to estimate the depth map by identifying a probability that a pixel is valid and rejecting pixels having an identified probability of validity that is below a threshold.

7. The method of claim 6, wherein identifying a probability that a pixel is valid comprises: identifying the pixel as valid based on a decision tree that sparsely samples spatially neighboring pixels of the pixel.

8. The method of claim 1, further comprising: refining the disparity in the corresponding portion based on fitting a parabola to the disparity of each pixel.

9. A method, comprising:

pulsing a first illumination pattern [305] at a first illuminator [119] of an electronic device [100] into an environment of the electronic device with a first phase and a first frequency;

pulsing a second illumination pattern [310] at a second phase and the first frequency into the environment of the electronic device at a second illuminator [118] of the electronic device;

capturing, at a first depth camera [114] of the electronic device, a first series of depth images of the environment at the first phase and the first frequency;

capturing a second series of depth images of the environment at the second phase and the first frequency at a second depth camera [116] of the electronic device;

comparing, at a processor [220] of the electronic device, a first patch [430] of a first depth image [410] of the first series of depth images to a second patch [432] of a second depth image [415] of the second series of depth images and at least one patch of each of one or more depth images captured immediately prior to the first depth image, wherein each patch comprises a plurality of pixels;

calculating a cost function for the compared patches;

generating a disparity map based on the cost function, the disparity map indicating disparities between corresponding pixels of corresponding patches of the first depth image and the second depth image; and

refining the disparity of the disparity map to generate an estimated depth map of the environment.

10. The method of claim 9, further comprising: temporally varying the first illumination pattern and the second illumination pattern.

11. The method of claim 9, further comprising: rotating the second illumination pattern relative to the first illumination pattern.

12. The method of claim 9, further comprising: refining the disparity in the corresponding pixel by identifying a probability that a pixel is valid and rejecting pixels having an identified probability of validity that is below a threshold.

13. The method of claim 12, wherein identifying the probability that a pixel is valid is based on a decision tree that sparsely samples spatially neighboring pixels of the pixel.

14. An electronic device [100], comprising:

a first illuminator [119] for projecting a first illumination pattern [305] into an environment at a first time;

a second illuminator [118] for projecting a second illumination pattern [310] into the environment at a second time different from the first time;

a first depth camera [114] for capturing a first depth image [410] of the environment at the first time;

a second depth camera [116] for capturing a second depth image [415] of the environment at the second time;

a processor [220] for:

calculating matching costs for corresponding portions of the first depth image, the second depth image, and one or more depth images immediately preceding the first depth image, wherein each of the corresponding portions comprises a plurality of pixels of the depth image;

identifying the corresponding portions of the first and second depth images that minimize the matching cost to generate a disparity map that indicates disparities between pixels of the corresponding portions of the first and second depth images; and

refining the disparity of the disparity map to generate an estimated depth map of the environment.

15. The electronic device of claim 14, wherein the second illumination pattern is different from the first illumination pattern.

16. The electronic device of claim 14, wherein the second illumination pattern is rotated relative to the first illumination pattern.

17. The electronic device of claim 14, wherein the first and second illuminators are configured to temporally change the first and second illumination patterns.

18. The electronic device of claim 14, wherein the processor is to refine the disparity in the corresponding portion by identifying a probability that a pixel is valid and rejecting pixels having an identified probability of validity that is below a threshold.

19. The electronic device of claim 18, wherein the processor is configured to identify the probability that a pixel is valid based on a decision tree that sparsely samples spatially neighboring pixels of the pixel.

20. The electronic device of claim 14, wherein the processor is to refine the disparity in the corresponding portion based on fitting a parabola to the disparity of each pixel.

Background

Depth cameras are used as inputs for computer vision tasks such as hand, body, or object tracking, 3D reconstruction, and simultaneous localization and mapping (SLAM). For these tasks, each new frame of depth and image data is correlated with the previous frame, allowing pose or geometry to be reconstructed over time. However, for depth cameras that operate at relatively low speeds (i.e., that capture a low number of frames per second), large frame-to-frame motion in the scene and artifacts such as motion blur make it difficult to establish correspondences between frames.

Drawings

The present disclosure may be better understood, and its numerous features and advantages made apparent to those skilled in the art by referencing the accompanying drawings. The use of the same reference symbols in different drawings indicates similar or identical items.

Fig. 1 is a schematic diagram illustrating an electronic device that estimates a depth map of an environment using an active stereo depth camera, in accordance with some embodiments.

Fig. 2 is a block diagram of the electronic device of fig. 1, in accordance with some embodiments.

Fig. 3 is a schematic diagram illustrating illuminators of an electronic device alternately projecting two illumination patterns into an environment, in accordance with some embodiments.

FIG. 4 is a schematic diagram illustrating an electronic device that matches patches for each of a depth image from a first depth camera, a depth image from a second depth camera, and a previous depth image, in accordance with some embodiments.

Fig. 5 is a flow diagram illustrating a method of estimating a depth map based on a captured depth image, in accordance with some embodiments.

Detailed Description

The following description is intended to convey a thorough understanding of the present disclosure by providing several specific embodiments and details relating to estimating a depth map of an environment based on alternating stereoscopic depth images. It is to be understood, however, that the present disclosure is not limited to these specific embodiments and details, which are exemplary only, and that the scope of the present disclosure is, therefore, intended to be limited only by the appended claims and equivalents thereof. It is further understood that one possessing ordinary skill in the art, in view of known systems and methods, would appreciate the use of the present disclosure for its intended purposes and benefits in any number of alternative embodiments, depending upon specific design and other needs.

FIGS. 1-5 illustrate techniques for estimating, by an electronic device, a depth map of an environment based on stereoscopic depth images captured by depth cameras having exposure times offset from one another, in conjunction with illuminators that pulse illumination patterns into the environment, to support location-based functionality such as Augmented Reality (AR) functionality, Virtual Reality (VR) functionality, visual localization/odometry, or other simultaneous localization and mapping (SLAM) functionality, and so forth. The first illuminator pulses a first illumination pattern into the environment at a first frequency and phase, and the second illuminator pulses a second illumination pattern into the environment at the first frequency and a second phase. The first depth camera captures a depth image of the environment during a time when the first illuminator is pulsing the first illumination pattern, and the second depth camera captures a depth image of the environment during a time when the second illuminator is pulsing the second illumination pattern. In some embodiments, the electronic device dynamically changes the projected patterns over time.

A processor of the electronic device matches small sections of the depth images from the first and second cameras, referred to as patches, to each other and to corresponding patches of one or more immediately preceding depth images of the environment (e.g., spatio-temporal image patch "cubes"). The processor calculates a matching cost for each spatio-temporal image patch cube by converting each spatio-temporal image patch into a binary code and defining a cost function between the two stereo (left and right) image patches as the difference between the binary codes. The processor minimizes the matching cost to generate a disparity map. The processor optimizes the disparity map to generate a depth map of the environment by using a decision tree that utilizes learned pixel offsets to identify and reject outliers and to refine disparities to sub-pixel accuracy. By utilizing the relatively fast frame rate of the depth cameras to include previous depth images when calculating the matching cost for stereoscopic depth imaging, the electronic device reduces noise in the matching while allowing for smaller spatial windows (patches), which results in better performance along depth discontinuities. Furthermore, by varying the projected patterns over time, the electronic device minimizes the effect of bias on stereo matching. By using the decision tree to identify and reject outliers, the electronic device reduces the computational cost consumed by cross-checking and decouples the computation from the resolution of the image.

Fig. 1 illustrates an electronic device 100 configured to support location-based functionality, such as SLAM, VR, or AR, using depth image data in accordance with at least one embodiment of the present disclosure. The electronic device 100 may comprise a user-portable mobile device such as a tablet computer, a computing-enabled cellular telephone (e.g., "smartphone"), a notebook computer, a Personal Digital Assistant (PDA), a gaming system remote control, a television remote control, and so forth. In other embodiments, the electronic device 100 may comprise another type of mobile device, such as a head mounted display, a single camera, a multi-sensor camera, and so forth. For ease of illustration, the electronic device 100 is generally described herein in the example context of a mobile device, such as a tablet computer or smartphone; however, the electronic device 100 is not limited to these example embodiments.

In the depicted example, the electronic device 100 includes a plurality of sensors to obtain information about the local environment 112 of the electronic device 100. The electronic device 100 obtains visual information (imagery) of the local environment 112 via a color (RGB) imaging camera 102 and depth cameras 114 and 116. In one embodiment, imaging camera 102 is implemented as a wide-angle imaging camera with a fisheye lens or other wide-angle lens to provide a wide-angle view of local environment 112. In one embodiment, depth camera 114 (also referred to as a left depth camera) projects a first modulated light pattern into the local environment using modulated light illuminator 119 (also referred to as left illuminator 119), and captures reflections of the first modulated light pattern as it reflects back from objects in local environment 112. In one embodiment, the depth camera 116 (also referred to as the right depth camera 116) projects a second modulated light pattern into the local environment using a modulated light illuminator 118 (also referred to as the right illuminator 118) and captures reflections of the second modulated light pattern as it reflects back from objects in the local environment. In some embodiments, depth cameras 114 and 116 are implemented as a pair of monochrome Infrared (IR) cameras with bandpass filters. Although the depth cameras 114 and 116 are referred to as left and right cameras in the example embodiment of fig. 1, it should be understood that in other embodiments, the cameras may be in different configurations and arrangements. It will be further understood that both cameras may capture images of the same environment simultaneously.

In some embodiments, each of the left illuminator 119 and the right illuminator 118 emits Infrared (IR) light. In some embodiments, each of the left illuminator 119 and the right illuminator 118 is a Vertical Cavity Surface Emitting Laser (VCSEL). A VCSEL emits light from a larger surface than a typical edge-emitting laser and can therefore emit more light while remaining safe to the eye. In some embodiments, the left illuminator 119 and the right illuminator 118 are coupled with a suitable mask (not shown) to emit structured light (i.e., a modulated light pattern). In some embodiments, the modulated light patterns are temporally modulated light patterns. The captured reflection of a modulated light pattern is referred to herein as a "depth image". A processor (not shown) of the electronic device 100 may then calculate the depth of an object, i.e., the distance of the object from the electronic device 100, based on analysis of the depth image.

In operation, the left illuminator 119 pulses a first illumination pattern into the environment 112 at a first frequency and a first phase, while the right illuminator 118 pulses a second illumination pattern into the environment at the first frequency and a second phase to minimize interference between the first and second illumination patterns. For example, if each of the left and right depth cameras 114 and 116 has an exposure time of 2 ms and runs at 210 frames per second (fps), and each of the left and right illuminators 119 and 118 pulses its respective illumination pattern into the environment in 2 ms pulses synchronized with the exposure times of the left and right depth cameras 114 and 116, respectively, there is a 4.75 ms interval between two consecutive frames. Thus, the exposures of the left and right depth cameras 114, 116 are offset in time so that they do not interfere with each other, even if the depth cameras are facing toward each other, while maintaining a frame rate of 210 fps. In some embodiments, the first phase and the second phase of the pulses are dynamically adjustable. In some embodiments, each of the first and second illumination patterns is a regular grid of points, and the left and right illuminators 119 and 118 are rotated relative to each other such that the combination of the two illumination patterns produces a locally unique pattern. In some embodiments, the electronic device 100 includes additional illuminators, each mounted at a slightly different angle. A processor (not shown) activates a different subset of the illuminators at each frame of the left and right depth cameras 114, 116 to generate a pattern that changes over time.
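For illustration, the short Python sketch below lays out interleaved exposure windows consistent with this example. The frame rate, exposure time, and half-period phase offset are assumptions drawn from the example above, not values required by the claims.

# Hypothetical illustration of the interleaved exposure timing described above.
FRAME_RATE_FPS = 210
FRAME_PERIOD_MS = 1000.0 / FRAME_RATE_FPS   # ~4.76 ms between consecutive frames
EXPOSURE_MS = 2.0                           # exposure (and pulse) duration per camera

def exposure_windows(num_frames, phase_offset_ms):
    # Return (start, end) exposure windows in milliseconds for one camera.
    return [(i * FRAME_PERIOD_MS + phase_offset_ms,
             i * FRAME_PERIOD_MS + phase_offset_ms + EXPOSURE_MS)
            for i in range(num_frames)]

# Left camera/illuminator fire at phase 0; the right pair is offset by half a
# frame period so the two exposures never overlap.
left = exposure_windows(4, phase_offset_ms=0.0)
right = exposure_windows(4, phase_offset_ms=FRAME_PERIOD_MS / 2.0)

for (ls, le), (rs, re) in zip(left, right):
    assert le <= rs or re <= ls, "exposures overlap"
    print(f"left: {ls:5.2f}-{le:5.2f} ms   right: {rs:5.2f}-{re:5.2f} ms")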

The electronic device 100 generates depth data based on the detection of spatial features in the image data captured by the depth cameras 114 and 116. To illustrate, in the depicted example of fig. 1, the local environment 112 comprises a hallway of an office building that includes three corners 124, 126, and 128, a baseboard 130, and an electrical outlet 132. In this example, the depth camera 114 captures depth data 136 based on reflections of the first modulated light pattern projected by the illuminator 119 as it reflects off objects in the local environment 112, and the depth camera 116 captures depth data 138 based on reflections of the second modulated light pattern projected by the illuminator 118 as it reflects off objects in the local environment 112. In some embodiments, the electronic device 100 trains or calibrates a processor (not shown) based on the image 140 of the local environment 112 captured by the RGB camera 102.

The processor (not shown) of the electronic device 100 estimates the depth of points in the environment 112 by triangulating corresponding points identified in the depth image 136 from the left depth camera 114 and the depth image 138 from the right depth camera 116, denoted I_L and I_R, respectively. To this end, for each pixel p_L(x, y) in the left image I_L, the processor finds its corresponding pixel p_R(x', y') in the right image I_R. Assuming a calibrated and rectified stereo system, y = y' for each matching pair p_L and p_R. The displacement d = x − x' is called the disparity. Given a disparity value d for a given pixel, the depth value

Z = b·f / d

is inversely proportional to d, where the quantity b is the baseline of the stereo system and f is the focal length.
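As a minimal illustration of this depth-from-disparity relation, the Python sketch below evaluates Z = b·f / d; the baseline and focal length values are arbitrary assumptions, not parameters taken from the patent.

def depth_from_disparity(disparity_px, baseline_m, focal_length_px):
    # Z = b * f / d: depth is inversely proportional to disparity.
    if disparity_px <= 0:
        return float("inf")  # zero disparity corresponds to a point at infinity
    return baseline_m * focal_length_px / disparity_px

# Example with a hypothetical 5 cm baseline and 600-pixel focal length.
for d in (1.0, 10.0, 50.0):
    print(f"disparity {d:5.1f} px -> depth {depth_from_disparity(d, 0.05, 600.0):.3f} m")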

The processor calculates a matching cost defining a distance or similarity function (also referred to as a correlation function) between the patches (smaller sections) of the depth image 136 and the depth image 138. The processor uses the correlation function to find the best disparity according to a certain criterion, such as the lowest distance. In some embodiments, the processor refines the disparity to sub-pixel accuracy and rejects outliers to generate a depth map of the environment 112, as explained further below.

Fig. 2 is a block diagram of the electronic device 100 of fig. 1, according to some embodiments. The electronic device 100 includes a depth camera controller 205 for controlling the left depth camera 114 and the right depth camera 116, an illuminator controller 210 for controlling the left illuminator 119 and the right illuminator 118, and a processor 220. The processor 220 includes a matching cost calculator 225, a disparity optimizer 230, a sub-pixel refiner 235, an outlier identifier 240, and a depth map generator 245.

The depth camera controller 205 is a module configured to control the activation and exposure times of the left and right depth cameras 114, 116. The depth camera controller 205 adjusts the frame rate, exposure time, and phase of the left depth camera 114 and the right depth camera 116. In some embodiments, the depth camera controller 205 ensures that the left depth camera 114 and the right depth camera 116 have non-overlapping exposure times. In some embodiments, depth camera controller 205, along with illuminator controller 210, coordinates the frame rate, exposure time, and phase of left depth camera 114 and right depth camera 116.

The illuminator controller 210 is a module configured to control the activation and pulse duration of the left illuminator 119 and the right illuminator 118 and the illumination patterns projected by the left illuminator 119 and the right illuminator 118. The illuminator controller 210 activates the left illuminator 119 to pulse the first illumination pattern into the environment at a frequency and phase matching the frequency and phase of the left depth camera 114, and activates the right illuminator 118 to pulse the second illumination pattern into the environment at a frequency and phase matching the frequency and phase of the right depth camera 116. Thus, during the time that the left illuminator 119 pulses the first illumination pattern into the environment, the left depth camera 114 captures a depth image, and during the time that the right illuminator 118 pulses the second illumination pattern into the environment, the right depth camera 116 captures a depth image. In some embodiments, the time that the left illuminator 119 pulses the first illumination pattern and the time that the right illuminator 118 pulses the second illumination pattern are non-overlapping.

Processor 220 is configured to receive depth images (not shown) from the left depth camera 114 (left image) and the right depth camera 116 (right image). In some embodiments, the processor is further configured to receive an image from an RGB camera (not shown). The matching cost calculator 225 is a module configured to calculate matching costs for patches (small sections) of the left image frame and the right image frame. The patch size must be large enough to uniquely identify a pixel based on the texture (from the illumination pattern) in its surrounding area. Given an image patch x_L of size n in the left image and an image patch x_R in the right image, the matching cost calculator 225 calculates a matching cost based on their appearance that is independent of the patch (window) size n. The matching cost calculator 225 defines a function b = sign(xW) that remaps each image patch x to a binary code b ∈ {0, 1}^k using k hyperplanes W ∈ R^(n×k). To obtain an O(1) mapping independent of the signal dimension n, the matching cost calculator 225 ensures that the hyperplanes W are sparse. Sparsity means that the matching cost calculator 225 only has to access a small subset of pixels inside each patch, which reduces computation and memory access. The matching cost calculator 225 learns a binary mapping sign(xW) that retains as much of the original signal x as possible.

In some embodiments, the matching cost calculator 225 calculates an inverse linear mapping Z that reconstructs the original space X from the binary code b. Thus, the matching cost calculator 225 learns the set of sparse hyperplanes W ∈ R^(n×k) and the inverse mapping Z ∈ R^(k×n) that minimize the equation

min over W, Z of ‖X − sign(XW)Z‖² + λ‖W‖₁

where X ∈ R^(m×n) is a matrix of training examples. The matching cost calculator 225 uses the L1 penalty ‖W‖₁ to induce sparsity in the hyperplanes W and thereby make the linear mapping independent of the patch dimension n. In some embodiments, the matching cost calculator 225 optimizes the equation using alternating minimization.
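A rough, purely illustrative Python sketch of such an alternating-minimization scheme is shown below. It relaxes the sign() when updating W, and the training data, dimensions, sparsity threshold, and update rules are assumptions for illustration rather than the patent's actual training procedure.

import numpy as np

rng = np.random.default_rng(0)
m, n, k = 2048, 75, 32        # training patches, patch dimension, code length (assumed values)
X = rng.standard_normal((m, n))
lam = 0.05                    # soft-threshold level used to induce sparsity in W

W = rng.standard_normal((n, k))
for _ in range(20):
    B = np.sign(X @ W)                           # binary codes b = sign(xW)
    Z, *_ = np.linalg.lstsq(B, X, rcond=None)    # inverse mapping: B @ Z approximates X
    W = np.linalg.pinv(Z)                        # relaxed update of W so that (X @ W) @ Z approximates X
    W = np.sign(W) * np.maximum(np.abs(W) - lam, 0.0)   # L1 proximal step: sparsify the hyperplanes

err = np.linalg.norm(X - np.sign(X @ W) @ Z) / np.linalg.norm(X)
print(f"relative reconstruction error: {err:.3f}, zero weights in W: {np.mean(W == 0):.1%}")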

The matching cost calculator 225 extends the linear mapping to spatio-temporal patches based on one or more depth images captured immediately prior to the capture of the left and right images. Given the high frame rate of the high speed depth cameras 114, 116, the matching cost calculator 225 assumes that there is little motion between subsequent image frames at time t and time t + 1. Based on the assumed small amount of motion from one frame to the next, the matching cost calculator 225 uses a spatio-temporal image volume x (as illustrated by the set of depth images 350 of fig. 3) of size n = P × F, where P is the spatial window size and F is the size of the temporal buffer of frames. Because the mapping W is sparse, the mapping does not depend on the temporal buffer size F or the spatial window size P. By changing the illumination patterns projected by the left illuminator 119 and the right illuminator 118 over time, the electronic device 100 changes the appearance of the patch over time to ensure that the information added across multiple frames is not redundant. By matching with a spatio-temporal window, the matching cost calculator 225 reduces noise in the matching, allows for a smaller spatial window, and eliminates the bias effect.

At runtime, the matching cost calculator 225 converts each spatio-temporal image patch x into a k = 32 bit binary code b = sign(xW). The matching cost calculator 225 defines the cost function between two image patches x_L and x_R as the Hamming distance between the codes b_L and b_R. The matching cost calculator 225 thereby obtains a computation that is O(1) and independent of the patch size n.
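The sketch below illustrates this runtime matching cost under stated assumptions: a random sparse W stands in for the learned hyperplanes, the patch dimension corresponds to a hypothetical 5×5 window over 3 frames, and the cost is the popcount of the XOR of two 32-bit codes.

import numpy as np

K = 32

def binary_code(patch, W):
    # b = sign(patch @ W), packed into a single 32-bit integer.
    bits = (patch @ W) > 0.0
    code = 0
    for i, bit in enumerate(bits):
        code |= int(bit) << i
    return code

def matching_cost(code_left, code_right):
    # Hamming distance between two codes: popcount of their XOR.
    return bin(code_left ^ code_right).count("1")

rng = np.random.default_rng(1)
n = 5 * 5 * 3                                                   # hypothetical spatio-temporal patch dimension
W = rng.standard_normal((n, K)) * (rng.random((n, K)) < 0.1)    # sparse stand-in for learned hyperplanes

patch_left = rng.standard_normal(n)
patch_right_match = patch_left + 0.01 * rng.standard_normal(n)  # nearly identical patch
patch_right_other = rng.standard_normal(n)                      # unrelated patch

code_left = binary_code(patch_left, W)
print("cost vs. matching patch:    ", matching_cost(code_left, binary_code(patch_right_match, W)))
print("cost vs. non-matching patch:", matching_cost(code_left, binary_code(patch_right_other, W)))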

The disparity optimizer 230 is a module configured to identify the image patches of the left and right image frames having the lowest matching costs in order to generate a disparity map indicating disparities between pixels of the patches of the left and right image frames. In some embodiments, rather than evaluating all possible disparity labels d_k in order to find the image patch with the lowest matching cost, the disparity optimizer 230 initializes the depth image by testing random disparities for each pixel and selecting the disparity with the smallest Hamming distance in binary space. For example, in some embodiments, the disparity optimizer 230 tests 32 random disparities for each pixel. Then, for a pixel p_i having the current lowest-cost disparity d_i, the disparity optimizer 230 tests the disparities d_k in the 3 × 3 neighborhood N(p_i) and selects the one with the best cost. The disparity optimizer 230 defines the cost function as

E(d) = C(p, d) + Σ over d_k ∈ N(p) of S(d_k, d)

where C(p, d) is the Hamming distance between the code at the pixel p in the left image and the code computed at the position p + d in the right image, the pixel p being defined by its x component only and p + d being a shift along this dimension. The disparity optimizer 230 uses the smoothness term S(d_k, d) = max(τ, |d_k − d|) to enforce smoothness between neighboring pixels. Because the disparity optimizer 230 considers only the small local neighborhood N(p), the cost function can easily be solved by enumerating all possible solutions in a 3 × 3 window and selecting the best one. In some embodiments, the disparity optimizer 230 repeats this iterative optimization multiple times until convergence is reached. The disparity optimizer 230 generates a disparity map (not shown) based on the lowest cost calculated for each pixel.
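The following toy Python sketch illustrates this initialize-then-propagate scheme under stated assumptions: a synthetic per-pixel cost volume cost[y, x, d] stands in for the Hamming distances, and the image size, disparity range, number of random samples, and τ are arbitrary illustrative choices, not parameters of the described system.

import numpy as np

rng = np.random.default_rng(0)
H, W_IMG, D = 40, 60, 24            # image height, width, number of disparity labels
TAU, RANDOM_SAMPLES, ITERS = 2.0, 32, 4

# Synthetic cost volume with a smoothly varying "true" disparity plus random clutter.
true_d = np.tile(np.round(np.linspace(2, D - 3, W_IMG)).astype(int), (H, 1))
cost = rng.random((H, W_IMG, D)) * 8.0
rows, cols = np.arange(H)[:, None], np.arange(W_IMG)[None, :]
cost[rows, cols, true_d] = 0.0

# Initialization: keep the best of a few random disparities per pixel.
d = rng.integers(0, D, size=(H, W_IMG))
for _ in range(RANDOM_SAMPLES - 1):
    cand = rng.integers(0, D, size=(H, W_IMG))
    better = cost[rows, cols, cand] < cost[rows, cols, d]
    d[better] = cand[better]

def energy(y, x, dk):
    # Data term plus smoothness to the disparities currently held by the 3x3 neighborhood.
    e = cost[y, x, dk]
    for dy in (-1, 0, 1):
        for dx in (-1, 0, 1):
            ny, nx = y + dy, x + dx
            if (dy or dx) and 0 <= ny < H and 0 <= nx < W_IMG:
                e += max(TAU, abs(int(d[ny, nx]) - dk))
    return e

# Propagation: each pixel tries the disparities of its 3x3 neighborhood and keeps the best.
for _ in range(ITERS):
    for y in range(H):
        for x in range(W_IMG):
            candidates = {int(d[y + dy, x + dx])
                          for dy in (-1, 0, 1) for dx in (-1, 0, 1)
                          if 0 <= y + dy < H and 0 <= x + dx < W_IMG}
            d[y, x] = min(candidates, key=lambda dk: energy(y, x, dk))

print("mean absolute disparity error:", float(np.abs(d - true_d).mean()))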

In some embodiments, the disparity optimizer 230 further utilizes the high frame rate data in the initialization step. For each pixel p at time t, the disparity optimizer 230 tests the disparity that the pixel had at time t − 1. If its Hamming distance is smaller than those of all the random disparities, the disparity optimizer 230 uses the previous value to initialize the iterative optimization. Given a 210 fps depth camera, many of the pixels will typically have the same disparity in two consecutive frames.

The sub-pixel refiner 235 is a module configured to achieve sub-pixel accuracy using parabolic interpolation. Given a pixel p with disparity d, the sub-pixel refiner 235 fits a parabola by considering the disparities d − 1 and d + 1. The sub-pixel refiner 235 calculates the Hamming distances of the binary codes for disparities d, d − 1, and d + 1 and fits a quadratic function. The sub-pixel refiner 235 selects the disparity d* at the global minimum of the quadratic function as the refined value of d. In some embodiments, the sub-pixel refiner 235 repeats the parabolic fit for each pixel at the end of each iteration of the optimization performed by the disparity optimizer 230.
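A minimal sketch of this parabolic refinement is shown below; the three sample costs are made-up values standing in for the Hamming distances at d − 1, d, and d + 1.

def subpixel_disparity(d, cost_dm1, cost_d, cost_dp1):
    # Vertex of the parabola through (d-1, cost_dm1), (d, cost_d), (d+1, cost_dp1).
    denom = cost_dm1 - 2.0 * cost_d + cost_dp1
    if denom <= 0.0:
        return float(d)            # degenerate or non-convex fit: keep the integer disparity
    return d + 0.5 * (cost_dm1 - cost_dp1) / denom

print(subpixel_disparity(12, cost_dm1=7.0, cost_d=3.0, cost_dp1=5.0))   # ~12.17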

The outlier identifier 240 is a module configured to identify and remove invalid pixels directly from the data. The outlier identifier 240 is trained by cross-checking a set of disparity maps of the environment against RGB images of the environment and computing a weighted median. The outlier identifier 240 aligns and calibrates the left depth camera 114 and the right depth camera 116 against an RGB camera (not shown). The outlier identifier 240 marks each pixel as "valid" or "invalid" based on cross-checking the depth image against the RGB image and a weighted median filter. The outlier identifier 240 then learns a function that decides whether to invalidate or accept a given disparity. In some embodiments, to keep the computation small and independent of the image resolution, the outlier identifier 240 uses a decision tree to determine pixel validity.

The outlier identifier 240 populates the nodes of the decision tree using two learned pixel offsets u = (Δx, Δy) and v = (Δx′, Δy′) and a threshold τ. When evaluating the pixel at location p = (x, y), the decision tree of the outlier identifier 240 decides where to route the particular example based on whether I(p + u) − I(p + v) > τ, where I(p) is the intensity value of pixel p. In some embodiments, at training time, the outlier identifier 240 samples 500 possible partition parameters δ = (u, v, τ) for the current node. Each δ divides the set S of data into a left subset S_L(δ) and a right subset S_R(δ) (child sets). The outlier identifier 240 selects the set of parameters δ that maximizes the Information Gain, which is defined as:

IG(δ) = E(S) − Σ over i ∈ {L, R} of (|S_i(δ)| / |S|) · E(S_i(δ))

where the entropy E(S) is the Shannon entropy of the empirical distribution p(valid | S) of the class label "valid" in S. Each leaf node contains a probability p(valid | p, I), and when that quantity is less than 0.5, the outlier identifier 240 invalidates the pixel.
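The sketch below illustrates, under purely synthetic assumptions (a random intensity image, random validity labels, and arbitrary offset and threshold ranges), how one node of such a tree could be trained by sampling candidate splits and keeping the one with the highest information gain.

import numpy as np

rng = np.random.default_rng(0)
height, width = 64, 64
image = rng.random((height, width))                  # stand-in IR intensity image I
labels = rng.random(200) < 0.7                       # per-sample "valid" / "invalid" class labels
pixels = np.stack([rng.integers(8, height - 8, 200),
                   rng.integers(8, width - 8, 200)], axis=1)

def entropy(valid_mask):
    # Shannon entropy of the empirical distribution of the "valid" label.
    if valid_mask.size == 0:
        return 0.0
    p = valid_mask.mean()
    if p in (0.0, 1.0):
        return 0.0
    return float(-(p * np.log2(p) + (1 - p) * np.log2(1 - p)))

def information_gain(split):
    (uy, ux), (vy, vx), tau = split
    diff = image[pixels[:, 0] + uy, pixels[:, 1] + ux] - image[pixels[:, 0] + vy, pixels[:, 1] + vx]
    right = diff > tau                               # routing decision: I(p + u) - I(p + v) > tau
    n = labels.size
    return entropy(labels) - (right.sum() / n) * entropy(labels[right]) \
                           - ((~right).sum() / n) * entropy(labels[~right])

# Greedy node training: sample candidate splits delta = (u, v, tau) and keep the best.
candidates = [(tuple(rng.integers(-7, 8, 2)), tuple(rng.integers(-7, 8, 2)), float(rng.random() - 0.5))
              for _ in range(500)]
best = max(candidates, key=information_gain)
print("best split (u, v, tau):", best, " information gain:", round(information_gain(best), 4))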

The depth map generator 245 is a module configured to generate a three-dimensional (3D) point cloud (referred to as a depth map) for each image frame pair from the left and right depth cameras 114, 116 based on the disparity map generated by the disparity optimizer 230. In some embodiments, depth map generator 245 further bases the depth map on the sub-pixel refinements identified by sub-pixel refiner 235. In some embodiments, the depth map generator 245 additionally bases the depth map on the validity determination made by the outlier identifier 240. The depth map can be used as an input to efficient, low-latency, high-quality computer vision algorithms, including scene and object scanning, non-rigid tracking, and hand tracking.
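As a rough illustration, the sketch below converts a disparity map into such a 3D point cloud using a pinhole camera model; the intrinsics (fx, fy, cx, cy) and the baseline are illustrative assumptions rather than calibration values from the patent.

import numpy as np

def disparity_to_point_cloud(disparity, baseline_m, fx, fy, cx, cy):
    # Return an (N, 3) array of [X, Y, Z] points; invalid (<= 0) disparities are dropped.
    h, w = disparity.shape
    xs, ys = np.meshgrid(np.arange(w), np.arange(h))
    valid = disparity > 0
    z = np.zeros_like(disparity, dtype=np.float64)
    z[valid] = baseline_m * fx / disparity[valid]        # Z = b * f / d
    x = (xs - cx) * z / fx
    y = (ys - cy) * z / fy
    return np.stack([x[valid], y[valid], z[valid]], axis=1)

demo_disparity = np.full((4, 4), 8.0)
demo_disparity[0, 0] = 0.0                               # an invalidated pixel
cloud = disparity_to_point_cloud(demo_disparity, 0.05, 600.0, 600.0, 2.0, 2.0)
print(cloud.shape, cloud[0])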

FIG. 3 illustrates the illuminators 118 and 119 of the electronic device 100 alternately projecting two illumination patterns 305 and 310 into the environment 112, in accordance with some embodiments. In some embodiments, the illumination patterns 305, 310 are regular grids of dots rotated relative to each other such that their combination produces a locally unique pattern. The left illuminator 119 pulses the first illumination pattern 305 into the environment 112 at a first frequency and a first phase, while the right illuminator 118 pulses the second illumination pattern 310 into the environment 112 at the first frequency and a second phase offset from the first phase. Thus, the left illuminator 119 pulses the illumination pattern 305 during a first time t, while the right illuminator 118 pulses the illumination pattern 310 during a second time t + 1. In some embodiments, an illuminator controller (not shown) varies the first illumination pattern and the second illumination pattern over time to minimize depth bias arising from the projected patterns.
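The following sketch renders two such dot grids, the second rotated relative to the first, to illustrate how their combination becomes locally less repetitive; the grid pitch, rotation angle, and image size are arbitrary illustrative choices, not values specified by the patent.

import numpy as np

def dot_grid(size, pitch, angle_deg):
    # Render a regular grid of dots, rotated by angle_deg, into a size x size image.
    ys, xs = np.meshgrid(np.arange(-size, 2 * size, pitch),
                         np.arange(-size, 2 * size, pitch), indexing="ij")
    theta = np.deg2rad(angle_deg)
    rx = np.cos(theta) * xs - np.sin(theta) * ys
    ry = np.sin(theta) * xs + np.cos(theta) * ys
    img = np.zeros((size, size))
    for x, y in zip(rx.ravel().round().astype(int), ry.ravel().round().astype(int)):
        if 0 <= x < size and 0 <= y < size:
            img[y, x] = 1.0
    return img

pattern_a = dot_grid(128, pitch=9, angle_deg=0.0)     # first illuminator's grid
pattern_b = dot_grid(128, pitch=9, angle_deg=20.0)    # second grid, rotated by 20 degrees
combined = np.clip(pattern_a + pattern_b, 0, 1)
print("dots:", int(pattern_a.sum()), int(pattern_b.sum()), "combined:", int(combined.sum()))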

A depth camera controller (not shown) activates the left depth camera 114 in conjunction with the pulsing of the left illuminator 119 and activates the right depth camera 116 in conjunction with the pulsing of the right illuminator 118. Thus, in some embodiments, the depth camera controller activates the left depth camera 114 to capture depth images during time t and activates the right depth camera 116 to capture depth images during time t + 1 to produce a set of depth images 350. By alternately pulsing the left and right illuminators 119, 118 and alternately activating the left and right depth cameras 114, 116, the electronic device 100 avoids interference between the illuminators 119, 118 and the depth cameras 114, 116. In some embodiments, the depth camera controller and the illuminator controller adjust the phases of the illuminators 119, 118 and the depth cameras 114, 116 to minimize interference.

FIG. 4 is a schematic diagram illustrating the matching cost calculator 225 of FIG. 2 matching patches 430, 432, 434 from each of a depth image 410 from a first depth camera, a depth image 415 from a second depth camera, and a previous depth image 420, in accordance with some embodiments. In the illustrated example, each of the depth images 410, 415, 420 depicts a ball 405 rolling along a hallway toward the depth cameras. The matching cost calculator 225 calculates a binary descriptor (code) for each pixel from the spatio-temporal neighborhood within the patches 430, 432, and 434, and defines the cost that pixels in the depth images 410 and 415 originate from the same scene point as the Hamming distance between their binary codes.

Fig. 5 is a flow diagram illustrating a method 500 of estimating a depth map based on captured depth images, in accordance with some embodiments. In block 502, the processor 220 of the electronic device 100 receives a left depth image, a right depth image, and one or more depth images captured immediately prior to capturing the left and right depth images. In block 504, the matching cost calculator 225 calculates a matching cost for each patch of the left and right depth images. In block 506, the disparity optimizer 230 minimizes the matching costs to generate a disparity map. In block 508, the sub-pixel refiner 235 refines the disparities to sub-pixel precision using parabolic interpolation, and the outlier identifier 240 identifies invalid pixels and removes them from the disparity map. In block 510, the depth map generator 245 generates a 3D point cloud based on the refined disparity map.

In some embodiments, certain aspects of the techniques described above may be implemented by one or more processors of a processing system executing software. The software includes one or more sets of executable instructions stored or tangibly embodied on a non-transitory computer readable storage medium. The software may include instructions and certain data that, when executed by one or more processors, manipulate the one or more processors to perform one or more aspects of the techniques described above. The non-transitory computer-readable storage medium may include, for example, a magnetic or optical disk storage device, a solid state storage device such as flash memory, a cache, Random Access Memory (RAM) or other non-volatile storage device or devices, and so forth. Executable instructions stored on a non-transitory computer-readable storage medium may be in source code, assembly language code, object code, or other instruction format that is interpreted or otherwise executed by one or more processors.

A computer-readable storage medium may include any storage medium or combination of storage media that is accessible by a computer system during use to provide instructions and/or data to the computer system. These storage media may include, but are not limited to, optical media (e.g., Compact Discs (CDs), Digital Versatile Discs (DVDs), blu-ray discs), magnetic media (e.g., floppy disks, tape, or magnetic hard drives), volatile memory (e.g., Random Access Memory (RAM) or cache), non-volatile memory (e.g., Read Only Memory (ROM) or flash memory), or microelectromechanical systems (MEMS) -based storage media. The computer-readable storage medium can be embedded in a computing system (e.g., system RAM or ROM), fixedly attached to a computing system (e.g., a magnetic hard drive), removably attached to a computing system (e.g., a flash memory based on a compact disk or Universal Serial Bus (USB)), or coupled to a computer system via a wired network or a wireless network (e.g., a network accessible storage device (NAS)).

Note that not all of the activities or elements described above in the general description are required, that a portion of a particular activity or apparatus may not be required, and that one or more other activities may be performed or one or more other elements may be included in addition to the elements described. Further, the order in which activities are listed is not necessarily the order in which the activities are performed. Also, the concepts have been described with reference to specific embodiments. However, one of ordinary skill in the art appreciates that various modifications and changes can be made without departing from the scope of the present disclosure as set forth in the claims below. Accordingly, the specification and figures are to be regarded in an illustrative rather than a restrictive sense, and all such modifications are intended to be included within the scope of the present disclosure.

Benefits, other advantages, and solutions to problems have been described above with regard to specific embodiments. The benefits, advantages, solutions to problems, and any feature(s) that may cause any benefit, advantage, or solution to occur or become more pronounced, however, are not to be construed as a critical, required, or essential feature or feature of any or all the claims. Moreover, the particular embodiments disclosed above are illustrative only, as the disclosed subject matter may be modified and practiced in different but equivalent manners apparent to those skilled in the art having the benefit of the teachings herein. No limitations are intended to the details of construction or design herein shown, other than as described in the claims below. It is therefore evident that the particular embodiments disclosed above may be altered or modified and all such variations are considered within the scope of the disclosed subject matter. Accordingly, the protection sought herein is as set forth in the claims below.
