Method for detecting occlusion in an image, corresponding device and computer program product


This technology, "Method for detecting occlusion in an image, corresponding device and computer program product", was designed and created by V. Allie, D. Doyen and T. Langlois on 2019-07-18. Its main content is as follows: a method is presented for detecting occlusions in an image captured by a light field capture system, comprising, for at least one reference image belonging to a matrix of images captured by the light field capture system: computing a depth map and a visibility map of pixels in the reference image; determining at least one candidate region in the reference image where potential occlusion may occur, based at least on a segmentation of the depth map; and determining information representative of an occlusion state in the at least one candidate region, based at least on visibility values of the visibility map associated with at least two neighborhoods of the at least one candidate region.

1. A method for detecting occlusions in an image, comprising:

determining a depth map and a visibility map for at least one reference image received in a matrix of images;

determining a candidate region in the reference image where occlusion is likely to occur by segmenting the depth map;

determining visibility information associated with occlusion states of different portions of the candidate region; wherein the information comprises at least one visibility value of the visibility map associated with two neighborhoods of the candidate region.

2. An apparatus for detecting occlusions in an image, comprising:

a light field system having at least one camera for capturing a matrix of images;

a processor configured to determine a depth map and a visibility map for at least one reference image received in a matrix of images;

determine a candidate region in the reference image where occlusion is likely to occur by segmenting the depth map; and

determine visibility information associated with occlusion states of different portions of the candidate region; wherein the information comprises at least one visibility value of the visibility map associated with two neighborhoods of the candidate region.

3. The method of claim 1 or the device of claim 2, wherein the determining at least one candidate region comprises:

-segmenting a depth range covered by the depth map to deliver at least a first depth interval and a second depth interval having a higher average depth than the first depth interval;

-determining a first set of pixels and a second set of pixels in the reference image, corresponding respectively to the first depth interval and to the second depth interval; wherein the pixels of the at least one candidate region belong to the first set of pixels and are not present in the second set of pixels.

4. The method according to any one of claims 1 or 3, or the apparatus according to any one of claims 2 or 3, wherein determining information representative of an occlusion state comprises: extrapolating visibility values, provided by the determined visibility map, of at least two first pixels belonging to a first neighbourhood of the two neighbourhoods and of at least two second pixels belonging to a second neighbourhood, so that they form a first extrapolation curve and a second extrapolation curve extending towards reduced visibility values in the visibility map.

5. The method or device according to claim 4, wherein the determining information is performed at least for a row or a column of the reference image, such that the first extrapolation curve and the second extrapolation curve extend, along the row or the column, in a cutting plane of the visibility map.

6. The method according to claim 5 or the device according to claim 5, wherein the information is further determined when the first extrapolation curve and the second extrapolation curve intersect each other for negative extrapolated visibility values.

7. The method of claim 6, further comprising:

-determining intersections of the first and second extrapolation curves with the null visibility axis in the cutting plane, to deliver first and second intersection coordinates; and

-determining a difference between the first and second intersection coordinates, a magnitude of the difference representing a magnitude of the occlusion, the information further taking into account the magnitude of the difference.

8. The device of claim 6, wherein the processor is further configured to determine intersections of the first and second extrapolation curves with the null visibility axis in the cutting plane, to deliver first and second intersection coordinates; and to determine a difference between the first and second intersection coordinates, a magnitude of the difference representing a magnitude of the occlusion, the information further taking into account the magnitude of the difference.

9. The method of claim 7 or the apparatus of claim 8, wherein the act of determining information is performed successively for each row and each column of the reference image, each execution delivering new information, referred to as intermediate information, representative of the occlusion state in the candidate region, the successive executions over the rows and columns delivering a set of intermediate information.

10. The method of claim 7 or the apparatus of claim 8, wherein occlusion is determined to occur when the percentage of intermediate information indicating occlusion, relative to the total number of intermediate information items in the set, is greater than a given threshold.

11. The method according to any one of claims 1 or 3 to 8 or the apparatus according to any one of claims 2 to 8, further comprising communicating the information to a user of the light field capture system during capture of the reference image.

12. The method of claim 11 or the device of claim 11, wherein the act of communicating information comprises: presenting the reference image on a display of the light field capture system with a particular outline drawn around the at least one candidate region.

13. The method or apparatus according to claim 12, when dependent on claim 9, wherein the thickness of the particular outline is a function of the integrated difference.

14. Computer program product, characterized in that it comprises program code instructions for implementing a method according to at least one of claims 1 or 3 to 7 or 9 to 13 when said program is executed on a computer or processor.

15. A non-transitory computer readable carrier medium storing the computer program product of claim 14.

Technical Field

The present disclosure relates generally to the field of capture of Light Field (LF) images or video.

More particularly, the present disclosure relates to detecting occlusions in images captured by light field capture systems (e.g., camera rigs or arrays, plenoptic cameras, etc.).

The present disclosure may be of interest in any area of LF data processing and/or LF capture that is of interest to both professionals and consumers.

Background

LF data may take one of the following forms:

multi-view video, i.e. video in which the images belonging to an image matrix are captured simultaneously from multiple camera angles, typically using a camera rig (also called a camera array); or

plenoptic video, also called lenslet-based video, i.e. video in which the images belonging to the image matrix are sub-aperture images captured simultaneously by a single camera using a micro-lens array in addition to the main lens system, e.g. a plenoptic camera.

Such LF capture systems are capable of capturing the same scene from different viewpoints, thereby delivering a matrix of simultaneously captured images, each representing a different viewpoint. An interesting way of using these viewpoints is to display the corresponding images with the ability to obtain "parallax": depending on how the content is navigated, the viewer sees the foreground objects, but may also see part of the background by selecting a different viewpoint, i.e. a different image of the image matrix captured by the LF capture system.

The availability of different viewpoints in the LF data leads to an enhanced amount of data that can be used to detect occlusion compared to conventional capture based on only two views. From this perspective, it may be desirable to better detect occlusions when processing LF data than using known techniques.

Therefore, there is a need for a method for detecting occlusions in images captured by an LF capture system that utilizes different viewpoints in the images of an image matrix captured by the LF capture system.

However, during rendering of the LF data, the ability to see an object located in the background of the scene behind other objects located in the foreground is still driven by the content of the captured LF data. More specifically, if the object under consideration, located in the background during capture, was in the field of view of at least one of the capture devices of the LF capture system, it can still be rendered. Otherwise, whichever viewpoint is selected, the user cannot see the object under consideration during rendering, and that object is said to be occluded.

This occlusion depends on the positioning of the LF capture system relative to the scene during capture. Indeed, an object that cannot be captured by any of the capture devices of the LF capture system when the system is in a given position relative to the scene may still be captured when the LF capture system is in another position relative to the scene.

Therefore, there is a need for a method for detecting occlusions in images captured by an LF capture system that remains light in terms of computational load, so that it can be performed in real time (e.g., so that the user of the LF capture system can be notified during capture of a scene).

Disclosure of Invention

The present disclosure relates to a method for detecting occlusion in a matrix of images captured by a light field capture system, comprising, for at least one reference image belonging to the matrix of images captured by the light field capture system:

-computing a depth map and a visibility map of pixels in the reference image;

-determining at least one candidate region in the reference image where potential occlusion may occur based at least on the segmentation of the depth map;

-determining information representative of an occlusion state in the at least one candidate region based at least on visibility values of visibility maps associated with at least two neighborhoods of the at least one candidate region.

Another aspect of the disclosure relates to a device for detecting occlusions in images captured by a light field capture system, comprising a processor or a dedicated machine configured to, for at least one reference image belonging to a matrix of images captured by the light field capture system:

-computing a depth map and a visibility map of pixels in the reference image;

-determining at least one candidate region in the reference image where potential occlusion may occur based at least on the segmentation of the depth map;

-determining information representative of an occlusion state in the at least one candidate region based at least on visibility values of visibility maps associated with at least two neighborhoods of the at least one candidate region.

Further, the present disclosure relates to a non-transitory computer readable medium comprising a computer program product recorded thereon and executable by a processor, comprising program code instructions including program code instructions for implementing the method for detecting occlusions in images captured by a light field capture system as described above.

Drawings

Further features and advantages of the embodiments will emerge from the following description, given by way of indicative and non-exhaustive example and derived from the accompanying drawings, in which:

fig. 1 illustrates a user of an LF capture system capturing a scene comprising foreground objects and background objects;

fig. 2a and 3a illustrate two configurations of the scene of fig. 1, in which occlusion occurs or does not occur depending on the distance of the foreground objects with respect to the LF capture system and the background objects;

FIG. 4 illustrates a flow diagram of a method for notifying a user of the LF capture system of FIG. 1 of potential occlusions during scene capture, according to one embodiment;

figures 2b and 2c illustrate respectively a depth map and a visibility map obtained in the configuration of the scene of figure 2a for a reference image captured by the LF capture system of figure 1, when the LF capture system performs the method of figure 4;

figures 2d to 2g illustrate some of the processes involved in the method of figure 4 when performed by the LF capture system of figure 1 in the configuration of the scene of figure 2a;

figures 2h and 2i illustrate the presentation of information representing an occlusion state according to two embodiments;

fig. 3b illustrates a visibility map obtained in the configuration of the scene of fig. 3a for a reference image captured by the LF capture system of fig. 1, when the LF capture system in question performs the method of fig. 4;

figures 3c and 3d illustrate some of the processes involved in the method of figure 4 when performed by the LF capture system of figure 1 in the configuration of the scene of figure 3a;

FIG. 5 illustrates an exemplary device that can be used to implement the method of FIG. 4.

Detailed Description

Throughout the drawings of this document, like reference numerals refer to like elements and steps.

The disclosed technology relates to a method for detecting occlusions in images captured by an LF capture system.

More specifically, this method comprises determining, based at least on the segmentation of the depth map associated with the reference image, candidate regions in the reference image (belonging to the matrix of images captured by the LF capture system) in which potential occlusions may occur. Information representative of an occlusion state in the candidate region is determined based at least on visibility values associated with at least two neighborhoods of the candidate region in the reference image.

Thus, the determination of information representing the occlusion state makes use of information available in different views of the LF data, leading to improved results compared to known techniques. Furthermore, the determination of the information representing the occlusion state depends only on easily derivable parameters (e.g. depth and visibility) with little additional derivation, so that the method can be easily performed. For example, the method may be used in a real-time environment to inform the user of the LF capture system during the capture of the image in question. Thus, in this particular case, the user can change the position and/or orientation of the LF capture system to avoid occlusion (if any).

We now describe, in connection with fig. 1, a user of the LF capture system 100 during capture of a scene 150 comprising foreground objects 150a and background objects 150b. Two configurations of the scene 150 are further discussed in conjunction with fig. 2a and 3a, where occlusion occurs or does not occur depending on the distance of the foreground object 150a relative to the LF capture system 100 and the background object 150b.

When the scene 150 is captured at a given time, the LF capture system 100 delivers a matrix of images belonging to the LF data, each image of the matrix capturing the scene 150 from a different viewpoint. To this end, the LF capture system 100 comprises an optical system 100o dedicated to capturing the respective images of the matrix simultaneously.

In the present embodiment, the LF capture system 100 is a camera rig (or camera array), and the optical system 100o includes 4 cameras 100o1 to 100o4 (figs. 2a and 3a). In other embodiments not illustrated, the LF capture system 100 includes a different number of cameras, greater than one. In still other embodiments, the LF capture system 100 is a plenoptic camera, and the capture devices are a plurality of subsystems, each comprising the main lens, one microlens of a microlens array, and the corresponding portion of the sensor.

Returning to fig. 1, the LF capture system 100 also includes a display 100d (e.g., screen, touch screen, etc.) that presents, live, an image belonging to the matrix of images delivered by the LF capture system 100 during capture of the scene 150. In that case, the rendered image is referred to as the reference image of the image matrix. In other embodiments not shown, the LF capture system does not include such a display, and the captured LF data is sent to a remote device or stored directly in the LF capture system for later post-processing. In that case, the reference image is a given image selected among the images of the image matrix captured by the LF capture system, and the disclosed method is performed in post-processing.

Returning to fig. 1, the LF capture system 100 includes a device 110, the device 110 including means for performing the method of fig. 4 (a method for detecting occlusions in images captured by the LF capture system 100). Such a device is discussed further below in conjunction with fig. 5.

In a first configuration of the scene 150 (fig. 2a), the foreground object 150a is close to the background object 150b, relative to their distance from the optical system 100o of the LF capture system 100.

Each camera 100o1 to 100o4 captures the scene from a different viewpoint. Thus, the foreground object 150a hides a different portion of the background object 150b for each of the cameras 100o1 to 100o4. For example, the dashed line 100o1oa defines an area behind the foreground object 150a that is not visible to the first camera 100o1. The occlusion experienced by the first camera 100o1 corresponds to the intersection of the area in question with the background object 150b. The same applies to the other cameras 100o2 to 100o4, but with different viewpoints. Finally, the final occlusion is related to the area 200 (the area indicated in dark grey in fig. 2a) corresponding to the intersection of the areas not seen by the cameras 100o1 to 100o4 of the system, respectively. In other words, in the images captured by the cameras 100o1 to 100o4, there are no pixels corresponding to the portions of the background object 150b that belong to the intersection of the background object 150b in question with the area 200.

In this first configuration, the distance from the foreground object 150a to the optical system 100o remains high relative to the distance between the cameras 100o1 to 100o4 of the optical system 100o. Thus, the region 200 extends all the way to the background object 150b, so that a final occlusion indeed occurs.

In the second configuration of the scene 150 (fig. 3a), the foreground object 150a is far from the background object 150b and close to the optical system 100o of the LF capture system 100. Further, the width of the foreground object 150a remains smaller than the distance between the cameras 100o1 to 100o4 of the optical system 100o.

Thus, even though the occlusion experienced by the first camera 100o1 (i.e., the intersection of the area bounded by the dashed line 100o1ob with the background object 150b) is larger than in the first configuration described above, there is no longer any final occlusion. Indeed, the region 300, corresponding to the intersection of the regions not seen by each of the cameras 100o1 to 100o4 of the system (depicted in dark gray in fig. 3a), no longer intersects the background object 150b.

We now describe, in connection with fig. 4, the steps of a method for detecting occlusions in images captured by the LF capture system 100 in accordance with at least one embodiment. The processing associated with those steps is first illustrated by the example discussed with respect to figs. 2b to 2i, i.e. when the scene 150 is in the first configuration illustrated in fig. 2a above.

In step S400, the depth map 210 (fig. 2b and 2d) and the visibility map 220 (fig. 2c) are computed for pixels in a reference image of the image matrix transmitted by the LF capture system 100 during capture of the scene 150.

For example, the depth map 210 is calculated based on information contained in the different images of the image matrix, according to the method proposed in the paper "Dataset and Pipeline for Multi-View Light-Field Video" (CVPR'17) by N. Sabater et al. In other variants, other suitable known methods are used to calculate the depth map 210 for pixels in the reference image, based on the available information.

The visibility map 220, for its part, indicates, for a given pixel, the number of cameras among the cameras 100o1 to 100o4 of the LF capture system 100 that see that pixel. In the present case, the values of the visibility map 220 lie between 1 (pixel seen by a single camera among 100o1 to 100o4) and 4 (the number of cameras 100o1 to 100o4 in the system). In other embodiments, if the LF capture system comprises n cameras, the visibility map takes values between 1 and n.

To calculate the visibility map 220, the pixels of the reference image are parsed successively. For each of these pixels, equivalent pixels (i.e., pixels with the same RGB values and the same XYZ position, taking the geometric and photometric calibrations into account) are searched for in the other images of the image matrix. For each new equivalent pixel found in another image, a counter is incremented for the pixel considered in the reference view. A visibility map 220 is thus created, indicating for each pixel the number of images that contain it. In variants, refinements may be used to derive such a visibility map. For example, the visibility map 220 may be calculated at a sub-sampled resolution, or by pooling calculations performed when computing disparity maps between images. Likewise, disparity maps computed in parallel can be used to optimize the search area for equivalent pixels in the other cameras, focusing on depth-gradient regions. The search area for equivalents may also be optimized by considering the values of the depth gradient and the camera baseline. Alternatively, the algorithm proposed by K. Wolff et al. in the paper "Point Cloud Noise and Outlier Removal for Image-Based 3D Reconstruction", Proceedings of the IEEE International Conference on 3D Vision (3DV), 2016, may be considered for deriving the visibility map.
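
As an illustration of this pixel-counting approach, the sketch below builds such a visibility map for a rectified horizontal camera rig, assuming the per-view disparity maps of the reference image are already available; the function name, the colour tolerance and the purely horizontal reprojection are illustrative assumptions, not part of the described system.

```python
import numpy as np

def visibility_map(ref_img, other_imgs, disparities, color_tol=10.0):
    """Count, for each pixel of the reference image, the number of views
    (reference view included) in which an equivalent pixel is found."""
    h, w, _ = ref_img.shape
    visibility = np.ones((h, w), dtype=np.int32)  # every pixel is seen by the reference camera
    ys, xs = np.mgrid[0:h, 0:w]
    for img, disp in zip(other_imgs, disparities):
        # Reproject reference pixels into the other view (horizontal shift only,
        # a simplification valid for a rectified horizontal rig).
        xs_other = np.clip(np.round(xs + disp).astype(int), 0, w - 1)
        candidate = img[ys, xs_other]
        # The pixel is counted as seen when the reprojected colour matches.
        dist = np.linalg.norm(candidate.astype(float) - ref_img.astype(float), axis=-1)
        visibility += (dist < color_tol).astype(np.int32)
    return visibility
```

With four cameras, the resulting values range from 1 to 4, consistent with the description above.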

In step S410, a candidate region 240 (figs. 2g, 2h and 2i), in which potential occlusion may occur in the reference image during capture of the scene 150, is determined based at least on a segmentation of the depth map 210 (figs. 2b and 2d).

For example, in step S410a, the depth range covered by the depth map 210 is segmented into a first depth interval 240d1 (fig. 2d) and a second depth interval 240d2 (fig. 2d) having a higher average depth than the first depth interval. In step S410b, a first set of pixels 240p1 (fig. 2d) and a second set of pixels 240p2 corresponding to the first depth interval 240d1 and the second depth interval 240d2, respectively, are determined in the reference image. The candidate region 240 corresponds to a subset of pixels of the first set of pixels 240p1 that are not present in the second set of pixels 240p2.

For example, the first depth interval 240d1 is selected to contain depth values corresponding to the foreground object 150a, and the second depth interval 240d2 is selected to contain depth values corresponding to the background object 150b. In that case, the first set of pixels 240p1 is expected to contain pixels representing the foreground object 150a, and the second set of pixels 240p2 is expected to contain pixels representing the background object 150b. Thus, the pixels in the candidate region 240 defined above have depth values only in the foreground area of the scene 150, and may be suspected to correspond to parts of the image where occlusion may occur, i.e. where some parts of the background object 150b may be hidden.

In the example of fig. 2d, where the first depth interval 240d1 and the second depth interval 240d2 do not overlap, pixels in the first set of pixels 240p1 are necessarily absent from the second set of pixels 240p2. In a variant, the different depth intervals used during the execution of steps S410a and S410b may be consecutive or even overlapping.
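
For illustration, a minimal sketch of steps S410a and S410b in this simple two-interval case could look as follows; the split depth is assumed to be given (e.g., taken between the two main modes of the depth histogram), and the function name is hypothetical.

```python
import numpy as np

def candidate_region(depth_map, split_depth):
    """Segment the depth range at 'split_depth' into a foreground interval and a
    background interval, and return the mask of candidate pixels, i.e. the
    pixels of the first (foreground) set that are not present in the second
    (background) set."""
    first_set = depth_map < split_depth     # first depth interval: foreground pixels
    second_set = depth_map >= split_depth   # second depth interval: background pixels
    # With non-overlapping intervals the subtraction is trivial, but it is kept
    # explicit so that overlapping intervals can be handled the same way.
    return first_set & ~second_set
```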

The first set of pixels 240p1 and the second set of pixels 240p2 define two sub-maps of the visibility map, corresponding to the pixels of the first set of pixels 240p1 (fig. 2f) and to the pixels of the second set of pixels 240p2 (fig. 2e), respectively.

In step S420, information representing an occlusion state in the candidate region 240 is determined based on at least visibility values of the visibility map 220 associated with at least two neighborhoods of the candidate region 240.

For example, in step S420a, the visibility values (provided by the visibility map calculated in step S400) of at least two first pixels belonging to the first neighbourhood 250n1 (fig. 2g) and of at least two second pixels belonging to the second neighbourhood 250n2 (fig. 2g) are extrapolated to deliver, respectively, a first extrapolation curve 250c1 and a second extrapolation curve 250c2 extending towards reduced visibility values in the visibility map, as shown in fig. 2g. Such extrapolation may be based, for example, on linear (gradient, e.g. derivative), bilinear, bicubic or Lanczos interpolation, etc.

In the considered first configuration of the scene 150, the intersection 250c12 of the first extrapolation curve 250c1 and the second extrapolation curve 250c2 occurs for negative extrapolated visibility (case of fig. 2g, where the intersection 250c12 is below the x-axis). Indeed, visibility is defined as being proportional to the number of different images in which the same pixel is present. Thus, a negative extrapolated visibility value indicates that, apart from the reference image, no pixel in the other images of the matrix captures the portion of the background object 150b located behind the foreground object 150a. In that case, the information defined above is determined so as to indicate that an occlusion occurs in the candidate region 240.

In other cases, where the intersection of the first extrapolation curve 250c1 and the second extrapolation curve 250c2 occurs for a positive extrapolated visibility (i.e., where the intersection would be above the x-axis), it may be expected that at least one pixel in another image of the matrix captures the portion of the background object 150b located behind the foreground object 150a. In that case, the information is determined so as to indicate that no occlusion occurs in the candidate region 240. In an alternative embodiment, the information is simply not determined in this case, in order to simplify the derivation.
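
The following sketch illustrates step S420a for one row of the visibility map, taking the extrapolation curves to be straight lines fitted to the neighbourhood pixels (the linear variant mentioned above); the function name, the neighbourhood size and the use of numpy.polyfit are illustrative assumptions.

```python
import numpy as np

def occlusion_state_1d(visibility_row, region_start, region_end, n_neigh=2):
    """Extrapolate the visibility on both sides of a candidate region along one
    row (or column) and report whether the two extrapolation lines cross for a
    negative extrapolated visibility, interpreted as an occlusion."""
    # Left neighbourhood: pixels just before the region; right: pixels just after it.
    xl = np.arange(region_start - n_neigh, region_start)
    xr = np.arange(region_end, region_end + n_neigh)
    # Fit one straight line per neighbourhood (linear extrapolation).
    al, bl = np.polyfit(xl, visibility_row[xl], 1)
    ar, br = np.polyfit(xr, visibility_row[xr], 1)
    if np.isclose(al, ar):
        return False                     # parallel lines: no crossing
    x_cross = (br - bl) / (al - ar)      # abscissa of the intersection of the two lines
    v_cross = al * x_cross + bl          # extrapolated visibility at the crossing
    return v_cross < 0.0                 # negative value => occlusion detected
```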

Returning to fig. 4, step S420 is performed successively for each row and each column of the reference image. Thus, full information taken from the visibility of all pixels around the candidate region 240 may be used to determine this information.

More specifically, when step S420 is performed for a row or column of the reference image, the first extrapolation curve 250c1 and the second extrapolation curve 250c2 extend, along the row or column in question, in the cutting plane of the visibility map 220 (figs. 2b to 2g illustrate cross-sectional views of the depth map 210 and the visibility map 220 along such a row or column). In other embodiments, any cut line other than a row or column of the reference image is used to perform step S420.

Returning to fig. 4, in the case where the first extrapolation curve 250c1 and the second extrapolation curve 250c2 intersect each other for negative extrapolated visibility values (i.e., when occlusion is detected after step S420a), the determination of information includes the steps of:

in step S420b, the intersections of the first extrapolation curve 250c1 and the second extrapolation curve 250c2 with the null visibility axis in the cutting plane of the visibility map 220 are determined, to deliver the first intersection coordinate 250i1 and the second intersection coordinate 250i2 (fig. 2g); and

in step S420c, the difference between the first intersection coordinate 250i1 and the second intersection coordinate 250i2 is determined.

The magnitude of the difference between the first intersection coordinate 250i1 and the second intersection coordinate 250i2 represents the magnitude of the occlusion. This information also takes into account the magnitude of this difference.
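
If the extrapolation curves along the considered row or column are taken to be straight lines, as in the linear variant mentioned above, steps S420b and S420c can be written compactly; the symbols below (slopes a1, a2 and intercepts b1, b2) are introduced here only for illustration and do not appear in the figures:

```latex
v_1(x) = a_1 x + b_1, \qquad v_2(x) = a_2 x + b_2
\quad\Longrightarrow\quad
x_1 = -\frac{b_1}{a_1}, \qquad x_2 = -\frac{b_2}{a_2}, \qquad
\Delta = \lvert x_2 - x_1 \rvert
```

where x_1 and x_2 are the first and second intersection coordinates on the null visibility axis (v = 0), and the magnitude of Delta represents the magnitude of the occlusion.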

More specifically, such a magnitude establishes a link between, on the one hand, the rate of reduction of visibility in the neighbourhoods 250n1, 250n2 of the candidate region 240 and, on the other hand, the width of the candidate region 240 itself. A slow reduction in visibility may indeed compensate for a high width of the candidate region 240. In other words, even for a candidate region 240 of high width, a slow reduction of the visibility of the pixels in the neighbourhoods 250n1, 250n2 of the region 240 may result in a low magnitude of the difference between the first intersection coordinate 250i1 and the second intersection coordinate 250i2. In that case, this indicates that, despite the high width of the candidate region 240, the occlusion may remain weak. This occurs, for example, when there is a sufficiently large distance between the foreground object 150a and the background object 150b. Thus, the magnitude of the difference is expected to represent the area expected not to be visible, and thus, ultimately, the occlusion.

When the above-described processing involved in step S420 is performed successively for each row and each column of the reference image, new information representing the occlusion state in the candidate region 240, referred to as intermediate information, is delivered each time. The successive executions of step S420 over each row and each column deliver a set of intermediate information. In that case, when the percentage of intermediate information indicating the occurrence of an occlusion, relative to the total number of intermediate information items in the set, is greater than a given threshold (e.g., 50%), the information indicates the occurrence of an occlusion in the candidate region 240.

When the processing involved in step S420 is performed successively for each row and each column of the reference image, the magnitude of the difference between the first intersection coordinate 250i1 and the second intersection coordinate 250i2 is integrated over the executions of step S420 to deliver an integrated difference. In that case, the information also takes into account the integrated difference.
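
A possible sketch of this row-and-column aggregation, again with straight-line extrapolation and with illustrative names, threshold and neighbourhood size (none of which are prescribed by the description above), is given below.

```python
import numpy as np

def detect_occlusion(visibility, region_mask, threshold=0.5, n_neigh=2):
    """Scan every row and every column crossing the candidate region, extrapolate
    the visibility on both sides of the region, and collect one piece of
    intermediate information per line.  An occlusion is reported when the
    fraction of lines indicating occlusion exceeds 'threshold'; the per-line
    differences between the two zero-crossings are accumulated as an
    integrated difference."""
    h, w = visibility.shape
    lines = [(visibility[y, :], region_mask[y, :]) for y in range(h)]
    lines += [(visibility[:, x], region_mask[:, x]) for x in range(w)]
    intermediates, integrated_diff = [], 0.0
    for vis, mask in lines:
        idx = np.flatnonzero(mask)
        if idx.size == 0 or idx[0] < n_neigh or idx[-1] + 1 + n_neigh > vis.size:
            continue                                        # no region on this line, or no neighbourhood
        xl = np.arange(idx[0] - n_neigh, idx[0])            # first neighbourhood (before the region)
        xr = np.arange(idx[-1] + 1, idx[-1] + 1 + n_neigh)  # second neighbourhood (after it)
        al, bl = np.polyfit(xl, vis[xl], 1)
        ar, br = np.polyfit(xr, vis[xr], 1)
        if np.isclose(al, ar) or al == 0 or ar == 0:
            continue
        x_cross = (br - bl) / (al - ar)
        occluded = (al * x_cross + bl) < 0.0                # negative extrapolated visibility
        intermediates.append(occluded)
        if occluded:
            integrated_diff += abs((-br / ar) - (-bl / al))  # |x2 - x1| for this line
    occurs = bool(intermediates) and float(np.mean(intermediates)) > threshold
    return occurs, integrated_diff
```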

In other embodiments, not illustrated, step S420 is not performed successively for each row and each column of the reference image, but for a single cut line, or at least a limited number of cut lines, of the reference image, in order to reduce the computational load of the device 110.

In the embodiment illustrated in fig. 4, the disclosed method is performed during the capture of the scene 150 by the LF capture system 100, such that information is communicated to the user 120 of the LF capture system 100 in step S430.

To this end, in step S430a, the reference image is presented live on the display 100d of the LF capture system 100 with a particular outline 260t (fig. 2h) drawn around the candidate region 240. In a variant, the thickness of the particular outline 260t, 260b (fig. 2i) may represent the estimated magnitude of the occlusion, for example when the information also takes into account the integrated difference (e.g., the thicker the outline, the larger the estimated magnitude of the occlusion).
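
As one possible way to render step S430a, the sketch below overlays an outline around the candidate region on the preview image using OpenCV (assumed available, 4.x API), with a thickness growing with the integrated difference; the colour and the mapping from integrated difference to thickness are arbitrary illustrative choices.

```python
import cv2
import numpy as np

def overlay_occlusion_outline(ref_img, region_mask, integrated_diff, max_thickness=10):
    """Draw an outline around the candidate region on a copy of the reference
    image, its thickness growing with the integrated difference, so that the
    live preview warns the user of a potential occlusion and of its estimated
    magnitude."""
    contours, _ = cv2.findContours(region_mask.astype(np.uint8),
                                   cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    # Map the integrated difference onto a bounded outline thickness (illustrative).
    thickness = int(np.clip(1 + integrated_diff, 1, max_thickness))
    preview = ref_img.copy()
    cv2.drawContours(preview, contours, -1, (0, 0, 255), thickness)
    return preview
```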

In other embodiments not illustrated, the information is communicated to the user by other communication means (e.g., by audible signals, etc.).

In a variant, more than one candidate region is determined in step S410. In that case, step S420 is performed for each of the different candidate regions determined in step S410. The information communicated to the user during the execution of step S430 then indicates the occlusion state in the corresponding candidate regions. For example, a particular outline is drawn around each candidate region whose information indicates the occurrence of an occlusion.

In other embodiments, the disclosed method is performed in post-processing, and the information determined in step S420 is delivered in step S430 either to a person performing the post-processing (e.g., via a dedicated alert), or as an input to another method relying on such detection, or as metadata associated with the LF data.

We now discuss, in connection with figs. 3b to 3d, the processing involved in the method of fig. 4 when the scene 150 is in the second configuration illustrated in fig. 3a above.

More specifically, in this second configuration of the scene 150, the foreground object 150a is far from the background object 150b and close to the optical system 100o of the LF capture system 100. Further, the width of the foreground object 150a remains smaller than the distance between the cameras 100o1 to 100o4 of the optical system 100 o. Thus, as discussed above, there is no longer a final occlusion.

Thus, the visibility map 320 (fig. 3b) calculated for the reference image during step S400 shows a smoother variation of the pixel visibility around the central portion (i.e., around the portion corresponding to the foreground object 150a) than the visibility map 220 calculated in the first configuration (fig. 2c).

A depth map 310 (fig. 3c) is also computed during step S400. Since the foreground object 150a is further away from the background object 150b in the second configuration of the scene 150 than in the first configuration, the depth range covered by the depth map 310 is wider than in the depth map 210.

The execution of step S410 (in particular steps S410a and S410b) delivers the first set of pixels 340p1 and the second set of pixels 340p2 (fig. 3c), corresponding respectively to the first depth interval 340d1 and to the second depth interval 340d2, the latter having a higher average depth than the first. In this example, the first depth interval 340d1 is selected to contain depth values corresponding to the foreground object 150a, and the second depth interval 340d2 is selected to contain depth values corresponding to the background object 150b. The candidate region 340 is determined as the subset of pixels of the first set of pixels 340p1 that are not present in the second set of pixels 340p2. In this example, the first depth interval 340d1 and the second depth interval 340d2 do not overlap, so that pixels of the first set of pixels 340p1 are necessarily absent from the second set of pixels 340p2.

During step S420 (in particular step S420a), the visibility values of at least two first pixels belonging to the first neighbourhood 350n1 of the candidate region 340 and of at least two second pixels belonging to the second neighbourhood 350n2 of the candidate region 340 are extrapolated (e.g., based on linear (gradient, such as derivative), bilinear, bicubic or Lanczos interpolation, etc.) to deliver, respectively, a first extrapolation curve 350c1 and a second extrapolation curve 350c2, as illustrated in fig. 3d.

However, in contrast to what occurs in the first configuration of the scene 150, in the present second configuration the intersection 350c12 of the first extrapolation curve 350c1 and the second extrapolation curve 350c2 occurs for a positive extrapolated visibility. Thus, the information indicates that no occlusion occurs in the candidate region 340. For example, when the disclosed method is performed during capture of the scene 150 by the LF capture system 100, no particular outline is displayed around the candidate region 340 on the display 100d of the LF capture system 100 in this case.

Fig. 5 illustrates a block diagram of a particular embodiment of a device 110 (see fig. 1) that may be used to implement a method of notifying a user of a light field capture system of potential occlusions during capture of a scene according to the disclosure (in accordance with any of the embodiments disclosed above).

In this embodiment, the device 110 for implementing the disclosed methods includes non-volatile memory 503 (e.g., Read Only Memory (ROM) or a hard disk), volatile memory 501 (e.g., random access memory or RAM), and a processor 502. The non-volatile memory 503 is a non-transitory computer readable carrier medium. It stores executable program code instructions that are executed by the processor 502 to enable the above described methods (methods for notifying a user of an LF capture system of potential occlusions during capture of a scene according to the present disclosure) to be implemented in its various embodiments disclosed above in connection with fig. 4.

At initialization, the program code instructions described above are transferred from the non-volatile memory 503 to the volatile memory 501 for execution by the processor 502. The volatile memory 501 also includes registers for storing variables and parameters required for this execution.

All the steps of the above-described method for notifying a user of a light field capture system of potential occlusions during capture of a scene according to the present disclosure may equally well be implemented:

by executing a set of program code instructions executed by a reprogrammable computing machine, such as a PC-type device, a DSP (digital signal processor) or a microcontroller. The program code instructions may be stored on a removable (e.g., floppy disk, CD-ROM or DVD-ROM) or non-removable non-transitory computer readable carrier medium; or

By a special-purpose machine or component, such as an FPGA (field programmable gate array), an ASIC (application-specific integrated circuit), or any special-purpose hardware component.

In other words, the present disclosure is not limited to a purely software-based implementation in the form of computer program instructions, but may also be implemented in the form of hardware or any form of combination of hardware and software parts.

According to one embodiment, a method is proposed for detecting occlusions in an image captured by a light field capture system, comprising, for at least one reference image belonging to a matrix of images captured by the light field capture system:

-computing a depth map and a visibility map of pixels in the reference image;

-determining at least one candidate region in the reference image where potential occlusion may occur based at least on the segmentation of the depth map;

-determining information representative of an occlusion state in the at least one candidate region based at least on visibility values of visibility maps associated with at least two neighborhoods of the at least one candidate region.

Accordingly, the present disclosure proposes a new and inventive solution that utilizes information available in different views of LF data to detect occlusions in images captured by LF capture systems (e.g., camera equipment or arrays, plenoptic cameras, etc.).

Furthermore, the determination of the information representing the occlusion state depends only on easily derivable parameters (e.g. depth and visibility) with little additional derivation, so that the method can be easily performed. For example, the method may be used in a real-time environment to notify a user of the LF capture system during capture of the image in question. In that case, the user can change the location and/or orientation of the LF capture system, thereby avoiding the presence of occlusions (if any).

According to one embodiment, an apparatus is presented for detecting occlusions in images captured by a light field capture system, comprising a processor or a dedicated machine configured to, for at least one reference image belonging to a matrix of images captured by the light field capture system:

-computing a depth map and a visibility map of pixels in the reference image;

-determining at least one candidate region in the reference image where potential occlusion may occur based at least on the segmentation of the depth map;

-determining information representative of an occlusion state in the at least one candidate region based at least on visibility values of visibility maps associated with at least two neighborhoods of the at least one candidate region.

According to one embodiment, determining at least one candidate region comprises:

-segmenting a depth range covered by the depth map to deliver at least a first depth interval and a second depth interval having a higher average depth than the first depth interval;

-determining a first set of pixels and a second set of pixels in the reference image corresponding to the first depth interval and the second depth interval, respectively. The at least one candidate region is associated with a subset of pixels of the first set of pixels that are not present in the second set of pixels.

Thus, the first set of pixels corresponds, for example, to a portion of the reference image located in the foreground, and the second set of pixels to a portion located in the background (in the particular case where the first and second depth intervals do not overlap, pixels of the first set of pixels are necessarily absent from the second set of pixels). Only pixels with depth values in the foreground may be suspected to correspond to parts of the image where occlusion may occur, i.e. where some objects in the background may be hidden.

According to one embodiment, determining the information comprises extrapolating visibility values of at least two first pixels belonging to a first neighbourhood of the at least one candidate region and at least two second pixels belonging to a second neighbourhood of the at least one candidate region to convey a first and a second extrapolation curve extending towards reduced visibility values in the visibility map. The information indicates that occlusion occurs in the at least one candidate area when the first extrapolation curve and the second extrapolation curve intersect each other for negative extrapolated visibility values.

Such extrapolation may be based, for example, on linear (gradient, e.g., derivative) or bilinear interpolation, bicubic interpolation, Lanczos interpolation, and the like.

More specifically, when the intersection of the first extrapolation curve and the second extrapolation curve occurs for a negative extrapolated visibility, it can be expected that, apart from the reference image, no pixel in the other images of the matrix captures the portion of the background located behind the foreground defined by the first depth interval (visibility being defined as proportional to the number of different images in which the same pixel is present).

Conversely, when the intersection of the first extrapolation curve and the second extrapolation curve occurs for a positive extrapolated visibility, at least one pixel in another image of the matrix may be expected to capture the portion of the background located behind the foreground.

According to one embodiment, the determining information is performed at least for a row or column of the reference image, the first extrapolation curve and the second extrapolation curve extending in the cutting plane of the visibility map along the row or column. Determining information includes, when the first extrapolation curve and the second extrapolation curve intersect each other for negative extrapolated visibility values:

-determining intersections of the first and second extrapolation curves with the null visibility axis in the cutting plane, delivering first and second intersection coordinates; and

-determining a difference between the first intersection coordinates and the second intersection coordinates.

The magnitude of the difference represents the magnitude of the occlusion. This information also takes into account the magnitude of the difference.

Thus, the magnitude of the difference is expected to represent the area expected not to be visible, and thus the magnitude of the occlusion.

According to one embodiment, the determination of information is performed successively for each row and each column of the reference image, each execution delivering new information, referred to as intermediate information, representative of the occlusion state in said at least one candidate region. The successive executions over the rows and columns deliver a set of intermediate information.

Thus, all information derived from the visibility of all pixels around the candidate region can be used to determine this information.

According to one embodiment, the information represents the occurrence of an occlusion in the at least one candidate region when the percentage of intermediate information indicating the occurrence of an occlusion, relative to the total number of intermediate information items in the set of intermediate information, is greater than a given threshold.

According to one embodiment, the magnitude of the difference is integrated over the successive executions of the act of determining information, to deliver an integrated difference. The information also takes into account the integrated difference.

According to one embodiment, the method further comprises, or the device is further configured for, communicating the information to a user of the light field capture system during capture of the reference image.

According to one embodiment, the act of communicating information comprises: presenting the reference image on a display of the light field capture system with a particular contour drawn around the at least one candidate region.

According to one embodiment, the thickness of the specific profile is a function of the integrated difference.

According to an embodiment, a non-transitory computer-readable medium is proposed, comprising a computer program product recorded thereon and executable by a processor, comprising program code instructions including program code instructions for implementing the method of detecting occlusions in images captured by a light field capture system as described above.
