Element localization in space

Document No.: 1866325 | Publication date: 2021-11-19

Note: This technique, Element localization in space, was created by Joachim Keinert and Thorsten Wolff on 2019-01-28. Abstract: A defined technique (e.g., method, system, ...) for localization, such as depth map estimation, is provided. For example, a method for locating, in a space containing at least one determined object (91, 101, 111, 141), an object element (93, 143, X) associated with a specific 2D representation element (X_L) in a determined 2D image of the space may comprise: deriving (351), based on a predefined positional relationship (381a', 381b'), a range or interval (95, 105, 114b, 115, 135, 145, 361') of candidate spatial positions of the imaged object element (93, 143, X); limiting (35, 36, 352) the range or interval of candidate spatial positions to at least one limited range or interval (93a, 93a', 103a, 112', 137, 147, 362') of acceptable candidate spatial positions, wherein the limiting comprises at least one of: limiting the range or interval of candidate spatial positions using at least one containing volume (96, 86, 106, 376, 426a, 426b) surrounding the at least one determined object (91, 101, ...); and limiting the range or interval of candidate spatial positions using at least one exclusion volume (279, 299a-299e, 375) surrounding unacceptable candidate spatial positions; and retrieving (37, 353), based on a similarity measure, the most suitable candidate spatial position (363') among the acceptable candidate spatial positions of the limited range or interval.

1. A method (30, 350) for locating, in a space containing at least one determined object (91, 101, 111, 141), an object element (93, 143, X) associated with a specific 2D representation element (X_L) in a determined 2D image of said space, the method comprising:

deriving (351) a range or interval (95, 105, 114b, 115, 135, 145, 361 ') of candidate spatial positions of the imaged object element (93, 143, X) based on the predefined positional relationship (381a ', 381b ');

limiting (35, 36, 352) the range or interval of candidate spatial locations to at least one limited range or interval (93a, 93a ', 103a, 112 ', 137, 147, 362 ') of acceptable candidate spatial locations, wherein limiting comprises at least one of:

Limiting a range or interval of the candidate spatial locations using at least one containing volume (96, 86, 106, 376, 426a, 426b) surrounding at least one determined object (91, 101); and

limiting a range or interval of the candidate spatial location using at least one excluded volume (279, 299a-299e, 375) surrounding the unacceptable candidate spatial location; and

based on the similarity measure, a most suitable candidate spatial location (363') is retrieved (37, 353) among the acceptable candidate spatial locations of the limited range or interval.

2. The method according to claim 1, wherein at least one first determined object (91) and one second determined object (101, 111) are contained in said space,

wherein limiting (35, 36, 352) comprises limiting the range or interval (95, 115) of candidate spatial locations to:

at least one first limited range or interval (93a', 96a) of acceptable candidate spatial positions associated with the first determined object (91); and

at least one second limited range or interval (103a, 112') of acceptable candidate spatial positions associated with said second determined object (101, 111),

wherein limiting (352) comprises defining the at least one containing volume (96) as a first containing volume around the first determined object (91) and/or as a second containing volume (106) around the second determined object (101) to limit the at least one first and/or second range or interval of candidate spatial positions to at least one first and/or second limited range or interval of acceptable candidate spatial positions; and

Wherein retrieving (353) comprises determining whether the particular 2D representation element (93) is associated with the first determined object (91) or the second determined object (101, 111).

3. The method according to claim 2, wherein determining whether the particular 2D representation element (X_L) is associated with the first determined object (91) or the second determined object (101) is performed based on a similarity measure.

4. The method according to claim 2 or 3, wherein determining whether the particular 2D representation element is associated with the first determined object (91) or the second determined object (101) is performed based on the following observations:

one of the at least one first and second limited ranges or intervals of acceptable candidate spatial locations is null; and

the other (93a) of the at least one first and second limited ranges or intervals of acceptable candidate spatial positions is not empty, thereby determining that the particular 2D representation element (93) is within the other (93a) of the at least one first and second limited ranges or intervals of acceptable candidate spatial positions.

5. The method according to any one of claims 2-4, wherein restricting (352) comprises using information from a second camera (114) or 2D image to determine whether the particular 2D representation element (93) is associated with the first determined object (91) or the second determined object (101, 111).

6. The method of claim 5, wherein the information from the second camera (114) or the 2D image comprises previously obtained positions of the object element (93) comprised in:

-said at least one first limited range or interval (96a) of acceptable candidate spatial positions, thereby concluding that said object element (93) is associated with said first object (91); or

-said at least one second limited range or interval (112') of acceptable candidate spatial positions, thereby concluding that said object element (93) is associated with said second object (111).

7. A method (350, 450) comprising:

as a first operation (451), obtaining a position parameter and at least one containing volume (424a, 424b) associated with a second camera position (424 b);

as a second operation (452), a method (350) according to any of the preceding claims is performed for a particular 2D representation element of a first 2D image acquired at a first camera position, the method comprising:

Analyzing (457a, 457b) whether the following two conditions are satisfied based on the location parameters obtained at the first operation (451):

(457b) at least one candidate spatial position (425D'') will occlude at least one containing volume (426a) in a second 2D image obtained or obtainable at the second camera position, and

(457a) At least one candidate spatial position (425D ') is not occluded (457 a') by at least one containing volume (426a) in the second 2D image,

therefore, if the two conditions (457a ", 457 b') are satisfied:

refraining (457 b', 457c) from performing a search (353, 458) even if at least one candidate spatial position (425D ") is within a limited range of acceptable candidate spatial positions for the first 2D image, and/or

Excluding (457 b', 457c) the at least one candidate spatial position (425D ") from a restricted range or interval of acceptable candidate spatial positions for the first 2D image, even if the at least one candidate spatial position (425D") is within the restricted range of acceptable candidate spatial positions.

8. A method (350, 450) comprising:

as a first operation (451), obtaining a position parameter and at least one containing volume (424a, 424b) associated with a second camera position (424 b);

As a second operation (452), the method (350) according to any one of the preceding claims is performed for a particular 2D representation element of the first 2D image acquired at the first camera position, the method comprising:

analyzing (457a), based on the position parameters obtained at the first operation (451), whether at least one acceptable candidate spatial position (425e "') of the restricted range would be occluded by the at least one containing volume (426b) in the second 2D image obtained or obtainable at the second camera position, so as to keep the acceptable candidate spatial position (425 e"') in the restricted range.

9. A method (430) comprising

As a first operation (431), positioning a plurality of 2D representation elements for the second 2D image,

performing, as a second subsequent operation (432), the deriving, limiting and retrieving of the method (350) according to any one of the preceding claims for determining a most suitable candidate spatial position for a determined 2D representation element ((x0, y0)) of a first determined 2D image, wherein the second 2D image and the first determined 2D image are acquired at spatial positions having a predetermined positional relationship,

wherein the second operation (432) further comprises finding (435), in the second 2D image previously processed in the first operation (431), a 2D representation element ((x ', y '))) corresponding to a candidate spatial position (A ') of a first determined 2D representation element ((x0, y0)) of the first determined 2D image,

So as to further limit (436) the range or interval of acceptable candidate spatial positions and/or to obtain a similarity measure with respect to the first determined 2D representation element ((x0, y0)) in the second operation (432).

10. The method of claim 9, wherein the second operation (432) is such that, upon observing that a previously obtained localization position of a 2D representation element ((x', y')) in the second 2D image is to be occluded by the candidate spatial position (A') of the first determined 2D representation element ((x0, y0)) considered in the second operation with respect to the second 2D image:

further limiting (436) a limited range or interval of acceptable candidate spatial positions, so as to exclude said candidate spatial position (A') of a determined 2D representation element ((x0, y0)) of said first determined 2D image from the limited range or interval of acceptable candidate spatial positions.

11. The method according to claim 9 or 10, wherein, when it is observed that the localized position (93) of the 2D representation element ((x ', y')) in the second 2D image corresponds to the first determined 2D representation element ((x0, y 0)):

for a determined 2D representation element ((x0, y0)) of the first determined 2D image, limiting a range or interval of acceptable candidate spatial positions so as to exclude positions further than the localized position (93) from the limited range or interval of acceptable candidate spatial positions (96a, 112').

12. The method of claim 9 or 10 or 11, further comprising, when it is observed that the localization position (93) of the 2D representation element ((x ', y ')) in the second 2D image does not correspond to the localization position (112') of the first determined 2D representation element ((x0, y 0)):

invalidating a most suitable candidate spatial position (93) of a determined 2D representation element ((x0, y0)) of the first determined 2D image obtained in the second operation.

13. The method according to any one of claims 9-12, wherein the localized position (93) corresponds to the first determined 2D representation element ((x0, y0)) when a distance of a localized position (93) of the 2D representation element ((x ', y')) in the second 2D image to one of the candidate spatial positions of the first determined 2D representation element ((x0, y0)) is within a maximum predetermined tolerance distance.

14. The method according to any of claims 9-13, further comprising, when a 2D representation element ((x ', y')) is found in the second 2D image, analyzing a confidence value or reliability value of the positioning of the first 2D representation element ((x ', y')) in the second 2D image and using it only if the confidence value or reliability value is above a predetermined confidence threshold or unreliability value is below a predetermined threshold.

15. The method of claim 14, wherein the confidence value is based at least in part on a distance between the localized position and the camera position, the confidence value increasing as the distance decreases.

16. The method of claim 15, wherein the confidence value is based at least in part on the number of objects (91, 111), containing volumes (96), or restricted ranges of acceptable spatial positions (96b), such that the confidence value is increased if a smaller number of objects (91, 111), containing volumes (96), or restricted ranges (96b) of acceptable spatial positions is found in the range or interval of candidate spatial positions.

17. The method of any of the preceding claims, wherein limiting (352) comprises defining at least one approximation surface (92, 132, 142) so as to limit at least one range or interval of candidate spatial positions to at least one limited range or interval of acceptable candidate spatial positions.

18. The method of claim 17, wherein defining comprises defining at least one approximation surface (142) and one tolerance interval (147) to limit at least one range or interval of candidate spatial positions to a limited range or interval of candidate spatial positions defined by the tolerance interval, wherein the tolerance interval has:

A distal end (143 "') defined by the at least one approximation surface; and

a proximal end (147') defined based on the tolerance interval; and

based on the similarity measure, the most suitable candidate spatial location is retrieved among the acceptable candidate spatial locations of the limited range or interval (143).

19. A method for positioning an object element (143) associated with a particular 2D representation element in a 2D image of a space containing at least one determined object (141), the method comprising:

deriving a range or interval (145) of candidate spatial positions of the imaging object element (143) based on the predefined positional relationship;

limiting the range or interval of candidate spatial locations to at least one limited range or interval of acceptable candidate spatial locations (147), wherein limiting comprises:

defining at least one approximation surface (142) and a tolerance interval (147) for limiting at least one range or interval of candidate spatial positions to a limited range or interval of candidate spatial positions defined by the tolerance interval, wherein the tolerance interval has:

a distal end (143 "') defined by the at least one approximation surface; and

a proximal end (147') defined based on the tolerance interval; and

Based on the similarity measure, the most suitable candidate spatial location is retrieved among the acceptable candidate spatial locations of the limited range or interval (143).

20. The method according to claim 18 or 19, iterated by using an increased tolerance interval in order to increase the probability of containing the object element (143).

21. The method of claim 18 or 19, iterated by using a reduced tolerance interval in order to reduce the probability of containing different object elements.

22. The method according to any one of claims 18-21, wherein limiting comprises defining a tolerance value (t0) for defining the tolerance interval (147).

23. The method of any of claims 18-22, wherein limiting comprises defining a tolerance interval value Δ obtained from said at least one approximation surface (142) based on a normal vector n and a vector z, wherein n is the normal vector of the approximation surface (142) at the point (143''') where the interval (145) of candidate spatial positions intersects the approximation surface (142), and the vector z defines the optical axis of the determined camera or 2D image.

24. The method of claim 23, wherein at least a portion of the tolerance interval value Δ is defined from the at least one approximation surface (142) based on a formula of the type

Δ = t0 / |cos α|

or a variant thereof in which the magnitude of the angle α is limited,

wherein t0 is a predetermined tolerance value, wherein n is the normal vector of the approximation surface (142) at the point (143''') where the interval (145) of candidate spatial positions intersects the approximation surface (142), wherein the vector z defines the optical axis of the camera in question, and wherein α is the angle between n and z.

25. The method of claim 24, wherein retrieving comprises:

considering the normal vector n of the approximation surface (142) at the intersection point (143') between the approximation surface (142) and the range or interval (145) of candidate spatial positions; and

based on the similarity measure, retrieving the most suitable candidate spatial position (143) among the acceptable candidate spatial positions of the limited range or interval (149), wherein retrieving is performed among the acceptable candidate spatial positions of the limited range or interval (149) and is based on the normal vector n and on vectors defined with reference to the normal vector n, so as to retrieve the most suitable candidate spatial position.

26. A method for positioning, in a space containing at least one determined object, an object element (143) associated with a particular 2D representation element in a determined 2D image of the space, the method comprising:

deriving a range or interval (145) of candidate spatial positions of the imaging object element (143) based on the predefined positional relationship;

Limiting (149) the range or interval of candidate spatial locations to at least one limited range or interval of acceptable candidate spatial locations, wherein limiting comprises defining at least one approximation surface (142) so as to limit (145) the at least one range or interval of candidate spatial locations to the limited range or interval of candidate spatial locations (149);

considering a normal vector n of the approximation surface (142) at an intersection between the approximation surface (142) and the range or interval (145) of candidate spatial positions;

based on the similarity measure, retrieving the most suitable candidate spatial position (143) among the acceptable candidate spatial positions of the limited range or interval (149), wherein retrieving is performed among the acceptable candidate spatial positions of the limited range or interval (149) and is based on the normal vector n and on vectors defined with reference to the normal vector n, so as to retrieve the most suitable candidate spatial position.

27. The method of claim 25 or 26, wherein

retrieving comprises processing a similarity measure (c_sum) of at least one candidate spatial position (d) for the particular 2D representation element ((x0, y0), X_L),

Wherein the processing involves other 2D representation elements ((x, y)) within a particular neighborhood (N (x0, y0)) of the particular 2D representation element ((x0, y0)),

wherein the processing comprises, for each of said other 2D representation elements ((x, y)), assuming a planar surface of the object in the object element and, according to the vector n, obtaining a vector n' from a plurality of vectors within a defined predetermined range, so as to derive a candidate spatial position (D) associated with the vector n', wherein the candidate spatial position (D) is used to determine the contribution of each of the 2D representation elements ((x, y)) in the neighborhood (N(x0, y0)) to the similarity measure (c_sum).

28. The method according to any of claims 25-27, wherein retrieving is based on relationships of the following type:

d(x, y) = f_{d, n'}(x0, y0, x, y, K),

wherein (x0, y0) is the particular 2D representation element, (x, y) is a 2D representation element in the neighborhood of (x0, y0), K is the camera intrinsic matrix, d is a depth candidate representing a candidate spatial position, and f_{d, n'} is a function which, under the assumption of a planar surface of the object in the object element, computes from the particular 2D representation element (x0, y0) the depth candidate for the 2D representation element (x, y).

29. The method of claim 28, wherein retrieving is based on evaluating a similarity measure c_sum(d) of the type

c_sum(d) = Σ_{(x, y) ∈ N(x0, y0)} c(x, y, d(x, y)),

wherein Σ represents a sum or a more general aggregation function.

30. The method of any of claims 25-29, wherein limiting comprises limiting, relative to the normal vector n, the vectors n' obtained perpendicular to the at least one determined object (141) at an intersection (143) of the at least one determined object (141) with the range or interval (145) of candidate spatial positions.

31. The method according to any one of claims 25-30, wherein limiting comprises obtaining the vectors perpendicular to the at least one determined object (143) according to

n'(θ, φ) = (sin θ · cos φ, sin θ · sin φ, cos θ), with θ ≤ θmax,

where θ is the tilt angle with respect to the normal vector n, θmax is a predefined maximum tilt angle, φ is an azimuth angle, and where the formula is expressed in an orthogonal coordinate system whose third axis (z) is parallel to n and whose other two axes (x, y) are orthogonal to n.

32. The method of any one of claims 25-31, wherein the normal vectors n provided for different restricted ranges (96a, 112') of acceptable candidate spatial positions associated with the same range (115) of candidate spatial positions are different.

33. The method of any one of claims 17-32, wherein the restriction is applied only when the normal vector n of the approximation surface (122a, 142) at an intersection (143'''') between the approximation surface (142) and the range or interval (145) of candidate spatial positions has a predetermined direction within a certain range of directions.

34. The method of claim 33, wherein the specific range of directions is calculated based on a direction of a range or interval (145) of candidate spatial positions relevant for determining the 2D image.

35. The method of claim 34, wherein the constraint is applied only when the dot product of the normal vector n of the approximation surface (122a, 142) at an intersection (143'''') between the approximation surface (142) and the range or interval (145) of candidate spatial positions with the vector describing the path from the camera to the candidate spatial position has a predefined sign.

36. The method according to any one of claims 14-35, wherein limiting (352) comprises defining at least one approximation surface-defined extremity (93 "') of at least one limited range or interval (93a) of acceptable candidate spatial positions, wherein the at least one approximation surface-defined extremity (93"') is located at an intersection between an approximation surface (92) and a range or interval (95) of candidate spatial positions.

37. The method of any one of claims 17-36, wherein the approximation surface (92) is selected by a user.

38. The method according to any one of claims 17-37, wherein limiting (352) includes scanning along a range (135) of candidate positions from a proximal position toward a distal position (135), and ending when the at least one limited range or interval (137) of acceptable candidate spatial positions is observed to have a distal end (I7) associated with the approximation surface (132).

39. The method according to any one of claims 17-38, wherein at least one containment volume (96, 136d, 206) is automatically defined in accordance with the at least one approximation surface (92, 132, 202).

40. The method according to any one of claims 17-39, wherein at least one containing volume (186) is defined from the at least one approximation surface (182) by scaling the at least one approximation surface (182).

41. The method according to any one of claims 17-40, wherein at least one containing volume (186) is defined from the at least one approximation surface (182) by scaling the at least one approximation surface from a scaling center (182a) of the at least one approximation surface (182).

42. The method according to any one of claims 17-41, wherein a constraint relates to at least one containing volume or approximation surface formed by a structure (200, 206), said structure (200, 206) consisting of vertices or control points, edges and surface elements, wherein each edge connects two vertices and each surface element is surrounded by at least three edges, and there is a connecting path of edges from each vertex to any other vertex of the structure.

43. The method of claim 42, wherein each edge is connected to an even number of surface elements.

44. The method of claim 43, wherein each edge is connected to two surface elements.

45. The method according to any one of claims 42-44, wherein the structure (200, 206) occupies an enclosed volume without boundaries.

46. The method according to any one of claims 17-45, wherein at least one containment volume (96, 206) is formed by a geometric structure (200), the method further comprising defining the at least one containment volume (96, 206) by:

moving an element (200a-200i) by decomposing the element (200a-200i) along a normal to the element (200a-200 i); and

the elements (200b, 200c) are reconnected by generating additional elements (210bc, 210 cb).

47. The method of claim 46, further comprising:

-inserting at least one new control point (220) within the decomposed area (200');

the at least one new control point (220) is reconnected with the decomposed element (210bc) to form a further element (220 bc).

48. The method according to any one of claims 42-47, wherein the elements (200a-200b) are triangle elements.

49. The method according to any one of claims 17-48, wherein limiting (352) includes:

The limited range or interval (137) is searched from the proximal position to the distal position within a range or interval (135) of candidate spatial positions, and the search is ended when an approximation surface is retrieved.

50. The method according to any one of claims 17-49, wherein said at least one approximation surface (92, 132, 142) is comprised within said at least one object (91, 141).

51. The method according to any one of claims 17-50, wherein the at least one approximation surface (92, 132, 142) is a rough approximation of the at least one object.

52. A method according to any of claims 17-51, further comprising defining a limited range or interval of acceptable candidate spatial positions as the range or interval of candidate spatial positions obtained during derivation when it is observed that the range or interval of candidate spatial positions obtained during derivation does not intersect any approximation surfaces.

53. The method according to any one of the preceding claims, wherein the at least one containing volume is a rough approximation of the at least one object.

54. The method according to any of the preceding claims, wherein retrieving is applied to a random subset of the restricted range or interval of acceptable candidate positions and/or of the acceptable normal vectors.

55. The method of any of the preceding claims, wherein limiting (352) comprises defining at least one volume-defined terminus (96 "'; I0, I1, I2, I3, I4, I5, I6) of at least one limited range or interval (137) of acceptable candidate spatial locations.

56. The method according to any one of the preceding claims, wherein the containment volume (96, 106) is defined by a user.

57. The method according to any one of the preceding claims, wherein retrieving (37, 353) comprises determining whether the particular 2D representation element is associated with at least one determined object (91, 101) based on a similarity measure.

58. The method according to any one of the preceding claims, wherein at least one of the 2D representation elements is a pixel (X_L) in the determined 2D image (22).

59. The method according to any of the preceding claims, wherein the object element (93, 143) is a surface element of the at least one determined object (91, 141).

60. The method according to any of the preceding claims, wherein the range or interval (95, 105, 114b, 115, 135, 145) of candidate spatial positions of the imaged object element extends in a depth direction with respect to the determined 2D representation element.

61. The method according to any of the preceding claims, wherein the range or interval of candidate spatial positions of the imaged object element (93, 143) extends along a ray (95, 105, 114b, 115, 135, 145) emerging from a nodal point of a camera with respect to the determined 2D representation element.

62. The method according to any one of the preceding claims, wherein retrieving (353) comprises measuring a similarity measure along acceptable candidate spatial positions of a limited range or interval (93 a; 93, 103; 93 a; 96a, 112'; 137) obtained from another 2D image (23) of the space and having a predefined positional relationship with the determined 2D image (22).

63. The method of claim 62, wherein retrieving comprises measuring a similarity measure along 2D representation elements in the further 2D image forming an epipolar line (21) associated with the at least one restricted range (45).

64. The method of any of the preceding claims, wherein limiting (35, 36, 352) comprises finding an intersection (I3-I9, I3'-I9') between at least one of the inclusion volumes (136a-136f), the exclusion volumes (299a-299e, 379), and/or the approximation surfaces (132, 172) and a range or interval (135, 295, 375) of candidate locations.

65. The method of any of the preceding claims, wherein limiting comprises finding an end (I3-I9, I3 '-I9') of a limited range or interval (135, 295, 375) of candidate locations using at least one of an inclusion volume (136a-136f), an exclusion volume (299a-299e, 379), and/or an approximation surface (132, 172).

66. The method of any preceding claim, wherein limiting comprises:

a range or interval (137) is searched from a proximal position toward a distal position within a range or interval (135) of candidate spatial positions.

67. The method according to any one of the preceding claims, wherein defining (351) comprises:

selecting a first 2D image (335) of the space and a second 2D image (335 ') of the space, wherein the first 2D image (335) and the second 2D image (335') are acquired at camera positions having a predetermined positional relationship to each other;

displaying at least the first 2D image (335),

guiding a user to select a control point in the first 2D image (335), wherein the selected control point (330) is a control point (210) forming an element (200a-200i) of an approximation surface or an exclusion volume or a structure (200) containing a volume;

guiding a user to selectively translate a selected control point (330) in the first 2D image (335) while limiting movement of the control point (330) along an epipolar line (331) in the second 2D image (335 ') associated with a second 2D image control point (330'), wherein the second 2D image control point (330') corresponds to the same control point (210, 330) of the elements (200a-200i) in the first 2D image,

So as to define the movement of said elements (200a-200i) of the structure (200) in 3D space.

68. The method according to any one of the preceding claims, wherein defining (351) comprises:

selecting a first 2D image (345) of the space and a second 2D image (345') of the space, wherein the first 2D image and the second 2D image are acquired at camera positions having a predetermined positional relationship to each other;

displaying at least the first 2D image (345),

guiding a user to select a control point (340) in the first 2D image (345), wherein the selected control point is a control point (210) forming an element (200a-200i) of an approximation surface or an exclusion volume or a structure (200) containing a volume;

obtaining a selection from a user associated with a new location (341) of the control point (340) in the first 2D image (345);

-restricting the new position of the control point (340) in the space to a position on an epipolar line (342) in the second 2D image (345') associated with a new position (341) of the control point in the first 2D image (345), and-determining a new position (341 ') in the space as a closest position (341 ') on the epipolar line (342) to an initial position (340 ') in the second 2D image (345'),

Wherein a point (340') corresponds to the same control point (210, 340) of the elements (200a-200i) of the structure (200) of the selected control point (340),

so as to define the movement of said elements (200a-200i) of said structure (200).

69. A method for locating, in a space containing at least one determined object, an object element associated with a particular 2D representation element in a determined 2D image of the space, the method comprising:

obtaining a spatial position of an imaging object element;

obtaining a reliability value or an unreliability value for the spatial position of the imaged object element;

in case the reliability value does not comply with a predefined minimum reliability or the unreliability does not comply with a predefined maximum unreliability, performing the method of any of the preceding claims to refine a previously obtained spatial position.

70. A method for refining, in a space including at least one determined object, a positioning of a previously obtained object element associated with a particular 2D representation element in a determined 2D image of the space, the method comprising:

graphically displaying the determined 2D image of the space;

guiding a user to define at least one containing volume and/or at least one approximating surface;

A method according to any one of the preceding claims performed to refine a previously obtained position fix.

71. The method of any of the preceding claims, further comprising, after defining at least one inclusion volume (376) or approximation surface, automatically defining an exclusion volume (379) between the at least one inclusion volume (376) or approximation surface and a position of at least one camera.

72. The method according to any of the preceding claims, further comprising, upon defining a first proximal containing volume (426b) or approximating surface and a second distal containing volume or approximating surface (422, 426a), automatically defining:

a first excluding volume (C) between the first containing volume (426b) or the approximation surface and the position of the at least one camera (424 b);

at least one second excluding volume (A, B) between a second containing volume (426a) or approximation surface (422) and a position of at least one camera (424b), wherein a non-excluded region (D) is left between the first excluding volume (C) and the second excluding volume (A, B).

73. The method of any one of the preceding claims, applied to a multi-camera system.

74. The method according to any one of the preceding claims, applied to a stereoscopic imaging system.

75. A method according to any preceding claim, wherein retrieving comprises selecting, for each candidate spatial position of the range or interval, whether the candidate spatial position will be part of a restricted range of acceptable candidate spatial positions.

76. A system (360, 380) for locating, in a space containing at least one determined object (91, 101, 111, 141), an object element (93, 143) associated with a particular 2D representation element in a determined 2D image of the space, the system comprising:

a derivation block (361) for deriving a range or interval (95, 105, 114b, 115, 135, 145) of candidate spatial positions of the imaging object element (93, 143) based on a predefined positional relationship;

a limiting block (35, 36, 352) for limiting a range or interval of candidate spatial positions to at least one limited range or interval (93a, 93a ', 103a, 112', 137, 147) of acceptable candidate spatial positions, wherein the limiting block is configured to:

limiting a range or interval of candidate spatial locations using at least one containing volume (96, 86, 106) surrounding at least one determined object; and/or

Limiting a range or interval of candidate spatial locations using at least one excluded volume (279, 299a-299e, 375) that includes unacceptable candidate spatial locations; and

A retrieval block (37, 353) for retrieving a most suitable candidate spatial position among the acceptable candidate spatial positions of the limited range or interval based on the similarity measure.

77. The system according to claim 76, further comprising a first camera and a second camera for acquiring 2D images in a predetermined positional relationship.

78. The system of claim 76 or 77, further comprising: at least one movable camera for acquiring 2D images from different positions and in a predetermined positional relationship.

79. The system of any of claims 76-78, further comprising: a constraint definer (364) for rendering or displaying the at least one 2D image to obtain an input (384a) for defining the at least one constraint.

80. The system of any of claims 76-79, further configured to perform the method of any of claims 1-75.

81. A non-transitory storage unit comprising instructions that, when executed by a processor, cause the processor to perform the method of any one of claims 1-75.

Technical Field

Examples herein refer to techniques (methods, systems, etc.) for positioning object elements in an imaging space.

For example, some techniques involve the computation of acceptable depth intervals for (e.g., semi-automatic) depth map estimation, e.g., based on 3D geometric primitives.

Examples may relate to 2D images obtained in a multi-camera system.

Background

Having multiple images from a scene allows for the computation of a depth or disparity value for each pixel of the image. Unfortunately, automatic depth estimation algorithms are prone to errors and fail to provide error-free depth maps.

To correct these errors, the literature proposes mesh-based depth map refinement, in which the distances to the meshes serve as approximations of the acceptable depth values. However, to date, there has been no explicit description of how to calculate an acceptable depth interval. Furthermore, the case where the mesh only partially represents the scene has not been considered, and specific methods are needed to avoid false depth map constraints.

The present examples propose, among other things, a method for achieving this. For example, the present application further describes a method of computing a set of acceptable depth intervals (or, more generally, a range or interval of candidate spatial positions) from a given 3D geometry of a mesh. Such a set of depth intervals (or ranges or intervals of candidate spatial positions) may disambiguate the correspondence detection problem, thereby improving the overall resulting depth map quality. Furthermore, the application shows how inaccuracies in the 3D geometry are compensated by so-called containment volumes to avoid false mesh constraints. In addition, the containment volume may also be used to handle occlusions and thus prevent erroneous depth values.

In general, examples relate to a method for positioning an object element (e.g., a surface portion of an object imaged in a 2D image) in a space (e.g., a 3D space) containing at least one determined object, the object element being associated with a particular 2D representation element (e.g., pixel) in the determined 2D image of the space. The depth of the object element can thus be obtained.

In an example, the imaged space may contain more than one object, e.g., a first object and a second object, and it may be requested to determine whether a pixel (or more generally, a 2D representation element) is to be associated with the first imaged object or the second imaged object. The depth of the object element in space can also be obtained: the depth of each object element will be similar to the depth of the adjacent elements of the same object.

The examples above and below may be based on a multi-camera system (e.g. a stereo system or a light field camera array, e.g. for virtual cinematography, virtual reality, etc.), where each different camera acquires a respective 2D image of the same space (with the same object) from a different angle (more generally based on predefined positional and/or geometric relationships). By relying on known predefined positional and/or geometric relationships, each pixel (or more generally each 2D representation element) can be located. For example, an epi-polar geometry may be used.

The shape of one or more objects placed within the imaging space may also be reconstructed, for example by locating a number of pixels (or more generally 2D representation elements) to construct a complete depth map.

In general, the processing power consumed by the processing unit to perform these methods is not negligible. Therefore, it is generally desirable to reduce the required computational effort.

Drawings

The following figures are discussed herein:

fig. 1 relates to localization techniques and, more particularly, to depth definition.

Figure 2 illustrates an epipolar geometry that is valid for both the prior art and the present examples.

Fig. 3 shows a method (interactive depth map refinement workflow) according to an example.

Fig. 4 to 6 illustrate challenges in the art, in particular:

FIG. 4 illustrates a technique for approximating an object using an approximation surface.

FIG. 5 illustrates competition constraints due to approximating an approximation surface;

FIG. 6 illustrates the challenge of occluding objects to determine an acceptable depth interval.

Fig. 7 to 38 show the technique according to the present example, in particular:

FIG. 7 illustrates acceptable positions and depth values resulting from approximating a surface;

FIG. 8 illustrates acceptable positions and depth values resulting from containing volumes;

FIG. 9 illustrates the use of containment volumes to limit the acceptable depth values allowed for approximating a surface;

FIG. 10 illustrates the concept of handling occluding objects;

FIG. 11 illustrates the concept of resolving contention constraints;

FIGS. 12 and 12a show a planar approximation surface and its comparison with a closed volume approximation surface;

FIG. 13 illustrates an example of a limited range of acceptable candidate locations (rays) that intersect multiple containing volumes and one approximation surface;

FIG. 14 shows an example for deriving an acceptable depth range from tolerance values;

FIG. 15 illustrates aggregation of matching costs between two 2D images for computing a similarity metric;

FIG. 16 shows an embodiment of an inclined plane;

FIG. 17 shows a plot of 3D points (X, Y, Z) versus corresponding pixels in a camera coordinate system;

FIG. 18 illustrates an example of generating an containment volume by scaling an approximate surface related to a center of the zoom;

FIG. 19 illustrates the creation of a containment volume from a planar approximation surface;

FIG. 20 shows an example of an approximation surface represented by a triangular structure;

FIG. 21 shows an example of an initial approximation surface in which all triangle elements have been moved in 3D space along their normal vectors;

FIG. 22 shows an example of a reconnected grid element;

FIG. 23 illustrates a technique of replicating control points;

FIG. 24 illustrates a reconnection of duplicate control points such that all control points originating from the same initial control point are directly connected through a grid;

FIG. 25 illustrates an original triangle element intersecting another triangle element;

FIG. 26 shows a triangle structure decomposed into three new sub-triangle structures;

FIG. 27 shows an example of an excluded volume;

FIG. 28 shows an example of an enclosed volume;

FIG. 29 shows an example of an excluded volume;

FIG. 30 shows an example of a planar exclusion volume;

FIG. 31 illustrates an example of image-based rendering or display;

fig. 32 shows an example of determining pixels that cause view rendering (or display) artifacts;

FIG. 33 shows an example of an epi-polar editing mode;

FIG. 34 illustrates the principle of the free epi-polar editing mode;

FIG. 35 shows a method according to an example;

FIG. 36 shows a system according to an example;

FIG. 37 illustrates an implementation according to an example (e.g., for utilizing multi-camera consistency);

FIG. 38 shows a system according to an example;

FIG. 39 shows a program according to an example;

FIG. 40 shows a procedure that can be avoided;

fig. 41 and 42 show examples;

figures 43 to 47 illustrate a method according to an example;

Fig. 48 shows an example.

Detailed Description

3. Background of the invention

A 2D photograph captured by a digital camera may faithfully reproduce a scene (e.g., one or more objects in space). Unfortunately, such rendering is only effective for a single viewpoint. This is too limited for advanced applications such as virtual movie production or virtual reality. Instead, the latter requires the generation of novel views from the captured material.

Such novel views can be created in different ways. The most straightforward approach is to create a mesh or point cloud from the captured data [12][13]. Alternatively, depth-image-based rendering or display may be applied.

In both cases, a depth map for each viewpoint needs to be computed for the captured scene. The viewpoint corresponds to one camera position of the captured scene. The depth map assigns a depth value to each pixel of the captured image under consideration.

As shown in fig. 1, the depth value 12 is the projection of the object element distance 13 onto the optical axis 15 of the camera 1. The object element 14 is to be understood as the element represented by a 2D representation element (which may be a pixel, even though it is ideally 0-dimensional). The object element 14 may be a surface element of a solid, opaque object (in the case of a transparent object, the rays may of course be refracted by the transparent object according to the laws of optics).
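To make this definition concrete, the following short Python sketch (illustrative only; the names camera_center and optical_axis are placeholders and not reference signs of the figures) computes a depth value as the projection of the camera-to-element vector onto the optical axis:

import numpy as np

def depth_value(object_element, camera_center, optical_axis):
    # Depth = projection of the vector from the camera center to the
    # object element onto the (unit-length) optical axis of the camera.
    axis = optical_axis / np.linalg.norm(optical_axis)
    return float(np.dot(object_element - camera_center, axis))

# An element 5 m in front of the camera but offset sideways still has
# depth 5, because only the component along the optical axis counts.
z = depth_value(np.array([1.0, 0.0, 5.0]),
                np.array([0.0, 0.0, 0.0]),
                np.array([0.0, 0.0, 1.0]))   # z == 5.0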

In order to obtain a depth value 12 for each pixel (or, more generally, for each 2D representation element), there are different approaches:

- using active depth sensing devices, e.g. structured light or LIDAR;

- computing depth from multiple images of the same scene;

- a combination of both.

While all of these methods have their merits, the advantages of computing depth from multiple images are its low cost, short capture time and high-resolution depth maps. Unfortunately, it is not error-free. If such an erroneous depth map is used directly in one of the aforementioned applications, the synthesis of novel views will exhibit artifacts.

Therefore, there is a need to devise a method that can correct for artifacts in depth maps in an intuitive and fast manner.

4. Problems encountered in the technical field

In several examples below, we may consider the context of a scene being shot (or captured or acquired or imaged) from multiple camera positions (specifically, under known positional/geometric relationships between different cameras). The camera itself may be a photo camera, a video camera, or other type of camera (LIDAR, infrared, etc.).

Further, a single camera may be moved to a plurality of places, or a plurality of cameras may be used simultaneously. In the first case, a static scene may be captured (e.g., with immovable objects), while in the latter case, acquisition of moving objects is also supported.

The cameras may be arranged in any manner, but in an example in a known positional relationship to each other. In a simple case, two cameras may be arranged adjacent to each other, with parallel optical axes. In more advanced scenarios, the camera positions are located on a regular 2D grid and all optical axes are parallel. In the most general case, the known camera positions are arbitrarily located in space, and the known optical axes may be oriented in any direction.

Having multiple images of a scene, we can derive a depth value for each pixel (or, more generally, locate each 2D representation element) in each 2D image.

To this end, we can mainly distinguish between three approaches, namely manual assignment of depth values, automatic calculation, and semi-automatic methods. Manual assignment is one extreme, where the photos are manually converted into 3D meshes [10], [15], [16]. The depth map may then be computed by rendering or displaying the so-called depth or z-channel [14]. While this approach allows full user control, it is cumbersome to obtain an accurate pixel-by-pixel depth map.

The other extreme is fully automated algorithms [12][13]. Although they have the potential to calculate an accurate depth value for each pixel, they are inherently error-prone. In other words, for some pixels the calculated depth value is completely wrong.

Thus, none of these methods is entirely satisfactory. There is a need for a method that combines the accuracy of automatic depth map calculation with the flexibility and control of manual methods. Furthermore, the required method must rely as much as possible on existing 2D and 3D image processing software tools. This makes it possible to profit from the very powerful editing tools already available in the art without having to recreate all the content from scratch.

For this reason, reference [8] points out the possibility of refining depth values using 3D meshes. While their method uses a mesh created for a previous frame or within a previous frame, it can be inferred that a 3D mesh created for a current frame can be used in addition to a mesh from a previous frame. Then, reference [8] claims to correct possible depth map errors by limiting the stereo matching range based on the depth information available for the 3D mesh.

While this approach thus enables us to use existing 3D editing software for interactive correction and improvement of depth maps, its direct application is not possible. First, since our mesh will be created manually, we need to prevent the user from having to re-model the entire scene to fix depth map errors located in a specific sub-portion of the image. To do this, we will need a specific mesh type, as described in section 9. Second, reference [8] does not explain how to accurately limit the search range of the underlying depth estimation algorithm. We therefore propose a precise method for converting a 3D mesh into an acceptable depth range.

5. Method of using similarity measure

Depth calculation from multiple images of a scene essentially requires the establishment of correspondences between the images. These correspondences then allow the depth of the object to be calculated by triangulation.

Fig. 2 depicts a corresponding example in the form of two cameras (left 2D image 22 and right 2D image 23), with their optical centers (or entrance pupils or nodal points) located at O_L and O_R, respectively. Both take a picture of an object element X (e.g., object element 14 of fig. 1). The object element X is depicted in pixel X_L of the left camera image 22.

Unfortunately, it is not possible to determine (or otherwise locate) the depth of the object element X from the left image 22 alone. Virtually all object elements X, X_1, X_2, X_3 would result in the same pixel (or other 2D representation element) X_L. All these object elements are located on one line in the right camera image, the so-called epipolar line 21. Thus, identifying the same object element in the left view 22 and the right view 23 allows the depth of the corresponding pixel to be calculated. More generally, by identifying in the right view 23 the object element X associated with the 2D representation element X_L in the left 2D image 22, the object element X can be located.
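Once the correct correspondence is identified, the depth follows by triangulation. As a minimal illustration (assuming, for simplicity, a rectified stereo pair with parallel optical axes, horizontal baseline b and focal length f, which is a special case of the general geometry of fig. 2; the function name is hypothetical):

def depth_from_disparity(focal_length_px, baseline_m, x_left_px, x_right_px):
    # Rectified-stereo triangulation: depth z = f * b / d,
    # with disparity d = x_L - x_R (in pixels).
    disparity = x_left_px - x_right_px
    if disparity <= 0:
        raise ValueError("non-positive disparity: wrong correspondence or point at infinity")
    return focal_length_px * baseline_m / disparity

# e.g. f = 1000 px, b = 0.1 m, disparity = 20 px  ->  z = 5 m
z = depth_from_disparity(1000.0, 0.1, 520.0, 500.0)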

To identify such correspondences, a number of different techniques exist. In general, these different techniques have in common that they compute some matching cost or similarity measure for each possible correspondence. In other words, each pixel or 2D representation element on the epipolar line 21 in the right view 23 (and possibly its neighboring pixels or 2D representation elements) is compared with the reference pixel X_L in the left view 22 (and possibly its neighbors). For example, the sum of absolute differences [11] or the Hamming distance of the census transform [11] can be calculated for the comparison. The resulting differences are then (in the example) considered as matching costs or similarity measures, with a greater cost indicating a poorer match. Depth estimation thus comes down to selecting, for each pixel, the depth candidate that minimizes the matching cost. This minimization may be performed independently for each pixel, or by performing a global optimization over the entire image.
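The following Python sketch illustrates this per-pixel cost minimisation for a rectified image pair using the sum of absolute differences over a small window; it is a simplified stand-in for whatever matching cost an actual depth estimator uses, and the array and function names are assumptions made only for this example:

import numpy as np

def best_disparity_sad(left, right, x, y, disparity_candidates, win=3):
    # Pick, among integer disparity candidates, the one minimising the SAD
    # matching cost between a window around (x, y) in the left image and the
    # corresponding window on the epipolar line of the right image.
    # Assumes (x, y) is far enough from the image border for the window.
    h = win // 2
    ref = left[y - h:y + h + 1, x - h:x + h + 1].astype(np.float64)
    best_d, best_cost = None, np.inf
    for d in disparity_candidates:
        xr = x - d                       # corresponding column in the right image
        if xr - h < 0 or xr + h + 1 > right.shape[1]:
            continue
        cand = right[y - h:y + h + 1, xr - h:xr + h + 1].astype(np.float64)
        cost = np.abs(ref - cand).sum()  # sum of absolute differences
        if cost < best_cost:
            best_d, best_cost = d, cost
    return best_d, best_cost

Restricting disparity_candidates to a smaller set is exactly the kind of constraint that the techniques discussed below derive from the 3D geometry.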

Unfortunately, as can be appreciated from this description, the determination of correspondences is a difficult problem to solve. There may be several similar objects located on the epipolar line in the right view of fig. 2. Thus, a wrong correspondence may be selected, resulting in wrong depth values and thus artifacts in the virtual view synthesis. To name just one example, if it is erroneously inferred, based on the similarity measure, that the reference pixel X_L of the left image view 22 corresponds to the object element X_2, the object element X would be positioned incorrectly in space.

To reduce such depth errors, it must be possible for the user to manipulate the depth values. One method is the so-called 2D-to-3D conversion. In this case, the user may assign depth values to different pixels [1][2][3][4] in a tool-assisted manner. However, such depth maps are often inconsistent between multiple captured views and therefore cannot be applied to virtual view synthesis, as the latter requires a set of captured input images and a consistent depth map to obtain high-quality, artifact-free results.

Another class of methods comprises post-filtering operations [5]. In this case, the user marks the erroneous area in the depth map and incorporates some additional information, e.g. whether the pixel belongs to a foreground or background area. Based on this information, the depth error is then eliminated by some filtering. While this approach is straightforward, it exhibits several drawbacks. First, it operates directly in 2D space, so that each depth map of each image needs to be corrected individually, which is a laborious task. Secondly, the correction is only an indirect form of filtering, so that the depth map correction cannot be guaranteed to be successful.

Thus, the third category of methods avoids filtering an erroneous depth map and instead aims at directly improving the initial depth estimation. One way to do this is to simply limit the acceptable depth values at the pixel level. Thus, instead of searching the entire epipolar line 21 in fig. 2 for a correspondence, only a smaller portion is considered. This limits the possibility of confusing correspondences, resulting in an improved depth map.
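As an illustration of such a per-pixel restriction (a sketch only: it uses an axis-aligned box as a stand-in for the containing volumes and approximation surfaces introduced below, and the function names are assumptions), the viewing ray of a pixel can be intersected with the constraining geometry, and only depth candidates inside the resulting interval are then evaluated:

import numpy as np

def acceptable_depth_interval(ray_origin, ray_dir, box_min, box_max):
    # Intersect a pixel's viewing ray with an axis-aligned box that stands in
    # for a containing volume. Returns (d_near, d_far) along the ray, or None
    # if the ray misses the box.
    ray_dir = ray_dir / np.linalg.norm(ray_dir)
    with np.errstate(divide="ignore", invalid="ignore"):
        t1 = (box_min - ray_origin) / ray_dir
        t2 = (box_max - ray_origin) / ray_dir
    d_near = float(np.max(np.minimum(t1, t2)))
    d_far = float(np.min(np.maximum(t1, t2)))
    if d_far < max(d_near, 0.0):
        return None
    return max(d_near, 0.0), d_far

def restrict_candidates(depth_candidates, interval):
    # Keep only candidates inside the acceptable interval; an unconstrained
    # pixel (interval is None) keeps its full range.
    if interval is None:
        return list(depth_candidates)
    lo, hi = interval
    return [d for d in depth_candidates if lo <= d <= hi]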

Reference [8] follows such a concept. It assumes a temporal video sequence in which a depth map should be computed for a time instance t. Further, it assumes that a 3D model is available for time instance t-1. The 3D model is then used as an approximation of the current pixel depth, thereby reducing the search space. Since the 3D model and the frame whose depth map should be improved belong to different time instances, the 3D model needs to be aligned with the current frame by performing pose estimation. While this complicates the application, it can be appreciated that the method comes close to the challenges defined in section 4 if the pose estimation is replaced by a function that simply returns the 3D model rather than changing its pose. Unfortunately, such a concept lacks important properties necessary for using manually created meshes. These additional methods are the subject of the present examples.

Reference [6] explicitly introduces a method for manual user interaction. Their algorithm applies a graph cut algorithm to minimize the global matching cost. These matching costs consist of two parts: a data cost section defining a degree of color matching of two corresponding pixels, and a smoothness cost penalizing depth jumps between neighboring pixels having similar colors. The user may influence cost minimization by setting the depth values of certain pixels or by requesting that the depth values of certain pixels are the same as the depth values in the previous frame. These depth guides will also propagate to neighboring pixels due to the smoothing cost. In addition, the user may provide an edge map (edge map) so that the smoothing cost is not applied to the edge. Our approach is complementary compared to this work. We do not describe how to accurately correct depth map errors for a particular depth map algorithm. Instead, we show how an acceptable depth range can be derived from user input provided in 3D space. These acceptable depth ranges, which may be different for each pixel, may then be used in each depth map estimation algorithm to limit the possible depth candidates and thus reduce the probability of depth map errors.

An alternative method for improving the depth map is described in reference [7]. It allows the user to define smooth regions that should not contain depth jumps. This information is then used to improve the depth estimation. Compared to our method, this approach is only indirect and therefore cannot guarantee an error-free depth map. Furthermore, the smoothness constraints are defined in the 2D image domain, which makes it difficult to propagate this information into all captured camera views.

6. Contributions

The examples presented herein provide at least some of the following contributions:

we provide an accurate method that can derive an acceptable range of depth values for relevant pixels of an image. In these ways, a high accuracy and quality depth map can be generated using a mesh that only approximately describes the 3D object. In other words, we do not insist that the 3D mesh accurately describes the position of the 3D object, but rather consider that the user can only provide a rough estimate. The estimate is then converted into an interval of acceptable depth values. This reduces the search space for the corresponding determination, independent of the applied depth estimation algorithm, thereby reducing the probability of erroneous depth values (section 10).

If the user is only requested to provide approximate 3D mesh locations, the mesh may be misinterpreted. We provide a way to avoid this (section 9.3).

The method supports partial constraints of the scene. In other words, the user does not need to give depth constraints for all objects of the scene. Instead, accurate depth guidance is only required for the most difficult objects. This explicitly includes scenes where one object is occluded by another object. This requires a specific mesh type not known from the literature, which is therefore an important part of the present examples (section 9).

We show how normal vectors known from the mesh further simplify the depth map estimation by taking inclined surfaces into account. Our contribution can limit the acceptable normal vectors, thereby reducing the possible match candidates, which again reduces the probability of false matches and results in a higher-quality depth map (section 11).

We introduce another type of constraint, called an excluded volume, which explicitly forbids certain depth values. In this way, we can further reduce the possible depth candidates, thereby reducing the probability of erroneous depth values (section 13).

We provide several improvements on how to create 3D geometric constraints based on single or multiple 2D images (section 15).

According to one aspect, a method is provided for locating, in a space containing at least one determined object, an object element associated with a specific 2D representation element (XL) in a determined 2D image of the space, the method comprising:

deriving a range or interval of candidate spatial positions of the imaged object element based on the predefined positional relationship;

limiting the range or interval of candidate spatial locations to at least one limited range or interval of acceptable candidate spatial locations, wherein limiting comprises at least one of:

limiting a range or interval of candidate spatial locations using at least one contained volume surrounding at least one determined object; and

limiting a range or interval of the candidate spatial locations using at least one excluded volume surrounding the unacceptable candidate spatial locations; and

based on the similarity measure, the most suitable candidate spatial location is retrieved among the acceptable candidate spatial locations of the limited range or interval.

In the method according to the example, the space may contain at least one first determined object and one second determined object,

wherein limiting comprises limiting the range or interval of candidate spatial locations to:

at least one first limited range or interval of acceptable candidate spatial positions associated with the first determined object; and

at least one second limited range or interval of acceptable candidate spatial positions associated with a second determined object,

Wherein limiting comprises defining at least one inclusion volume as a first inclusion volume around the first determined object and/or as a second inclusion volume around the second determined object to limit at least one first and/or second range or interval of candidate spatial positions to at least one first and/or second limited range or interval of acceptable candidate spatial positions; and

wherein retrieving comprises determining whether the particular 2D representation element is associated with the first determined object or the second determined object.

According to one example, determining whether the particular 2D representation element is associated with the first determined object or the second determined object is performed based on the similarity measure.

According to one example, it is determined whether a particular 2D representation element is associated with a first determined object or a second determined object based on the following observations:

one of at least one first and second limited ranges or intervals of acceptable candidate spatial locations is empty; and

the other of the at least one first and second limited ranges or intervals of acceptable candidate spatial positions is not empty, thereby determining that the particular 2D representation element is within the other of the at least one first and second limited ranges or intervals of acceptable candidate spatial positions.

According to one example, the restricting comprises using information from the second camera or the 2D image to determine whether the particular 2D representation element is associated with the first determined object or the second determined object.

According to one example, the information from the second camera or the 2D image comprises previously obtained positions of the object element comprised in:

at least one first limited range or interval of acceptable candidate spatial locations in order to infer that the object element is associated with the first object; or

At least one second limited range or interval of acceptable candidate spatial positions in order to infer that the object element is associated with a second object.

According to one aspect, there is provided a method comprising:

as a first operation, obtaining a position parameter and at least one containing volume associated with a second camera position;

as a second operation, a method according to any of the methods above and below is performed for a particular 2D representation element of a first 2D image obtained at a first camera position, the method comprising:

analyzing, based on the position parameters acquired in the first operation, whether the following two conditions are satisfied:

at least one candidate spatial position would occlude the at least one containing volume in a second 2D image obtained or obtainable at the second camera position, and

the at least one candidate spatial position is not occluded by the at least one containing volume in the second 2D image,

so that, if both conditions are met:

refraining from performing the retrieval even if the at least one candidate spatial location is within a limited range of acceptable candidate spatial locations of the first 2D image; and/or

At least one candidate spatial position is excluded from a limited range or interval of acceptable candidate spatial positions for the first 2D image, even if the at least one candidate spatial position is within the limited range of acceptable candidate spatial positions.

According to one example, the method may comprise:

as a first operation, obtaining a position parameter and at least one containing volume associated with a second camera position;

as a second operation, a method according to any of the above and below is performed on a particular 2D representation element of a first 2D image acquired at a first camera position, the method comprising:

based on the position parameters obtained at the first operation, it is analyzed whether a limited range of at least one acceptable candidate spatial position would be occluded by at least one inclusion volume in the second 2D image obtained or obtainable at the second camera position in order to keep the acceptable candidate spatial position within the limited range.

According to one example, the method may comprise:

as a first operation, a plurality of 2D representation elements are positioned for the second 2D image,

performing, as a second subsequent operation, the derivation, limitation and retrieval of the method according to any of the methods above and below for determining a most suitable candidate spatial position for a determined 2D representation element of the first determined 2D image, wherein the second 2D image and the first determined 2D image are acquired at spatial positions having a predetermined positional relationship,

wherein the second operation further comprises finding a 2D representation element in the second 2D image previously processed in the first operation corresponding to a candidate spatial position of the first determined 2D representation element of the first determined 2D image,

in order to further limit the range or interval of acceptable candidate spatial positions and/or to obtain a similarity measure with respect to the first determined 2D representation element in the second operation.

According to an example, wherein the second operation is such that, upon observing that the localized position of the 2D representation element in the previously obtained second 2D image will be occluded relative to the second 2D image by the candidate spatial position of the first determined 2D representation element considered in the second operation:

the limited range or interval of acceptable candidate spatial positions is further limited in order to exclude the candidate spatial position of the determined 2D representation element of the first determined 2D image from the limited range or interval of acceptable candidate spatial positions.

According to an example, upon observing that the localized position of the 2D representation element in the second 2D image corresponds to the first determined 2D representation element:

for a determined 2D representation element of the first determined 2D image, the range or interval of acceptable candidate spatial positions is limited, so that positions further than the localization position are excluded from the limited range or interval of acceptable candidate spatial positions.

According to an example, a method may further comprise, upon observing that the localized position of the 2D representation element in the second 2D image does not correspond to the localized position of the first determined 2D representation element:

the most suitable candidate spatial position of the determined 2D representation element of the first determined 2D image obtained in the second operation is invalidated.

According to an example, the localized position of the 2D representation element in the second 2D image corresponds to the first determined 2D representation element when the distance of the localized position to one of the candidate spatial positions of the first determined 2D representation element is within a maximum predetermined tolerance distance.

According to an example, the method may further comprise, when the 2D representation element is found in the second 2D image, analyzing a confidence value or reliability value of the positioning of the first 2D representation element in the second 2D image and using it only if the confidence value or reliability value is above a predetermined confidence threshold or the unreliability value is below a predetermined threshold.

According to one example, the confidence value may be based at least in part on the distance between the localized position and the camera position, the confidence value increasing as the distance decreases.

According to one example, the confidence value is based at least in part on the number of objects, containing volumes, or restricted ranges of acceptable spatial positions, such that the confidence value is increased if a smaller number of objects, containing volumes, or restricted ranges of acceptable spatial positions is found within the range or interval of candidate spatial positions.

According to one example, limiting includes defining at least one approximation surface to limit at least one range or interval of candidate spatial locations to at least one limited range or interval of acceptable candidate spatial locations.

According to one example, the limiting comprises defining at least one approximation surface and a tolerance interval for limiting at least one range or interval of candidate spatial positions to a limited range or interval of candidate spatial positions defined by the tolerance interval, wherein the tolerance interval has:

a distal end defined by at least one approximation surface; and

a near end defined based on a tolerance interval; and

based on the similarity measure, the most suitable candidate spatial location is retrieved among the acceptable candidate spatial locations of the limited range or interval.

According to an example, a method may comprise a method for locating, in a space containing at least one determined object, an object element associated with a particular 2D representation element in a 2D image of the space, the method comprising:

deriving a range or interval of candidate spatial positions of the imaged object element based on the predefined positional relationship;

limiting the range or interval of candidate spatial locations to at least one limited range or interval of acceptable candidate spatial locations, wherein limiting comprises:

defining at least one approximation surface and a tolerance interval for limiting at least one range or interval of candidate spatial positions to a limited range or interval of candidate spatial positions defined by the tolerance interval, wherein the tolerance interval has:

a distal end defined by at least one approximation surface; and

a near end defined based on a tolerance interval; and

based on the similarity measure, the most suitable candidate spatial location is retrieved among the acceptable candidate spatial locations of the limited range or interval.

According to one example, the method may be iterated by using an increased tolerance interval in order to increase the probability of containing an object element.

According to one example, the method may be iterated by using a reduced tolerance interval in order to reduce the probability of containing different object elements.

According to one example, wherein limiting comprises defining tolerance values for defining the tolerance interval.

According to one example, the limiting comprises defining a tolerance interval value Δd obtained from the at least one approximation surface based on a normal vector n and a vector v, wherein n is the normal vector of the approximation surface at the point where the interval of candidate spatial positions intersects the approximation surface, and the vector v defines the optical axis of the determined camera or 2D image.

According to one example, at least a portion of the tolerance interval value Δd is defined according to the at least one approximation surface based on the following formula:

Δd = t0 / |cos(∠(n, v))|

or

Δd = t0 / cos(min(∠(n, v), θmaxClipping)),

wherein t0 is a predetermined tolerance value, wherein n is the normal vector of the approximation surface at the point where the interval of candidate spatial positions intersects the approximation surface, wherein the vector v defines the optical axis of the camera under consideration, and wherein θmaxClipping clips the angle between n and v.
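
Purely for illustration, and assuming the formula reconstructed above (with the angle between n and v clipped at a maximum value so that the correction stays bounded for grazing views), the tolerance interval of one pixel could be computed roughly as follows; the function name and the default clipping angle are placeholders, not part of the examples.

import numpy as np

def tolerance_interval(d_surface, n, v, t0, theta_max_clipping_deg=75.0):
    # d_surface: depth at which the pixel ray meets the approximation surface
    # (assumed to lie inside the object); n: surface normal at that point;
    # v: optical axis / viewing direction; t0: predetermined tolerance value.
    n = np.asarray(n, dtype=float); n /= np.linalg.norm(n)
    v = np.asarray(v, dtype=float); v /= np.linalg.norm(v)
    cos_angle = abs(float(np.dot(n, v)))
    cos_clip = np.cos(np.radians(theta_max_clipping_deg))
    delta_d = t0 / max(cos_angle, cos_clip)  # oblique surfaces get a larger, but bounded, tolerance
    return d_surface - delta_d, d_surface    # (near end, far end) of the acceptable depth interval

The far end coincides with the approximation surface itself, while the near end is moved towards the camera by Δd, reflecting that the real surface lies in front of an approximation surface contained inside the object.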

According to one example, the retrieving includes:

considering a normal vector n of the approximation surface at the intersection between the approximation surface and the range or interval of candidate spatial positions;

retrieving, based on the similarity measure, the most suitable candidate spatial position among the acceptable candidate spatial positions of the limited range or interval, wherein the retrieving comprises retrieving the most suitable candidate spatial position among the acceptable candidate spatial positions of the limited range or interval and among candidate normal vectors defined in relation to the normal vector n.

According to one aspect, there is provided a method for locating, in a space containing at least one determined object, an object element associated with a particular 2D representation element in a determined 2D image of the space, the method comprising:

deriving a range or interval of candidate spatial positions of the imaged object element based on the predefined positional relationship;

limiting the range or interval of candidate spatial locations to at least one limited range or interval of acceptable candidate spatial locations, wherein limiting comprises defining at least one approximation surface so as to limit at least one range or interval of candidate spatial locations to a limited range or interval of candidate spatial locations;

considering a normal vector n of the approximation surface at the intersection between the approximation surface and the range or interval of candidate spatial positions;

retrieving, based on the similarity measure, the most suitable candidate spatial position among the acceptable candidate spatial positions of the limited range or interval, wherein the retrieving comprises retrieving the most suitable candidate spatial position among the acceptable candidate spatial positions of the limited range or interval and among candidate normal vectors defined in relation to the normal vector n.

According to an example:

the retrieving includes processing for a particular 2D representation element ((X0, y0), XL) Is determined by a similarity measure (c) of at least one candidate spatial position (d)sum),

Wherein the processing involves other 2D representation elements (x, y) within a particular neighborhood (N (x0, y0)) of a particular 2D representation element (x0, y0),

wherein the processing comprises for each of the other 2D representation elements (x, y), assuming a planar surface of the object in the object element, according to the vectorObtaining a vector from a plurality of vectors within a defined predetermined rangeTo derive a sum vectorAn associated candidate spatial position (D), wherein the candidate spatial position (D) is used to determine each 2D representation element (x, y) pair similarity measure (c) in the neighborhood (N (x0, y0))sum) The contribution of (c).

According to one example, the retrieval is based on the following relationship:

d̃(x, y) = (nᵀ · d · K⁻¹ · (x0, y0, 1)ᵀ) / (nᵀ · K⁻¹ · (x, y, 1)ᵀ),

wherein (x0, y0) is the specific 2D representation element, (x, y) is a 2D representation element in the neighborhood of (x0, y0), K is the intrinsic camera matrix, d is a depth candidate representing a candidate spatial position, and d̃(x, y) is a function which, under the assumption of a planar surface of the object at the object element, computes the depth candidate for the 2D representation element (x, y) based on the specific 2D representation element (x0, y0), the depth candidate d, and the normal vector n.

According to one example, the retrieval is based on evaluating a similarity measure of the type

csum(d) = ⊕(x, y) ∈ N(x0, y0) c(x, y, d̃(x, y)),

wherein ⊕ represents a sum or a general aggregation function.
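
A minimal sketch of this aggregation, assuming the plane-induced depth relationship reconstructed above, a NumPy intrinsic matrix K, and a user-supplied per-pixel matching cost (all names hypothetical):

import numpy as np

def plane_induced_depth(x0, y0, d, n, K, x, y):
    # Depth candidate for neighbouring pixel (x, y) under the hypothesis that the object
    # surface is locally planar at the object element seen in (x0, y0), with normal n and depth d.
    K_inv = np.linalg.inv(K)
    p0 = d * (K_inv @ np.array([x0, y0, 1.0]))    # 3D point of the centre pixel
    ray = K_inv @ np.array([x, y, 1.0])           # back-projected ray of the neighbour
    return float(np.dot(n, p0) / np.dot(n, ray))  # intersection of that ray with the plane

def aggregated_cost(x0, y0, d, n, K, neighbourhood, pixel_cost):
    # c_sum(d): sum of per-pixel matching costs over the neighbourhood N(x0, y0),
    # each evaluated at the depth induced by the plane hypothesis (d, n).
    return sum(pixel_cost(x, y, plane_induced_depth(x0, y0, d, n, K, x, y))
               for (x, y) in neighbourhood)

Here pixel_cost stands for whatever similarity measure the estimator uses (e.g. a color difference against the corresponding pixel of the second view at that depth).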

According to one example, the limiting comprises obtaining a range or interval of acceptable vectors within a maximum tilt angle relative to a normal vector n, wherein n is a vector perpendicular to the at least one determined object at the intersection (143) of the at least one determined object with the range or interval of candidate spatial positions.

According to one example, the limiting comprises using candidate vectors perpendicular to the at least one determined object, obtained according to

n'(θ, φ) = (sin θ cos φ, sin θ sin φ, cos θ)ᵀ, with 0 ≤ θ ≤ θmax and 0 ≤ φ < 2π,

wherein θ is a tilt angle with respect to the normal vector n, θmax is a predefined maximum tilt angle, φ is an azimuth angle, and wherein the candidate vectors are expressed in an orthogonal coordinate system whose third axis (z) is parallel to n and whose other two axes (x, y) are orthogonal to n.
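
As an illustrative sketch of such a limitation, candidate normals within the maximum tilt angle can be sampled in a local frame whose z-axis coincides with n; the sampling densities below are arbitrary placeholders.

import numpy as np

def candidate_normals(n, theta_max_deg, n_theta=4, n_phi=8):
    # Sample candidate surface normals within a cone of half-angle theta_max around n.
    n = np.asarray(n, dtype=float)
    n = n / np.linalg.norm(n)
    # Build an orthonormal frame (u, w, n) with n as the third axis.
    helper = np.array([1.0, 0.0, 0.0]) if abs(n[0]) < 0.9 else np.array([0.0, 1.0, 0.0])
    u = np.cross(n, helper)
    u /= np.linalg.norm(u)
    w = np.cross(n, u)
    candidates = [n]
    for theta in np.linspace(0.0, np.radians(theta_max_deg), n_theta + 1)[1:]:
        for phi in np.linspace(0.0, 2.0 * np.pi, n_phi, endpoint=False):
            candidates.append(np.sin(theta) * np.cos(phi) * u
                              + np.sin(theta) * np.sin(phi) * w
                              + np.cos(theta) * n)
    return np.array(candidates)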

According to one example, limiting includes using information from the 2D image to determine whether a particular 2D representation element is associated with the first determined object or the second determined object,

further comprising finding:

a first vector perpendicular to the at least one determined object at an intersection with a range or interval of candidate spatial positions of the first determined 2D image, an

A second vector perpendicular to the at least one determined object at an intersection with a range or interval of candidate spatial positions of the second 2D image, an

Comparing the first vector and the second vector to:

validating the position fix if the angle between the first vector and the second vector is within a predetermined threshold, and/or

invalidating the localization if the angle between the first vector and the second vector is not within the predetermined threshold, or selecting the localization of the first image or the second image according to a confidence value.

According to one example, an intersection between the approximation surface and the range or interval of candidate spatial positions is only taken into account when the normal vector n of the approximation surface at the intersection has a predetermined direction within a particular range of directions.

According to an aspect of the present example, there is provided a method of calculating the particular range of directions based on the direction of the range or interval of candidate spatial positions associated with the determined 2D image.

According to an aspect of the present example, there is provided a method in which a constraint is applied at an intersection between the approximation surface and the range or interval of candidate spatial positions only when the dot product between the normal vector n of the approximation surface at the intersection and the vector describing the path from the camera to the candidate spatial position has a predefined sign.

According to one example, the limiting comprises defining at least one approximation surface-defined end of at least one limited range or interval of acceptable candidate spatial locations, wherein the at least one approximation surface-defined end is located at an intersection between the approximation surface and the range or interval of candidate spatial locations.

According to one example, the approximation surface is defined by a user.

According to one example, the limiting comprises scanning along the range of candidate positions from a proximal position to a distal position and ending when it is observed that the at least one limited range or interval of acceptable candidate spatial positions has a distal end associated with the approximation surface.

According to one example, at least one containment volume is automatically defined from at least one approximation surface.

According to one example, at least one containment volume is defined from the at least one approximation surface by scaling the at least one approximation surface.

According to one example, the at least one containment volume is defined from the at least one approximation surface by scaling the at least one approximation surface from a centre of scaling of the at least one approximation surface.
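
A very simple sketch of this scaling, assuming the approximation surface is given as an array of vertices and the scaling centre defaults to its centroid (the factor 1.3 is an arbitrary placeholder):

import numpy as np

def inflate_to_containing_volume(vertices, scale=1.3, center=None):
    # Derive a containing volume from an approximation surface by scaling its vertices
    # away from a scaling centre; faces and edges of the structure stay unchanged.
    vertices = np.asarray(vertices, dtype=float)
    if center is None:
        center = vertices.mean(axis=0)
    return center + scale * (vertices - center)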

According to one example, the constraint relates to at least one containing volume or approximation surface formed by a structure of vertices or control points, edges and surface elements, wherein each edge connects two vertices and each surface element is surrounded by at least three edges, and there is a connection path of the edges from each vertex to any other vertex of the structure.

According to one aspect, a method is provided wherein each edge is connected to an even number of surface elements.

According to one aspect, a method is provided wherein each edge is connected to two surface elements.

According to one aspect, a method is provided in which a structure occupies an enclosed volume without boundaries.

According to one example, the at least one containment volume is formed by a geometric structure, the method further comprising defining the at least one containment volume by:

moving the elements apart by detaching them and displacing them along their normals; and

reconnecting the elements by generating additional elements (210bc, 210cb).

According to one aspect, there is provided a method, the method further comprising:

inserting at least one new control point in the decomposition area;

at least one new control point is reconnected with the decomposed elements to form further elements.

According to one example, the elements are triangle elements.

According to one example, the limiting comprises:

a range or interval is searched from a near-side position to a far-side position within a range or interval of candidate spatial positions, and the search is ended when an approximate surface is retrieved.

According to one example, the at least one approximation surface is contained within the at least one object.

According to one example, the at least one approximation surface is a rough approximation of the at least one object.

According to one example, the method further comprises defining a limited range or interval of acceptable candidate spatial locations as the range or interval of candidate spatial locations obtained during the derivation when it is observed that the range or interval of candidate spatial locations obtained during the derivation does not intersect any approximation surfaces.

According to one example, the at least one contained volume is a rough approximation of the at least one object.

According to one example, the retrieval is applied to a random subset of the restricted range or interval of acceptable candidate positions and/or of the acceptable normal vectors.

According to one example, the limiting comprises defining at least one end of containment volume definition of at least one limited range or interval of acceptable candidate spatial locations.

According to one example, the containment volume is defined by a user.

According to one example, the retrieving comprises determining whether the particular 2D representation element is associated with the at least one determined object based on the similarity measure.

According to one example, at least one of the 2D representation elements is a pixel (XL) in the determined 2D image.

According to one example, the object element is a surface element of at least one determined object.

According to one example, the range or interval of candidate spatial positions of the imaged object element is spread out in a depth direction relative to the determined 2D representation element.

According to one example, a range or interval of candidate spatial positions of the imaged object element is spread along a ray that exits from a node of the camera relative to the determined 2D representation element.

According to one example, the retrieving comprises measuring the similarity measure along acceptable candidate spatial locations obtained from another 2D image of the space and having a limited range or interval of predefined positional relationships with the determined 2D image.

According to one aspect, a method is provided wherein the retrieving comprises measuring the similarity measure along 2D representation elements in the further 2D image forming epipolar lines associated with the at least one restricted range.

According to one example, the limiting comprises finding an intersection between at least one of the inclusion volume, the exclusion volume, and/or the approximation surface and a range or interval of candidate locations.

According to one example, the limiting comprises finding an end of a limited range or interval of the candidate location using at least one of the inclusion volume, the exclusion volume, and/or the approximation surface.

According to one example, the limiting comprises:

a range or interval is searched from a near-side position to a far-side position within a range or interval of candidate spatial positions.

According to one example, the defining includes:

Selecting a first 2D image of the space and a second 2D image of the space, wherein the first 2D image and the second 2D image are acquired at camera positions having a predetermined positional relationship with each other;

at least a first 2D image is displayed,

guiding a user to select a control point in the first 2D image, wherein the selected control point is a control point forming an element of an approximate surface or an exclusion volume or a structure containing the volume;

guiding a user to selectively translate the selected point in the first 2D image while constraining the point to move along an epipolar line associated with the point in the second 2D image, wherein the point corresponds to a control point of an element of the same structure as the point,

in order to define the movement of elements of the structure in 3D space.

According to one aspect, there is provided a method for locating, in a space containing at least one determined object, an object element associated with a particular 2D representation element in a determined 2D image of the space, the method comprising:

obtaining a spatial position of an imaging object element;

obtaining a reliability value or an unreliability value for the spatial position of the imaged object element;

if the reliability value does not meet a predefined minimum reliability or the unreliability does not meet a predefined maximum unreliability, performing the method of any of the methods above and below in order to refine the previously obtained spatial position.

According to one aspect, there is provided a method for refining, in a space containing at least one determined object, the positioning of previously obtained object elements associated with a particular 2D representation element in a determined 2D image of the space, the method comprising:

graphically displaying the determined 2D image of the space;

guiding a user to define at least one containing volume and/or at least one approximating surface;

a method according to any of the methods above and below to refine the previously obtained position fix.

According to one example, a method may further include, after defining the at least one containment volume or approximation surface, automatically defining an exclusion volume between the at least one containment volume or approximation surface and a position of the at least one camera.

According to one example, a method may further include, upon defining the first proximal contained volume or approximate surface and the second distal contained volume or approximate surface, automatically defining:

a first exclusion volume between the first containment volume or approximation surface and the position of the at least one camera; a second excluded volume between the second containing volume or approximation surface and the position of the at least one camera, wherein non-excluded regions between the first excluded volume and the second excluded volume are excluded.

According to one example, the method may be applied to a multi-camera system.

According to one example, the method may be applied to a stereoscopic imaging system.

According to one example, the retrieving comprises selecting, for each candidate spatial location of the range or interval, a portion of the limited range of candidate spatial locations that would be acceptable or not.

According to an aspect, a system for locating, in a space containing at least one determined object, an object element associated with a particular 2D representation element in a determined 2D image of the space may be provided, the system comprising:

a derivation block for deriving a range or interval of candidate spatial positions of the imaging object element based on a predefined positional relationship;

a restriction block for restricting a range or interval of candidate spatial locations to at least one restricted range or interval of acceptable candidate spatial locations, wherein the restriction block is configured to:

limiting a range or interval of candidate spatial locations using at least one contained volume surrounding at least one determined object; and/or

Limiting a range or interval of candidate spatial locations using at least one excluded volume comprising unacceptable candidate spatial locations; and

A retrieval block configured to retrieve the most suitable candidate spatial position among the acceptable candidate spatial positions of the limited range or interval according to the similarity measure.

According to one aspect, a system is provided, further comprising a first camera and a second camera for acquiring 2D images in a predetermined positional relationship.

According to one aspect, a system is provided, further comprising at least one movable camera for acquiring 2D images from different positions and in a predetermined positional relationship.

The system according to any of the above or below, further comprising: a constraint definer for rendering at least one 2D image to obtain an input for defining at least one constraint.

The system may also be configured to perform examples in accordance with any of the systems above or below.

According to one aspect, a non-transitory storage unit may be provided comprising instructions which, when executed by a processor, cause the processor to perform any of the methods above or below.

Examples above and below may refer to a multi-camera system. Examples above and below may relate to stereoscopic systems, e.g. for 3D imaging.

The examples above and below may refer to a system that photographs a real space with a real camera. In some examples, all 2D images are the only real images obtained by a real camera shooting a real object.

7. Overall workflow

Fig. 3 depicts an overall method 30 of interactive depth map computation and refinement (or more generally for localization of at least one object element). At least one of the following steps may be implemented in examples in accordance with the present disclosure.

The first step 31 may comprise capturing the (spatial) scene from multiple perspectives using an array of multiple cameras, or by moving a single camera within the scene.

The second step 32 may be a so-called camera calibration. Based on an analysis of feature points within the 2D images and/or a provided calibration chart, the calibration algorithm may determine the intrinsic and extrinsic parameters for each camera view. The extrinsic camera parameters define the position and orientation of all cameras relative to a common coordinate system. The latter is hereinafter referred to as the world coordinate system. The intrinsic parameters include (preferably all) relevant internal camera parameters, such as the focal length and the pixel coordinates of the optical axis. In the following, we refer to the intrinsic and extrinsic camera parameters as position parameters or positional relationships.
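
For illustration, given such position parameters, the ray of candidate spatial positions of a pixel can be expressed in the world coordinate system as sketched below; the convention X_cam = R @ X_world + t is assumed here, and conventions differ between calibration tools.

import numpy as np

def pixel_ray_world(pixel, K, R, t):
    # K: intrinsic matrix; (R, t): extrinsic parameters mapping world to camera coordinates.
    x, y = pixel
    direction_cam = np.linalg.inv(K) @ np.array([x, y, 1.0])
    origin_world = -R.T @ t                    # camera centre in world coordinates
    direction_world = R.T @ direction_cam
    direction_world /= np.linalg.norm(direction_world)
    return origin_world, direction_world       # candidate positions: origin + d * direction, d > 0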

The third step 33 may comprise a pre-processing step, e.g. to obtain high quality results, but this has no direct relation to the interactive depth map estimation. This may specifically include color matching between cameras to compensate for varying camera transmission curves, de-noising, and removing lens distortion. Furthermore, images of insufficient quality due to excessive blurring or false exposures can be eliminated in this step 33.

A fourth step 34 may calculate an initial depth map (or more generally a coarse localization) for each camera view (2D image). Here, there may be no limitation on the depth calculation program to be used. The purpose of this step 34 may be to identify difficult scene elements (automatically or manually), where the automated program requires user assistance to deliver the correct depth map. Section 14 discusses an exemplary method of identifying such depth map errors. In some examples, several techniques discussed below may operate by refining the coarse positioning results obtained at step 34.

In step 35 (e.g. fifth step), it is possible to generate position constraints for scene objects in the 3D space (e.g. manually). To reduce the necessary user effort, such location constraints may be limited to the relevant objects identified in step 38 whose depth values from step 34 (or 37) are incorrect. For example, these constraints may be created by drawing polygons or other geometric primitives into 3D world space using any possible 3D modeling and editing software. For example, the user may define:

-at least one containing volume (inclusive volume) surrounding the object (and therefore in which the object element is likely to be located); and/or

- at least one excluded volume in which the position of the object element is not to be searched; and/or

-at least one approximation surface contained within at least one determined object.

For example, the constraints may be created, drawn, or selected manually, possibly with the aid of an automated system. For example, the user may define an approximation surface, and the method then derives a containing volume from it (see section 12).

The base coordinate system used to create the position constraints may be the same as the coordinate system used for camera calibration and positioning. The user-drawn geometric primitives give an indication of where the objects are located without having to indicate for each object in the scene the exact location in 3D space. In the case of video sequences, the creation of geometric primitives can be significantly accelerated by automatically tracking them over time (39). Section 13 gives more detailed information about the concept of location constraints. Section 15 describes in more detail example techniques of how geometric primitives can be easily drawn.

In step 36 (e.g., sixth step), the position constraints (containing volumes, excluded volumes, approximation surfaces, tolerance values …, which may be selected by the user) may be converted into depth map constraints (or positioning constraints) for each camera view. In other words, the range or interval of candidate spatial positions (e.g., associated with pixels) may be limited to at least one limited range or interval of acceptable candidate spatial positions. Such a depth map constraint (or position constraint) may also be expressed as a disparity map constraint. Thus, in connection with interactive depth or disparity map improvement, depth maps and disparity maps are conceptually the same and will be treated as such, although each of them has a very specific meaning that should not be confused in an implementation.
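
As a simplified sketch of this conversion, the acceptable depth interval of one pixel can be obtained by intersecting its ray with a containing volume; here the volume is reduced to an axis-aligned box purely for illustration (slab method), whereas an actual implementation would intersect the ray with the user-drawn primitives.

import numpy as np

def depth_interval_from_box(origin, direction, box_min, box_max):
    # Returns the (near, far) depth interval of the ray segment inside the box,
    # or None if the ray misses the volume, i.e. the pixel is unconstrained by it.
    origin = np.asarray(origin, dtype=float)
    direction = np.asarray(direction, dtype=float)
    with np.errstate(divide='ignore', invalid='ignore'):
        t1 = (np.asarray(box_min, dtype=float) - origin) / direction
        t2 = (np.asarray(box_max, dtype=float) - origin) / direction
    t_near = np.max(np.minimum(t1, t2))
    t_far = np.min(np.maximum(t1, t2))
    if t_far < max(t_near, 0.0):
        return None
    return max(t_near, 0.0), t_far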

The constraints obtained in step 36 may limit the possible depth values (or localization estimates), thus reducing the computational effort of the similarity measure analysis to be processed in step 37. Examples are discussed, for example, in section 10. Alternatively or additionally, the possible surface normal directions of the considered object may be limited (see section 11). Both types of information may be valuable for disambiguating the corresponding determination problem and may therefore be used for regenerating the depth map. The program may search for possible depth values in a smaller set of values than the initial depth map calculation, and exclude other possible solutions. A similar procedure (e.g., the procedure of step 34) as the initial depth map calculation may be used.

For example, step 37 may be obtained using techniques that rely on similarity measures, such as those discussed in section 5. At least some of these methods are known in the art. However, the procedure for calculating and/or comparing these similarity measures may make use of the constraints defined at step 35.

As indicated by the iteration loop 37' between steps 37 and 38, the method may be iterated, for example by recomputing the localization at step 37 using the constraints obtained in step 36, in order to identify remaining localization errors at 38, which may then be refined in a new iteration of steps 35 and 36. Each iteration can refine the results obtained in the previous iteration.

A tracking step 39 can also be foreseen, for example, for moving objects, considering the temporally subsequent frames: the constraints obtained for the previous frame (e.g., at time t-1) are considered to obtain the location of the 2D representation element (e.g., pixel) at the subsequent frame (e.g., at time t).

Fig. 35 shows a more general method 350 for locating, in a space containing at least one determined object, an object element associated with a particular 2D representation element (e.g., pixel) in a determined 2D image of the space. The method may include:

step 351, deriving a range or interval of candidate spatial positions (e.g. rays shot from a camera) for the imaged spatial element based on a predefined positional relationship;

step 352 (which may be associated with steps 35 and/or 36 of method 30) limits the range or interval of candidate spatial locations to at least one limited range or interval (e.g., one or more intervals in the ray) of acceptable candidate spatial locations, wherein the limiting includes at least one of:

limiting a range or interval of candidate spatial locations using at least one contained volume surrounding at least one determined object; and/or

Limiting a range or interval of candidate spatial locations using at least one excluded volume comprising unacceptable candidate spatial locations; and/or

Other constraints (e.g., approximation surfaces, tolerance values, etc., such as having the features discussed below);

the most suitable candidate spatial location is retrieved (step 353, which may be associated with step 37 of method 30) among the acceptable candidate spatial locations within the limited range or interval based on the similarity metric.
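
The flow of method 350 just described could be prototyped, purely for illustration, roughly as in the following sketch, assuming a pinhole camera with intrinsic matrix K and user-supplied membership tests for the containing and excluded volumes; all names are hypothetical and the similarity cost is left abstract.

import numpy as np

def derive_candidates(pixel, K, d_min=0.1, d_max=50.0, steps=256):
    # Step 351: candidate spatial positions along the ray cast from the camera through
    # the pixel, sampled at "steps" depth values (camera coordinates).
    x, y = pixel
    ray = np.linalg.inv(K) @ np.array([x, y, 1.0])
    depths = np.linspace(d_min, d_max, steps)
    return depths[:, None] * ray[None, :], depths

def limit_candidates(points, depths, inclusion=None, exclusion=None):
    # Step 352: keep only candidates inside the containing volume (if given) and outside
    # the excluded volume (if given); both are membership tests point -> bool.
    keep = np.ones(len(depths), dtype=bool)
    if inclusion is not None:
        keep &= np.array([inclusion(p) for p in points])
    if exclusion is not None:
        keep &= ~np.array([exclusion(p) for p in points])
    return points[keep], depths[keep]

def retrieve_best(depths, cost_fn):
    # Step 353: retrieve the most suitable candidate by minimising a similarity cost
    # over the restricted set; cost_fn stands for whatever matching cost is used.
    if len(depths) == 0:
        return None
    costs = np.array([cost_fn(d) for d in depths])
    return float(depths[np.argmin(costs)])

A containing volume could be as simple as a sphere membership test, e.g. lambda p: np.linalg.norm(p - c) <= r for a user-chosen centre c and radius r.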

In some cases, the method may be iterated in order to refine the coarse results obtained in the previous iteration.

Even though the above and below techniques are discussed primarily in terms of method steps, it should be understood that the present example also relates to a system, such as 360 shown in fig. 36, configured to perform a method, such as the exemplary method.

The system 360 (which may process at least one of the steps of the method 30 or 350) may locate, in a 3D space containing at least one determined object, an object element associated with a particular 2D representation element 361a (e.g. pixel) of a determined 2D image of the space. The derivation block 361 (which may perform step 351) may be configured to derive a range or interval of candidate spatial positions of the imaging space element (e.g., rays shot from a camera) based on a predefined positional relationship. The limiting block 362 (which may perform step 352) may be configured to limit the range or interval of candidate spatial locations to at least one limited range or interval (e.g., one or more intervals in the ray) of acceptable candidate spatial locations, wherein the limiting block 362 may be configured to:

Limiting a range or interval of candidate spatial locations using at least one contained volume surrounding at least one determined object; and/or

Limiting a range or interval of candidate spatial locations using at least one excluded volume comprising unacceptable candidate spatial locations; and/or

Using other constraints (e.g., approximation surfaces, tolerance values, etc.) to limit the range or interval of candidate spatial locations;

the retrieving block 363 (which may perform steps 353 or 37) may be configured for retrieving the most suitable candidate spatial position (363') among the acceptable candidate spatial positions of the limited range or interval based on the similarity measure.

Fig. 38 illustrates a system 380, which can be an example of an implementation of system 360 or another system configured to perform techniques according to the following or the above examples and/or methods such as methods 30 or 350. A derivation block 361, a restriction block 362 and a retrieval block 363 are shown. The system 380 may process data obtained from at least first and second cameras (e.g., a multi-camera environment, such as a stereo vision system) that acquire images of the same space from different camera positions, or data obtained from one camera moving along multiple positions. A first camera position (e.g., a position of a first camera or a camera position at a first time instant) may be understood as a first position at which the camera provides a first 2D image 381 a; the second camera position (e.g., the position of the second camera or the position of the same camera at the second time instant) may be understood as the position at which the camera provides the second 2D image 381 b. The first 2D image 381a may be understood as the first image 22 of fig. 2, and the second 2D image 381b may be understood as the second image 23 of fig. 2. Other cameras or camera positions may optionally be used, for example, to provide additional 2D images 381 c. Each 2D image may be a bitmap, for example, or a class matrix representation comprising a plurality of pixels (or other 2D representation elements), each pixel being assigned to a value, for example, an intensity value, a color value (e.g., RGB), etc. The system 380 may associate each pixel of the first 2D image 381a with a particular spatial position in imaging space (a particular spatial position is the position of an object element imaged by a pixel). To achieve this goal, the system 380 may utilize the second 2D image 381b and data associated with the second camera. Specifically, the system 380 may be fed with at least one of:

Positional relationships of the cameras (e.g. the optical centers OL and OR of each camera, as shown in fig. 2) and/or camera-intrinsic parameters (e.g. focal length), which may affect the acquired geometry (these data are represented here by 381a' for the first camera and 381b' for the second camera);

acquired images (e.g. bitmap, RGB, etc.) 381a and 381 b.

In particular, the parameters 381a' of the first camera (no 2D image is strictly necessary here) may be fed into the derivation block 361. The derivation block 361 can thus define a range of candidate spatial positions 361', which may, for example, follow the ray cast from the camera through the corresponding pixel or 2D representation element. Basically, the range of candidate spatial positions 361' is based only on geometric properties and intrinsic properties of the first camera.

The limit block 362 may further limit the range of candidate spatial locations 361 'to a limited range of acceptable candidate spatial locations 362'. The limit block 362 may select only a subset of the range of candidate spatial locations 361 'based on constraints 364' (approximate surface, included volume, excluded volume, tolerance … …) obtained from the constraint definer 364. Thus, some candidate locations in range 361 'may be excluded from range 362'.

Constraint definer 364 may allow user input 384a to control the creation of constraints 364'. Constraint definer 364 may operate using a Graphical User Interface (GUI) to assist the user in defining the constraints 364'. Constraint definer 364 may be fed with the first 2D image 381a and the second 2D image 381b, and in some cases with a previously obtained depth map (or other positioning data) 383a (e.g., as obtained in the initial depth calculation step 34 of method 30, or from a previous iteration of method 30 or system 380). Thus, the user is visually guided in defining the constraints 384b. Examples are provided below (see figs. 30-34 and the related text).

In some cases, the constraint definer 364 may also be fed with feedback 384b from a previous iteration (see also the connection between steps 37 and 38 in the method 30 of fig. 3).

Accordingly, the limit block 362 can impose the constraints 364' that have been introduced (e.g., graphically, 384a) by the user, so as to define the limited range of acceptable candidate spatial positions 362'.

Thus, the retrieval block 363, which may implement, for example, steps 37 or 353 or other techniques discussed above and/or below, may analyze locations within a limited range of acceptable candidate spatial locations 362'.

The retrieval block 363 may specifically include an estimator 385 that may output the most appropriate candidate spatial location 363' as an estimated location (e.g., depth).

The retrieval block 363 may include a similarity metric calculator 386, which may process similarity techniques (e.g., those discussed in section 5 or other statistical techniques). The similarity measure calculator 386 may be fed with:

-camera parameters (381a ', 381b') associated with both the first and second (381a, 381b) cameras; and

-a first and a second 2D image (381a, 381b), and,

in some cases, additional images (e.g., 381c) and additional camera parameters associated with the additional images.

For example, starting from a specific pixel (e.g., XL) of the first 2D image (e.g., 22 in fig. 2), the retrieval block 363 may analyze, in the second 2D image (e.g., 23), the similarity measures of those pixels lying on the epipolar line (e.g., 21) associated with the specific pixel, so as to determine the spatial position (or at least the most suitable candidate spatial position) of the object element (e.g., X) represented by that pixel of the first image.
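
A minimal sketch of such a similarity evaluation for one depth candidate, assuming images as NumPy arrays and a known rigid transform (R12, t12) from the first to the second camera (all names hypothetical):

import numpy as np

def photo_cost(d, pixel, img1, img2, K1, K2, R12, t12):
    # Back-project the pixel of the first image to depth d, project the 3D point into the
    # second camera (which samples a point on the epipolar line) and compare intensities.
    x, y = pixel
    p_cam1 = d * (np.linalg.inv(K1) @ np.array([x, y, 1.0]))
    p_cam2 = R12 @ p_cam1 + t12
    uvw = K2 @ p_cam2
    if uvw[2] <= 0:
        return np.inf                          # candidate lies behind the second camera
    u, v = int(round(uvw[0] / uvw[2])), int(round(uvw[1] / uvw[2]))
    if not (0 <= v < img2.shape[0] and 0 <= u < img2.shape[1]):
        return np.inf                          # projects outside the second view
    return float(np.abs(img1[y, x].astype(float) - img2[v, u].astype(float)).sum())

Evaluating photo_cost only for depth candidates inside the limited range 362' corresponds to restricting the search to a portion of the epipolar line.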

An estimated position invalidator or depth map invalidator 383 (which may be based on user input 383b and/or which may be fed with a depth value 383a or other coarse positioning value, e.g. as obtained in step 34 or from a previous iteration of the method 30 or system 380) may be used (383c) to validate or invalidate the provided position (e.g. depth) 383a. Since a confidence, reliability, or unreliability value for each estimated position 383a is typically available, the provided position 383a may be analyzed and compared to a threshold value C0 (383b), which may have been entered by the user. The threshold C0 may be different for each approximation surface. The purpose of this step may be to refine, with the new position estimate 363', only those positions that were incorrect or unreliable in the previous positioning step (34) or in previous iterations of the method 30 or system 380. In other words, if the provided input value 383a is incorrect or does not have sufficient confidence, only the position estimate 363' affected by the constraints 364' is considered for the final output 387'.

To this end, if the reliability/confidence of 383a does not meet the user-provided threshold C0, the position selector 387 may select the obtained depth map or estimated position 363' based on the validity/invalidity (383c) of the provided depth input 383a. Additionally or alternatively, a new iteration may be performed: the coarse or incorrectly estimated position may be fed back to the constraint definer 364 as feedback 384b, and a new, more reliable iteration may be performed.

Otherwise, the depth (or other location) calculated from the retrieval block 363 or copied from the input 383a is accepted and provided (387') as the location associated with the particular 2D pixel of the first 2D image.

8. Analysis of the technical problem solved by the examples

Reference [8] describes a method of refining a depth map using a 3D mesh. To this end, the depth of the 3D mesh is computed for a given camera, and this information is used to update the depth map of the camera in question. Fig. 4 depicts the corresponding principle. It shows an example 3D object 41 in the form of a sphere and an approximation surface 42 that the user has drawn. The camera 44 acquires the sphere 41 to obtain a 2D image as a matrix of pixels (or other 2D representation elements). Each pixel is associated with an object element and can be understood to represent the intersection between one ray and the surface of the real object 41: for example, one pixel (representing a point 43 of the real object 41) is understood to be associated with the intersection between a ray 45 and the surface of the real object 41.

However, it is not easy to identify the true position of the point 43 in space. From the 2D image acquired by the camera 44 it is not particularly easy to understand where the point 43 may be located, e.g. in an incorrect position 43' or 43 ". This is why some strategy needs to be used to determine the true position of the imaged point (object element).

According to one example, the search for the correct imaging position may be limited to a particular acceptable range or interval of candidate spatial positions, and the similarity metric is processed based on triangulation between two or more cameras to derive the actual spatial position only within the acceptable range or interval of candidate spatial positions.

To limit the search for acceptable candidate spatial locations, techniques based on the use of approximation surfaces 42 may be used. Approximation surface 42 may be drawn in the form of a grid, although alternatives to geometric primitives are possible.

In order to calculate an acceptable interval of depth values for each pixel, it is necessary to determine whether a given pixel is affected by the approximation surface, and if so, which part of the approximation surface the given pixel is affected by. In other words, it is necessary to determine which 3D point 43 "' of the approximation surface 42 is depicted by a pixel. The depth of the 3D point 43 "' may then be considered as a possible depth candidate for that pixel.

By intersecting a ray 45 defined by the pixel and the entrance pupil of the camera 44 with the approximation surface 42, a 3D point (e.g., 43 "') of the approximation surface actually depicted by the pixel can be found. If such an intersection 43 "' is present, the pixel is considered to belong to the approximation surface 42 and thus to the 3D object 41.

Thus, the range or interval of acceptable candidate spatial positions for a pixel may be limited to a near-side interval starting at point 43 '"(point 43'" being the intersection between ray 45 and approximation surface 42).

Ray intersection is a technical concept that can be implemented in different variants. One is to intersect all geometric primitives of the scene with ray 45 and determine those geometric primitives that have valid intersection points. Alternatively, all approximation surfaces may be projected onto the camera view. Each pixel is then marked with an associated approximation surface.
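
One common way to implement the ray-primitive test for a triangulated approximation surface is the Moeller-Trumbore intersection; the sketch below is a standard formulation and is independent of the present examples.

import numpy as np

def ray_triangle_intersection(origin, direction, v0, v1, v2, eps=1e-9):
    # Returns the distance along the ray at which it crosses the triangle (v0, v1, v2),
    # or None if the ray misses the triangle or hits it behind the camera.
    e1, e2 = v1 - v0, v2 - v0
    pvec = np.cross(direction, e2)
    det = np.dot(e1, pvec)
    if abs(det) < eps:                  # ray parallel to the triangle plane
        return None
    inv_det = 1.0 / det
    tvec = origin - v0
    u = np.dot(tvec, pvec) * inv_det
    if u < 0.0 or u > 1.0:
        return None
    qvec = np.cross(tvec, e1)
    v = np.dot(direction, qvec) * inv_det
    if v < 0.0 or u + v > 1.0:
        return None
    t = np.dot(e2, qvec) * inv_det
    return t if t > eps else None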

Unfortunately, even if we have found a 3D point 43''' of the approximation surface 42, and the depth of this 3D point 43''' corresponds to the considered pixel imaged by the camera 44, we still need to derive a possible interval of acceptable depth values for the considered pixel. Basically, even though we have found the end 43''' of the acceptable candidate spatial position range, we do not yet have sufficient limits on the range of acceptable candidate spatial positions.

While reference [8] does not provide any detailed information about this step, there are some traps that need to be avoided, which will be described in detail below.

It is first noted that the depth value corresponding to the point 43 "' cannot be taken directly as the depth value of the pixel. The reason is that users cannot or are unwilling to create very accurate grids due to time constraints. Therefore, the depth of point 43' "of approximation surface 42 is different from the depth of the real object at point 43. Rather, it provides only a rough approximation. As can be seen in fig. 4, the true depth of the true object 41 (corresponding to point 43) is different from the depth of the 3D mesh 42 (corresponding to point 43 "'). Therefore, we need to design an algorithm that converts the 3D mesh 42 into acceptable depth values (see section 10).

Since the user only indicates a rough approximation, it is advisable that the approximation surface be contained within the real object. The consequences of deviating from this rule are shown in fig. 40.

In this case, the user draws a rough approximation surface 402 for the true cylindrical object 401. As a consequence, the object element 401''' associated with ray 2 (405b) is assumed to lie near the approximation surface 402, because ray 2 (405b) intersects the approximation surface 402. However, such a conclusion need not be true. In this way, false constraints may be imposed, resulting in artifacts in the depth map.

Therefore, it is suggested below to keep the approximation surface inside the object. The user may deviate from this rule, and depending on the camera position the obtained result may still be correct, but the risk of wrong depth values increases. In other words, although the following figures and discussion assume that the approximation surface is located within the object, the same methods can be applied with minor modifications if this does not hold.

Even when the approximation surfaces are contained within the objects, challenges still exist because they represent the objects only roughly. One of these challenges is competing constraints, as shown in fig. 5. FIG. 5 depicts an example of competing approximation surfaces. The scene consists of two 3D objects, namely a sphere (labeled real object 41) and a plane as second object 41a, not visible in the figure. The 3D objects 41 and 41a are represented by the corresponding approximation surfaces 42 and 42a.

In the 2D image obtained from camera 1(44), the pixel associated with ray 1(45) may be located in the first real object 41 by intersection with the approximation surface 42, which correctly limits the acceptable spatial position to a position in the interval in ray 1(45) to the right of point 43' ″.

However, there is a problem in correctly locating the pixel associated with ray 2 (45a), which actually represents element 43b (the intersection between the circumference of real object 41 and ray 45a). A correct conclusion cannot be drawn a priori. In fact, ray 2 (45a) does not intersect approximation surface 1 (42), but it does intersect approximation surface 2 (42a) at point 43a'''. Therefore, using only the approximation surfaces limits the range or interval of candidate spatial positions in such a way that the pixel obtained from ray 2 (45a) is erroneously positioned to correspond to the plane 41a (second object), rather than to the actual position of element 43b.

The reason for these defects is that approximation surface 1 (42) only roughly describes the sphere 41. Thus, ray 2 (45a) does not intersect approximation surface 1 (42), resulting in element 43b being erroneously positioned to correspond to point 43a'''.

Without approximation surface 2 (42a), pixel 2 belonging to ray 2 (45a) would be unconstrained. Depending on the depth estimation algorithm and similarity measure used, it might be assigned the correct depth, because its neighboring pixels will have the correct sphere depth, but at the expense of a very high computational effort (since the similarity measures for all positions scanned along ray 2 (45a) would have to be compared, greatly increasing the processing time). However, when approximation surface 2 (42a) is in place, pixel 2 will be forced to belong to the plane 41a. As a result, the derived acceptable depth range will be erroneous, and therefore the calculated depth value will also be incorrect.

Therefore, we need to propose a concept of how to avoid erroneous depth values caused by competing depth constraints.

Another problem has been identified with respect to occlusion. Occlusion needs to be taken into account in order to obtain correct depth values; this is not considered in reference [8]. Fig. 6 shows a real object (cylinder) 61, described by a hexagonal approximation surface 62 and imaged by camera 1 (64) and camera 2 (64a), which are in a known positional relationship to each other. However, for one of the two camera positions, the cylindrical object 61 is occluded by another object 61a, drawn as a diamond, which lacks any approximation surface.

This situation has the following consequence: while the pixel associated with ray 1 (65) will be correctly identified as belonging to the approximation surface 62, the pixel associated with ray 2 (65a) will not: since ray 2 (65a) intersects the approximation surface 62 of object 61, and since the occluding object 61a has no approximation surface, the pixel associated with ray 2 (65a) will be associated with object 61.

Thus, the acceptable depth range for ray 2 (65a) would lie close to the cylinder 61 rather than the diamond 61a. Therefore, the automatically calculated depth values will be erroneous.

One way to address this situation would be to also create an approximation surface for the diamond. But this results in the need to manually create an accurate 3D reconstruction of all objects in the scene, which should be avoided because it is too time consuming.

In general, whether or not an approximation surface is used, techniques according to the prior art may be error prone (by erroneously assigning pixels to false objects) or may require processing similarity measures over an overly extended range of candidate spatial locations.

Accordingly, techniques are needed that reduce the computational effort and/or increase the reliability of positioning.

9. Containment volumes for interactive depth map improvement

In the above and in the following, for the sake of brevity, reference is often made to "pixels", even though these techniques may be generalized as "2D representation elements".

In the above and in the following, for the sake of brevity, reference is often made to "rays", even though these techniques may be generalized as "ranges or intervals of candidate spatial locations". The range or interval of candidate spatial positions may, for example, extend or spread along a ray that exits from a node of the camera with respect to the determined 2D image. Theoretically, from a mathematical standpoint, a pixel is ideally associated with a ray (e.g., 24 in fig. 2): pixel (X in FIG. 2)L) May be associated with a range or interval of candidate spatial locations, which may be understood as a slave node (O in fig. 2)L) Away into an infinitely large conical or frusto-conical volume. However, in the following, the "range or interval of candidate spatial positions" is mainly discussed as "rays", bearing in mind that its meaning can be easily generalized.

The "range or interval of candidate spatial locations" may be limited to a more limited "range or interval of acceptable candidate spatial locations" based on certain constraints discussed below (thus excluding some spatial locations that are deemed unacceptable, e.g., by visual analysis by a human user or by automatic determination methods). For example, unacceptable spatial locations (e.g., those that are visibly incorrect at the human eye) may be excluded to reduce the computational effort in retrieval.

Reference is often made herein to "depth" or "depth map", both of which may be generalized as a concept of "positioning". The depth map may be stored as a disparity map, where disparity is proportional to the multiplicative inverse of depth.
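
For illustration only, assuming a rectified stereo pair with focal length f (in pixels) and baseline b (both are assumptions, not values from this document), the conversion between the two representations could be sketched as follows.

# Sketch: depth stored as disparity for a rectified stereo pair.
# f: focal length in pixels, b: baseline in metres (assumed, not from this document).
def depth_to_disparity(depth, f, b):
    return f * b / depth          # disparity is proportional to 1/depth

def disparity_to_depth(disparity, f, b):
    return f * b / disparity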

Here, a reference to an "object element" may be understood as "a portion of an object imaged (associated with) by (e.g., a pixel of) a corresponding 2D representation element". In some cases, an "object element" may be understood as a small surface element of an opaque solid element. Generally, the objective of the discussed techniques is to locate object elements in space. For example, by putting together the locations of a plurality of object elements, the shape of a real object can be reconstructed.

The examples discussed above and below may relate to a method for positioning an object element. An object element of at least one object may be located. In some cases, if there are two objects (different from each other) in the same imaging space, there is a possibility of determining whether an object element is associated with the first object or the second object. In some examples, a previously obtained depth map (or otherwise coarse localization) may be refined, for example, by locating object elements with a higher reliability than in the previously obtained depth map.

In the examples below and above, positional relationships are generally referenced (e.g., optical center O for each camera)LAnd ORAs shown in fig. 2). It is understood that "positional relationship" also includes all camera parameters, including extrinsic camera parameters (e.g., position, orientation, etc. of the camera) as well as intrinsic camera parameters (e.g., focal length, which may affect the geometry of the acquisition, as is typical in epi-polar geometry). These parameters (which may be known a priori) may also be used for the calculation of the localization and depth map estimation.

When analyzing the similarity measure (e.g., at step 37 and/or step 353 or block 363) for the first determined image, a second 2D image (e.g., 22) may be considered (e.g., by analyzing pixels along the epi-polar line 21 of fig. 2). The second 2D image (22) may be acquired by a second camera or by the same camera at a different location (assuming no movement of the imaged object). Figs. 7, 8, 9, 10, 13, 29, etc. show only a single camera for simplicity, but it should be understood that the calculation of the similarity measure (37, 353) is done by using a second 2D image obtained with a different, not shown camera, or by the same camera from a different position (as shown in fig. 2), with a known positional relationship, i.e., known intrinsic and extrinsic camera parameters.

It should also be noted that two cameras are shown in the examples that follow (e.g., fig. 11, 12a, 27, 37, etc.). These two cameras may be those used to analyze the similarity metric (e.g., at steps 37 and/or 353) in an epi-polar geometry (as shown in fig. 2), but this is not entirely necessary: there may also be other (not shown) additional cameras that may be used to analyze the similarity metric.

In the examples above and below, reference is often made to "objects" in the sense that they may be plural. However, while they may be understood as being different and separate from each other in some cases, it is also possible that they are structurally connected to each other (e.g., by being made of the same material, or by being connected to each other, etc.). It will be appreciated that in some instances, "objects" may thus refer to "portions" of the same object, and the term "object" is thus to be given a broad meaning. Thus, the approximation surface and/or the inclusion volume and/or the exclusion volume may be associated with different portions of a single object (each portion being the object itself).

9.1 principle

Problems such as those explained in section 8 can be addressed by supplementing the approximation surface (e.g., in the form of a 3D mesh) by implementing a containment volume. In more detail, two geometric primitives can be described as follows.

Reference is now made to fig. 7. In the case where a ray 75 (or another range or interval of candidate locations) associated with a pixel (or another 2D representation element) intersects the approximation surface 72, this may indicate that the corresponding 3D point 73 is located between the position of the camera 74 and the intersection 73''' of the pixel ray 75 with the approximation surface 72.

Without additional information, the 3D point (object element) 73 may be located anywhere in an interval 75' along ray 75, between point 73''' (the intersection of ray 75 with the approximation surface 72) and the position of the camera 74. This may constitute an insufficient restriction: the similarity measure would have to be processed over the entire segment length. According to the examples provided herein, we introduce methods to further reduce the set of acceptable depth values, thereby improving reliability and/or reducing computational effort. Therefore, one goal we pursue is to process the similarity measure in a range more restricted than range 75'. In some examples, this goal may be achieved by using constraint-based techniques.

A preferred constraint-based technique may be one that adds containment volumes. The containment volumes may be closed volumes. They can be understood as follows: a closed volume divides space into two or more separate subspaces, such that there is no path from one subspace to another without passing through the surface of the closed volume. The containment volume represents the possible presence of 3D points (object elements) within the enclosed volume. In other words, if a ray intersects the containing volume, the intersection defines possible 3D location points, as shown in FIG. 8.

Fig. 8 shows an enclosed volume 86 bounding a ray 85 between a proximal end 86' and a distal end 86 ".

According to one example, outside the containment volume 86, no localization on ray 85 is possible (and no attempt will be made to analyze the similarity measure in step 37 or step 353 or block 363). According to one example, positioning on the ray 85 is only possible within the at least one containment volume 86. Notably, the containment volume may be generated automatically (e.g., from an approximate surface) and/or manually (e.g., by a user) and/or semi-automatically (e.g., by a user with the assistance of a computing system).
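
As a simple illustration, the sketch below (hypothetical names) computes the interval that a ray spans inside a containment volume given as an axis-aligned box; a mesh-based containment volume would be handled analogously by collecting all intersections of the ray with its surface.

# Sketch: interval of a ray inside a box-shaped containment volume (hypothetical helper).
import numpy as np

def ray_box_interval(origin, direction, box_min, box_max):
    """Return (t_enter, t_exit) of the ray inside the axis-aligned box, or None."""
    t_enter, t_exit = 0.0, np.inf          # only positions in front of the camera
    for axis in range(3):
        if abs(direction[axis]) < 1e-12:
            if origin[axis] < box_min[axis] or origin[axis] > box_max[axis]:
                return None                # ray parallel to this slab and outside it
            continue
        t0 = (box_min[axis] - origin[axis]) / direction[axis]
        t1 = (box_max[axis] - origin[axis]) / direction[axis]
        t0, t1 = min(t0, t1), max(t0, t1)
        t_enter, t_exit = max(t_enter, t0), min(t_exit, t1)
    return (t_enter, t_exit) if t_enter <= t_exit else None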

An example is provided in fig. 9. Here, the real object 91 is imaged by the camera 94. By ray 1(95), the particular pixel (or more generally the 2D representation element) represents the object element 93 (which is at the intersection between the surface of the object 91 and ray 1 (95)). We intend to locate the object elements 93 with high reliability and low computational effort.

The approximation surface 1(92) may be defined (e.g., by a user) such that each point of the approximation surface 1(92) is contained within the object 91. The containment volume 96 may be defined (e.g., by a user) such that the containment volume 96 surrounds the real object 91.

First, a range or interval of candidate spatial positions is derived: the range may be spread along ray 1(95) and thus contain a large number of candidate locations.

Further (e.g., at step 36 or step 352 or block 362), the range or interval of candidate spatial locations may be limited to a limited range or interval of acceptable candidate spatial locations, e.g., line segment 93a. Line segment 93a may have a first, proximal end 96''' (e.g., the intersection between ray 95 and the containing volume 96) and a second, distal end 93''' (e.g., the intersection between ray 95 and the approximation surface 92). A comparison between line segment 93a (obtained with this technique) and line segment 95a (which would be used without the definition of the containing volume 96) shows the advantage of the present technique over the prior art.

Finally (e.g., at step 37 or 353 or block 363), the similarity measure may be calculated only within the line segment 93a (the limited range or interval of acceptable candidate spatial positions), and the object element 93 may be more easily and reliably positioned (thus excluding unacceptable positions, such as positions between the camera position and point 96 "') without over-using computational resources. For example, a similarity measure as explained with reference to fig. 2 may be used.
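
Purely to illustrate the retrieval step, the following sketch (hypothetical names; the matching cost is only a placeholder) evaluates the similarity measure for depth candidates sampled inside the restricted interval only, e.g. between point 96''' and point 93''', instead of along the whole ray.

# Sketch: evaluate the similarity measure only inside the restricted interval
# (e.g. segment 93a) instead of along the entire ray (hypothetical names).
import numpy as np

def best_depth_in_interval(t_near, t_far, num_samples, matching_cost):
    """matching_cost(t) returns the similarity-based cost of depth candidate t."""
    candidates = np.linspace(t_near, t_far, num_samples)
    costs = [matching_cost(t) for t in candidates]
    return candidates[int(np.argmin(costs))]

# Usage idea: t_near could be the entry into containing volume 96 (point 96'''),
# t_far the intersection with approximation surface 92 (point 93'''):
# best = best_depth_in_interval(t_near=2.4, t_far=3.1, num_samples=64,
#                               matching_cost=lambda t: abs(t - 2.8))  # dummy cost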

In one variation, approximation surface 1(92) may not be needed. In that case, the similarity measure will be calculated along the entire line segment defined between the ray 95 and the two intersection points of the containing volume 96, although the computational effort with respect to the line segment 95a is reduced (as in the prior art).

The inclusion volume can be used in at least two ways in a general sense:

they may surround the approximation surface. In this function, they limit the acceptable 3D points, and thus the depth values allowed by the individual approximation surfaces. This is depicted in fig. 9. Approximation surface 1(92) specifies that all 3D points (acceptable candidate locations) must be located between the camera 94 and the intersection 93 "' of the pixel ray 95 and approximation surface 1 (92). On the other hand, the inclusion volume 96 specifies that all points (acceptable candidate locations) must be located within the inclusion volume 96. Therefore, only the bold position of the line segment 93a in fig. 9 is acceptable.

The containment volume may surround an occluding object. This is shown, for example, in fig. 10 (see the subsequent section). The hexagonal approximation surface and its containing volume limit the possible positions of the cylindrical real object. The containing volume surrounding the diamond additionally informs the depth estimation program that 3D points may also be located in that containing volume. Thus, the allowed depth values are formed by two unconnected intervals, each defined by a containing volume.

Instead of using a containing volume to define an acceptable depth range around the approximation surface, a limited range or interval of acceptable spatial positions may also be obtained at least partially implicitly (e.g., without the user actually drawing the containing volume). More details of this approach are explained in section 10.2. Section 10 gives a precise procedure for converting the containing volumes and approximation surfaces into acceptable depth ranges.

It may be noted that the at least one restricted range or interval of acceptable candidate spatial locations may be a discontinuous range and may include a plurality of different and/or disjoint restricted ranges (e.g., one restricted range per object and/or one restricted range per encompassed volume).

9.2 processing occluding objects by means of containment volumes

Reference may be made to fig. 10. Here, the first real object 91 is imaged by the camera 94. However, the second real object 101 exists in the foreground (occluding object). The element represented by the pixel associated with ray 2(105) (or more generally the 2D representation element) is intended to be located: it is not known a priori whether this element is an element of the real object 91 or an element of the real occluding object 101. Specifically, positioning the imaging element may mean determining to which of the real objects 91 and 101 the element belongs.

For the pixels associated with ray 1(95), reference may be made to the example of fig. 9, concluding that the imaged object element is to be retrieved within interval 93 a.

However, for the pixels associated with ray 2(105), it is not easy a priori to locate the imaged object element. To achieve such an object, techniques such as the following may be used.

The approximation surface 92 may be defined (e.g., by a user) such that each point of the approximation surface 92 is contained within the real object 91. The first encompassing volume 96 may be defined (e.g., by a user) such that the first encompassing volume 96 surrounds the real object 91. The second containment volume 106 may be defined (e.g., by a user) such that the containment volume 106 surrounds the second object 101.

First, a range or interval of candidate spatial positions is derived: the range may be spread out along ray 2(105) and thus contain a large number of candidate positions (between camera position and infinity).

Additionally (e.g., at blocks 35 and 36 in the method 30 of fig. 3 and/or 352 in the method 350 of fig. 35), the range or interval of candidate spatial locations may be further limited to at least one limited range or interval of acceptable candidate spatial locations. The at least one restricted range or interval of acceptable candidate spatial locations (and therefore spatial locations deemed unacceptable may be excluded) may include:

a line segment 93a ' (as shown in the example of fig. 9) between the ends 93 "' and 96" ';

a line segment 103a between ends 103' and 103 "(e.g., the intersection between ray 105 and containment volume 106).

Finally (e.g., at step 37 in fig. 3, step 353 in fig. 35, or block 363), the similarity measure may be computed only within the at least one limited range or interval of acceptable candidate spatial positions (line segments 93a' and 103a), in order to locate (e.g., by retrieving the depth of the particular pixel) the object element imaged by the particular pixel and/or to determine whether the imaged object element belongs to the first object 91 or the second object 101.

Note that, a priori, the automated system (e.g., block 362) does not know the location of the object elements actually imaged with ray 2 (105): this is defined by the spatial configuration of the objects 91 and 101 (in particular by the extension of the objects in the dimension into the paper in fig. 10). Regardless, the correct element will be retrieved by using the similarity measure (e.g., as shown in FIG. 2 above).
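
As an illustration (hypothetical helper names), the acceptable candidate positions for a ray such as ray 2 (105) could be assembled as a set of disjoint depth intervals, one per containing volume, each clipped against the nearest approximation-surface intersection; the similarity measure is then evaluated only inside these intervals.

# Sketch: acceptable depth intervals for a ray crossing several containing volumes,
# clipped at the first approximation-surface intersection (hypothetical names).
def acceptable_intervals(volume_intervals, first_surface_hit):
    """volume_intervals: list of (t_enter, t_exit) per containing volume;
    first_surface_hit: depth of the nearest approximation-surface intersection, or None."""
    intervals = []
    for t_enter, t_exit in sorted(volume_intervals):
        if first_surface_hit is not None:
            if t_enter > first_surface_hit:
                break                              # fully occluded by the approximation surface
            t_exit = min(t_exit, first_surface_hit)
        if t_exit > t_enter:
            intervals.append((t_enter, t_exit))
    return intervals

# For ray 2 (105) in fig. 10 this would yield two disjoint intervals,
# roughly corresponding to segments 103a and 93a'.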

Summarizing this procedure, it can be stated that the range or interval of candidate spatial positions can be limited to:

at least one first limited range or interval of acceptable candidate spatial positions associated with a first determined object (e.g., 91) (e.g., line segment 93a of ray 1; line segment 93 a' of ray 2); and

at least one second limited range or interval of acceptable candidate spatial positions associated with a second determined object (e.g., 101) (e.g., null for ray 1; line segment 103a for ray 2),

wherein limiting comprises defining at least one containing volume (e.g., 96) as a first containing volume surrounding the first determined object (91) (and a second containing volume (e.g., 106) surrounding the second determined object (e.g., 101)) to limit the at least one first (and/or second) range or interval (e.g., 93a', 103a) of candidate spatial locations; and

determining whether the particular 2D representation element is associated with the first determined object (e.g., 91) or the second determined object (e.g., 101).

The determination may be based on, for example, a similarity metric, such as discussed with reference to fig. 2 and/or other techniques (e.g., see below).

Additionally or alternatively, in some examples, other observations may also be utilized. For example, in the case of ray 1(95), based on the observation that the intersection between ray 1(95) and the second containing volume 106 is empty, it can be concluded that ray 1(95) belongs to only the first object 91. Notably, the final conclusion does not require strict consideration of the similarity measure and therefore can be performed in, for example, the constraint step 352.

In other words, fig. 10 shows a real object (cylinder) 91 described by a hexagonal approximation surface 92. However, for the camera view under consideration, the cylindrical object 91 is occluded by another object 101 drawn as a diamond.

When considering only approximation surface 1 (92), this has the following consequence: while the pixel associated with ray 1 (95) will be correctly identified as belonging to the approximation surface 92, the pixel associated with ray 2 (105) will not be correctly handled, and therefore, when no appropriate countermeasures are taken, artifacts in the depth map will result.

One way one might think of to solve this situation is to create another approximation surface for the diamond. But ultimately this results in the need to manually create an accurate 3D reconstruction of all objects in the scene, which should be avoided because it is too time consuming.

To address this problem, as noted above, our technique introduces so-called containment volumes 96 and/or 106 in addition to the above-mentioned approximation surfaces. The containment volumes 96 and 106 may also be drawn in the form of a mesh or any other geometric primitive.

The containment volume may be defined as a coarse shell around an object and indicates the possibility that a 3D point (object element) may be located in the containing volume (e.g., 96, 106). This does not mean that 3D points (object elements) must be present in such a containing volume; it is only possible (the decision to generate the containing volume may be left to the user). To avoid ambiguities for different camera positions (not shown in fig. 10), the containment volume may be a closed volume having an outer surface but no boundary (no outer edge): when placed under water, one side of each of its surfaces must remain dry.

Thus, using an approximation surface (e.g., 92), a containing volume (e.g., 96, 106), or a combination thereof, allows for defining a more complex depth map range consisting of a set of potentially separable depth map intervals (e.g., 93a' and 103 a). Thus, automatic depth map processing (e.g., at retrieval steps 353 or 37 or block 363) may still benefit from reduced search space while correctly processing occluding objects.

The user is not required to render the containment volume around each object in the scene. If the depth can be reliably estimated for an occluding object, the containing volume does not have to be rendered. More details can be seen in section 10.

Referring to fig. 10, there may be a modification according to which the approximate surface 92 is not defined for the object 91. In that case, for ray 2(105), a limited range or interval of acceptable candidate spatial locations would be formed by:

a line segment (instead of line segment 93a') having as end points the two intersections of ray 105 with the containing volume 96; and

-a line segment 103 a.

For ray 1 (95), the limited range or interval of acceptable candidate spatial locations would be formed by the line segment having as endpoints the two intersections of ray 1 (95) with the containing volume 96 (instead of line segment 93a).

There may be a variant according to which a first approximation surface is defined for the object 91 and a second approximation surface is defined for the object 101. In that case, for ray 2(105), a limited range or interval of acceptable candidate spatial locations would be formed by a small line segment between the containment volume 106 and the second approximation surface associated with the object 101. This is because, after finding the intersection of ray 2(105) with the approximate surface of object 101, there is no need to make a metric comparison for object 91 occluded by object 101.

9.3 using containment volumes to resolve competing constraints

The approximation surfaces often compete with each other, leading to potentially erroneous results. To address this, this section sets forth in detail a concept of how to avoid this situation. Fig. 11 shows an example. Here, the first real object 91 may be in the foreground and the second real object 111 may be in the background with respect to camera 1 (94). The first object may be surrounded by the containing volume 96 and may contain the approximation surface 1 (92) inside it. The second object 111 may contain the approximation surface 2 (112), without a containing volume around it.

Here, the pixels (or other 2D representation elements) associated with ray 1(95) may be processed as in FIG. 9.

The pixel (or other 2D representation element) associated with ray 2(115) may suffer from the problems discussed above with respect to ray 45a (section 8) of fig. 5. The real object element 93 is imaged, but its position is not easily recognized in principle. However, the following technique may be used.

First (e.g., at 351), a range or interval of candidate spatial locations is derived: the range may be expanded along ray 2(115) and thus contain a large number of candidate locations.

Further (e.g., at 35 and 36 and/or at 352 and/or at 362), the range or interval of candidate spatial locations may be limited to a limited range or interval of acceptable candidate spatial locations. Here, the limited range or interval of acceptable candidate spatial locations may include:

a first limited range or interval of acceptable candidate spatial positions, formed by a line segment 96a between the ends 96' and 96'' (e.g., the intersections of ray 115 with the containing volume 96);

a second limited range or interval of acceptable candidate spatial locations formed by points or line segments 112' (e.g., the intersection between ray 105 and approximation surface 112). Alternatively, the approximation surface 112 may be surrounded by another containing volume to result in a complete second depth line segment.

(The restricted range 112' may be defined as a single point or as a tolerance interval, depending on the particular example or the user's selection.)

Finally (e.g., at step 37 or 353 or 363), a similarity measure may be calculated, but only within the line segment 96a and the point or line segment 112', in order to locate the object element imaged by the particular pixel and/or to determine whether the imaged object element belongs to the first object 91 or the second object 111 (thus excluding some locations identified as unacceptable). By limiting the range or interval of candidate spatial positions to a limited range or interval of acceptable candidate spatial positions (here formed by two different limited ranges, namely line segment 96a and the point or line segment 112'), the localization of the imaged object element 93 can be handled very easily and with little computational power.

In general, the range or interval of candidate spatial locations may be limited to:

at least one first limited range or interval of acceptable candidate spatial positions (e.g., 96a) associated with a first determined object (e.g., 91); and

at least one second limited range or interval of acceptable candidate spatial positions (e.g., 112') associated with a second determined object (e.g., 111),

wherein limiting (e.g., 352) comprises defining at least one containing volume (e.g., 96) as a first containing volume surrounding the first determined object (e.g., 91) to limit the at least one first range or interval (e.g., 96a) of candidate spatial locations; and

wherein retrieving (e.g., 353) comprises determining whether the particular 2D representation element is associated with the first determined object (e.g., 91) or the second determined object (e.g., 101, 111).

(the first and second limited ranges or intervals of acceptable candidate spatial positions may be different and/or unconnected to each other).

Positioning information (e.g., depth information) acquired from camera 2 (114) may be used, for example, to confirm that the actually imaged element is the object element 93. Notably, the information obtained from camera 2 (114) may be or include previously obtained information (e.g., obtained from the retrieval step (37, 353) performed for the object element 93 at a previous iteration). The information from camera 2 (114) may also make use of the predefined positional relationship between camera 1 and camera 2 (94 and 114).

Based on the information from camera 2 (114), the object element 93 may, for the 2D image obtained from camera 1 (94), be located within line segment 96a and/or the corresponding 2D representation element may be associated with object 91. Thus, when comparing similarity measures in the 2D image acquired from camera 1 (94), the locations 112' will be excluded a priori, saving computational cost.

In other words, we can create even stronger constraints when we take advantage of the multi-camera nature of our problem. For this purpose, it is assumed that an object (e.g., a sphere) 91 is photographed by the second camera 2 (114). An advantage of camera 2 (114) is that its ray 114b for the object point 93 represented by ray 2 (115) of camera 1 (94) strikes only the approximation surface 92 (and not the approximation surface 112). The depth for camera 2 (114) may therefore be calculated first, and it can then be deduced that ray 2 (115) can only represent the depth of object 91. The latter may be processed, for example, at blocks 383 and/or 387.

Using this procedure, during the subsequent analysis of the similarity measure (e.g., at step 37 or 353), the calculation of the similarity measure at 112' when analyzing the first 2D image from camera 1 (94) may be avoided, thereby reducing the necessary calculations.

In general, a camera consistency operation may be performed.

If the cameras 94 and 114 operate in unison (e.g., they output compatible positions), then the positional locations of the object elements 93 provided by the two iterations of the method are coherent and the same (e.g., within a predetermined tolerance).

If cameras 94 and 114 do not operate in unison (e.g., they output incompatible positions), the positions may be invalidated (e.g., by invalidator 383). In an example, the accuracy may be increased and/or the tolerance may be decreased, and/or the user may be asked to increase the number of constraints (e.g., to increase the number of containing volumes, exclusion volumes, approximation surfaces) and/or to increase their accuracy (e.g., by decreasing the tolerance or by more accurately redrawing the containing volumes, exclusion volumes, approximation surfaces).

However, in some cases, the correct position of the imaged object element 93 may be automatically inferred. For example, in FIG. 11, if the object element 93 imaged by camera 1 (94) through ray 2 (115) is incorrectly retrieved (e.g., at 37 or 353) in range 112', while the same object element 93 imaged by camera 2 (114) through ray 114b is correctly retrieved (e.g., at 37 or 353) in interval 96b, the position provided for camera 2 (114) may be validated due to the fact that a smaller number (one: 96b) of limited ranges or intervals of acceptable candidate positions has been found, relative to the number (two: 112' and 96a) of limited ranges or intervals of acceptable candidate positions obtained for camera 1 (94). Thus, the position provided for camera 2 (114) is assumed to be more reliable than the position provided for camera 1 (94).

Alternatively, the correct position of the imaged object element 93 may be automatically inferred by analyzing the confidence value calculated for each position. The estimated location with the highest confidence value may be the location selected as the final location (e.g., 387').

FIG. 46 shows a method 460 that may be used. In step 461, a first positioning may be performed (e.g., using camera 1 (94)). At step 462, a second positioning may be performed (e.g., using camera 2 (114)). In step 463 it is checked whether the first positioning and the second positioning provide the same or at least a consistent result. In the event of identical or matching results, the positioning is validated at step 464. In the case of inconsistent results, according to a specific example, the positioning may be:

invalidated, thereby outputting an error message; and/or

invalidated, thereby restarting a new iteration; and/or

analyzed, so that one of the two locations is selected based on their confidence and/or reliability (see the sketch after this list).

According to one example, the confidence or reliability can be based on at least one of:

the distance between the located position and the camera position, the confidence being increased as the distance decreases;

the number of objects, containing volumes, or limited ranges of acceptable spatial positions found within the range or interval of candidate spatial positions, the confidence value being increased when this number is smaller;

a computed confidence value; etc.
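
A minimal sketch of how the check of fig. 46 might be organised is given below (all names and the tolerance value are assumptions, not part of the described system).

# Sketch of the consistency check of method 460 (hypothetical names and tolerance).
import numpy as np

def cross_check(position_cam1, position_cam2, conf_cam1, conf_cam2, tol=0.01):
    """Both positions are 3D estimates of the same object element from two cameras."""
    if np.linalg.norm(np.asarray(position_cam1) - np.asarray(position_cam2)) <= tol:
        return position_cam1                      # consistent: validate (step 464)
    # inconsistent: select the estimate with the higher confidence value
    return position_cam1 if conf_cam1 >= conf_cam2 else position_cam2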

9.4 correct treatment of planar approximation surfaces

So far, most of the approximation surfaces considered have enclosed a volume. This is often very beneficial in cases where the cameras have arbitrary viewing positions and, for example, surround an object: wherever the camera is located, a volume-enclosing approximation surface will always result in a correct depth indication. However, closed approximation surfaces are more difficult to draw than planar approximation surfaces.

Fig. 12 depicts an example of a planar approximation surface. The scene is composed of two cylindrical objects 121 and 121a. The object 121 is approximated by a volume-enclosing approximation surface 122. The cylinder 121a is approximated on only one side, using a simple plane as the approximation surface 122a. In other words, the approximation surface 122a does not constitute a closed volume and is therefore referred to as a planar approximation surface hereinafter.

If we use a closed volume as proposed so far, like 122, as the approximation surface, this gives a correct indication of the estimated depth for all possible camera positions. To achieve the same effect for the planar approximation surface 122a, we may need to extend the claimed method slightly, as described below.

The explanation is based on fig. 12a. For example, point 125 of the approximation surface 122a, as seen by camera 124, is close to the real object element 125a. Thus, the constraint from the approximation surface 122a results in a correct candidate spatial position of the object element 125a relative to camera 124. However, this is not the case for the object element 126 when considering camera 124a. In fact, ray 127 intersects the planar approximation surface at point 126a. In accordance with the concepts described above, point 126a would be considered an approximate candidate spatial location for the object element 126 captured by ray 127 (or at least, by limiting the range or interval of candidate spatial locations to a limited range or interval of acceptable candidate spatial locations consisting of the interval between point 126a and the location of camera 124a, there is a risk, when the metric is subsequently analyzed (e.g., at step 37 or 353 or block 363), of drawing the false conclusion that element 126 is located at a different, incorrect location). But this conclusion is in principle not advisable, because point 126a is far from the object element 126; the approximation surface 122a is therefore likely to impose a false constraint on the acceptable candidate spatial positions of element 126. The underlying reason is that a planar approximation surface primarily represents a "one-sided" constraint of an object. Thus, we need to distinguish whether a ray (or other range of candidate spatial locations) strikes the approximation surface on the preferred side. Fortunately, this can be done by attaching a normal vector to the planar approximation surface, pointing away from the approximated object surface (e.g., away from the top surface of cylinder 121a), thus achieving a simple but effective technique. Then, in an example, the depth approximation is preferably only considered when the dot product between the normal and the ray is negative (see section 10).
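
As a sketch only (hypothetical names), the front-side test reduces to checking the sign of a dot product:

# Sketch: accept a planar approximation-surface hit only when the ray strikes it
# from the side indicated by its normal vector (hypothetical names).
import numpy as np

def hit_is_on_preferred_side(ray_direction, surface_normal):
    """ray_direction points from the camera into the scene; the normal points
    away from the approximated object surface. The hit is used only if the
    dot product is negative (see section 10)."""
    return np.dot(ray_direction, surface_normal) < 0.0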

By these definitions, both closed approximation surfaces and planar approximation surfaces can be supported. While the first is more general, the latter is easier to draw.

10. Generation and/or refinement of depth maps (localization) based on approximated surfaces and included volumes

10.1 example of the procedure

Reference may now be made to fig. 47, which illustrates a method 470, which method 470 may be, for example, an operational scenario of the example method 30 or 350 or the system 360. It can be seen that the derivation 351 and retrieval 353 can be as in any of the above and/or below examples. Referring to step 472, for example, it may implement step 352 or may be operated on by block 362.

Approximating surfaces and containing volumes, as well as other types of constraints, may be used to limit the possible depth values (or other forms of locating imaged object elements). In these ways, they disambiguate the determination of correspondence, thereby improving the quality of the results.

The acceptable depth range for a pixel (or more generally, an object element associated with a 2D representation element of a 2D image) may be determined by all approximation surfaces and all encompassing volumes (or more generally, ranges or intervals of candidate spatial locations) that intersect with a ray associated with the pixel. Depending on the scene and the approximate surface drawn by the user (step 352a), the following may occur:

1. The pixel ray does not intersect any approximation surface or any containing volume (i.e., the limited range of acceptable candidate spatial locations would be empty). In this case (step 352a1), we consider all possible depth values for the pixel, and there is no additional constraint on the depth estimation. In other words, we set the limited range of acceptable candidate spatial locations (362') equal to the range of candidate spatial locations (361').

2. The pixel ray intersects only a containing volume but not an approximation surface (i.e., the limited range of acceptable candidate spatial locations would be limited only by the containing volume and not by an approximation surface). In this case (352a2), it is conceivable to allow the corresponding object element to be located only within one of the containing volumes. However, this is problematic, since the containing volumes will typically exceed the object boundaries they describe (so as to avoid the need to specify the shape of the object accurately). Therefore, this approach is only fail-safe if all 3D objects are surrounded by some containing volume. In all other cases, it is better not to impose any constraints on the depth estimation when no approximation surface is involved. This means that we consider all possible depth values for this pixel, and there is no additional constraint on the depth estimation. In other words, we set the limited range of acceptable candidate spatial locations (362') equal to the range of candidate spatial locations (361').

3. The pixel ray intersects the approximation surface and one or more containing volumes (thus, the limited range of acceptable candidate spatial locations may be limited by the containing volumes as well as the approximation surface). In this case (352a3), the acceptable depth range (which is a suitable subset of the original range) may be constrained as described below.

4. Section 10.2 will discuss a fourth possibility, in which a ray intersects only an approximation surface (which in any case results in 352a3).

For each pixel affected by an approximation surface, the system calculates the possible positions of the 3D point (object element) in 3D space. To this end, consider the scenario depicted in FIG. 13. The containment volumes may, in an example, intersect each other. They may even intersect the approximation surface. Furthermore, an approximation surface may, but need not, enclose a volume. Fig. 13 shows a camera 134 acquiring 2D images. In FIG. 13, ray 135 is depicted. Containing volumes 136a-136f are depicted (e.g., previously defined, e.g., manually defined by a user, e.g., by using the constraint definer 384). There is also an approximation surface 132 contained within the containing volume 136d. The approximation surfaces and containing volumes provide the constraints for limiting the range or interval of candidate spatial locations. For example, point 135' (on ray 135) is outside the limited range or interval of acceptable candidate spatial locations 137, because point 135' is neither near an approximation surface nor within any containing volume: thus, when comparing similarity measures (e.g., at step 37 or 353 or block 363), point 135' will not be analyzed. Conversely, point 135'' is within the limited range or interval of acceptable candidate spatial locations 137, since point 135'' is within the containing volume 136c: point 135'' will be analyzed when comparing similarity measures.

Basically, when limiting (e.g., at steps 35, 36, 352), the path may be scanned from a proximal position corresponding to the camera 134 towards distal positions (e.g., along the path of ray 135). Intersections with containing volumes and/or exclusion volumes (e.g., I3-I6, I9-I13) may define the ends of ranges or intervals of acceptable candidate positions. According to an example, when an approximation surface is encountered (e.g., corresponding to point I7), the limiting step may be stopped: thus, even if there are additional containing volumes 136e, 136f beyond the approximation surface 132, these additional containing volumes 136e, 136f are excluded. This is due to the consideration that the camera cannot image anything farther away than point I7: the locations associated with points I8-I13 are visually covered by the object associated with the approximation surface 132 and cannot be physically imaged.

Additionally or alternatively, negative positions (e.g., points I0, I1, I2) that could theoretically be associated with the ray 135 may also be excluded because they cannot be physically acquired by the camera 134.

To define a procedure for calculating an acceptable range of depth values, at least some of the following assumptions may be applied:

each approximation surface is surrounded by an inclusion volume. An acceptable depth range associated with the approximation surface is defined by the space between the approximation surface and the containing volume. Section 10.2 will describe the method in the case where the approximation surface is not surrounded by an encompassing volume.

Let N be the number of intersections found with any approximation surface and any containment volume.

The intersections are sorted by their depth to the camera, starting first with the most negative depth value.

Let r be the vector describing the ray direction of the camera entrance pupil for the pixel under consideration.

Let depth_min be a global parameter that defines the minimum depth an object can have with respect to the camera.

Let depth_max be a global parameter that defines the maximum depth an object can have with respect to the camera (which can be infinite).

The normal vector containing the volume points to the outside of the volume.

For a planar approximation surface, the normal vector points in the direction that it can provide the correct depth range constraint (see section 9.4).

The following procedure may then be used to compute a set of depth candidates for each pixel.

The procedure can be understood as finding the closest approximate surface for a given camera, since this defines the maximum possible distance of the 3D points. In addition, it traverses all contained volumes to determine the possible depth range. If there is no contained volume around the approximation surface, the depth range may be calculated in some other way (see, e.g., section 10.2).
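
Purely as an illustration, and under the assumptions listed above, a procedure of this kind could be sketched as follows (all names other than getLastDepth, depth_min and depth_max are hypothetical; the actual listing, with its lines (25) and (35) referred to below, may differ).

# Rough sketch of a per-pixel depth-candidate procedure (hypothetical names except
# depth_min, depth_max and getLastDepth, which are referred to in the text).
def depth_candidates(intersections, depth_min, depth_max, get_last_depth):
    """intersections: list of (depth, kind, entering), sorted by increasing depth;
    kind is 'volume' or 'surface'; entering marks whether a containing volume is
    entered (True) or left (False) at this intersection."""
    depth_set = []                                  # acceptable depth intervals
    entry = None                                    # depth at which the current volume was entered
    for depth, kind, entering in intersections:
        if depth < depth_min:
            continue                                # behind the camera / closer than depth_min
        if kind == 'volume':
            if entering:
                entry = depth
            elif entry is not None:
                depth_set.append((entry, min(depth, depth_max)))
                entry = None
        else:
            # first approximation surface: nothing farther away can be imaged
            start = entry if entry is not None else get_last_depth(depth)
            depth_set.append((max(start, depth_min), depth))
            return depth_set
    # no approximation surface hit: impose no constraint (cases 352a1 and 352a2 above)
    return [(depth_min, depth_max)]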

In these ways we therefore reduce the search space for depth estimation (a limited range or interval of acceptable spatial positions), which reduces ambiguity and thus improves the quality of the computed depth. The actual depth search may be performed in different ways (e.g., at step 37 or 353). Examples include the following methods:

Selecting a depth that shows the minimum matching cost within the allowed depth range

Before computing the minimum, the matching cost is weighted by the difference between the candidate depth and the depth of the intersection of the pixel ray with the approximation surface.

To be more robust against occlusions, and to not require that all objects be enclosed by containing volumes, the user may provide an additional threshold value C0 (see also section 16.1), which may be analyzed, for example, at blocks 383 and/or 387. In this case, depth values that were available before starting the interactive depth map refinement and whose confidence is greater than the provided threshold are preserved and are not modified according to the approximation surfaces. This approach requires that the depth map program output for each pixel a so-called confidence [17], which defines how certain the program is of the selected depth value. A low confidence value means that the probability of a wrong depth value is large and that the depth value is therefore not reliable. In this way, occluding objects whose depth can easily be estimated can be detected automatically; they therefore do not require explicit modeling.

Note 1: instead of requesting the normal of the containing volume pointing to the outside (or inside), we can also count how long the surfaces of the containing volume are intersected. For the first intersection point we enter the volume, for the second intersection point we leave the volume, for the third intersection point we re-enter the volume, and so on.

Note 2: if the user decides not to place an approximate surface within the object, the search ranges in lines (25) and (35) may be expanded as described in the following section.

10.2 deriving an acceptable depth range for an approximate surface not surrounded by an encompassing volume

The procedure defined in section 10.1 can be used to calculate an acceptable depth range of an approximation surface from the containing volume surrounding that approximation surface. However, this is not a mandatory constraint. Instead, an acceptable depth range may also be calculated by other means, encapsulated in the getLastDepth() function of section 10.1. In the following, we define, by way of example, such a function that can be implemented by getLastDepth().

For this purpose, it can be assumed that the normal vector of the approximation surface points to the outside of the volume. This may be ensured manually by the user, or an automatic procedure may be applied in case the approximation surface represents a closed volume. In addition, the user may define a tolerance value t0. The tolerance value essentially defines the possible distance of an object 3D point (object element) from the approximation surface in 3D space.

Fig. 14 shows a case where an object element 143 of a real object 141 is imaged by camera 1 (144) through ray 2 (145), generating a pixel in a 2D image. To roughly describe the real object 141, the approximation surface 142 may be defined (e.g., by a user) so as to be located inside the real object 141. The position of object element 143 is to be determined. The intersection between ray 2 (145) and the approximation surface 142 is denoted 143''' and may be used, for example, as the first end of a limited range or interval of acceptable candidate locations for retrieving the location of the imaged object element 143 (which will then be retrieved using the similarity metric, e.g., at step 353 or 37). A second end of the limited range or interval of acceptable candidate locations still has to be found. However, in this case, the user has not selected (or created or defined) a containing volume.

Other constraints may nevertheless be found to limit the range or interval of acceptable candidate locations. This possibility can be embodied by a technique that takes into account the normal n of the approximation surface 142 at the intersection point 143'''. The normal n, scaled by a value (e.g., by the tolerance value t0), may be used to determine a limited range or interval of acceptable candidate locations as an interval 147 between point 143''' (the intersection between the approximation surface 142 and ray 2 (145)) and point 147'. In other words, to determine which pixels (or other 2D representation elements) are affected by the approximation surface 142, ray 2 (145) (or another range or interval of candidate locations), defined by the pixel and the nodal point of camera 1 (144), may be intersected with the approximation surface 142. For each pixel whose associated ray intersects the approximation surface, the possible positions of the 3D point in 3D space may be calculated. These possible positions depend on the tolerance value (e.g., provided by the user).

Let t0 be a tolerance value (e.g., provided by the user). It may specify the maximum acceptable distance of the 3D point to be found (e.g., point 143) from the plane 143b, the plane 143b being defined by the intersection 143''' of the pixel ray 145 with the approximation surface 142 and by the normal vector n of the approximation surface 142 at the intersection 143'''. Since by definition the approximation surface 142 never exceeds the real object 141, and since the normal vector n of the approximation surface 142 is assumed to point to the outside of the 3D object volume, a single positive number may be sufficient to specify the tolerance t0 of the 3D point location 143 relative to the approximation surface 142. In principle, in some examples, the user may provide a second, negative value to indicate that the real object point 143 may also be located behind the approximation surface 142. In this way, placement errors of the approximation surface in 3D space can be compensated.

Let n be the normal vector of the approximation surface 142 at the point where the pixel ray 145 intersects the approximation surface 142. In addition, let the vector a define the optical axis of the camera 144 under consideration, pointing from the camera 144 into the scene (object 141). The tolerance value t0 provided by the user can then be converted into a depth tolerance Δd for the given camera:

Δd = t0 / max( |<n, a>| / (||n|| · ||a||), cos(φmax) )

where ||v|| is the norm (or length) of a general vector v, |g| is the absolute value of a general scalar g, and <n, a> is the scalar product between the vectors n and a. The parameter φmax may (e.g., when defined by the user) optionally limit the possible 3D point positions when the angle between n and a becomes large.

Let D be the depth of the intersection 143''' of the ray 145 with the approximation surface 142 relative to the camera coordinate system (e.g., depth D is the length of the line segment 149). Then the allowed depth values are

[D-Δd,D] (1)

D - Δd may be the value returned by the function getLastDepth() in section 10.1. Thus, Δd may be the length of the limited range of acceptable candidate locations 147 (between points 143''' and 147').

For φmax = 0, Δd = t0. This procedure can also be repeated iteratively. At the first iteration, the tolerance t0 is selected as a first, highest value. The process then executes steps 35 and 36 or 352, and then step 37 or 353. Subsequently, a positioning error is calculated for at least some of the anchor points (e.g., by block 383). If the positioning error exceeds a predetermined threshold, a new iteration may be performed, in which a lower tolerance t0 is selected. This process may be repeated to minimize the positioning error.

The interval [D - Δd, D] (denoted by 147 in fig. 14) may represent the limited range of acceptable candidate positions within which the object element 143 will actually be located using the similarity measure.

In general, a method for locating an object element 143 associated with a particular 2D representation element in a 2D image of a space in the space containing at least one determined object 141 is defined, the method comprising:

deriving a range or interval (e.g., 145) of candidate spatial locations for the imaged spatial elements based on a predefined positional relationship;

Limiting the range or interval of candidate spatial locations to at least one limited range or interval of acceptable candidate spatial locations (147), wherein limiting comprises:

defining at least one approximation surface (142) and one tolerance interval (147) to limit at least one range or interval of candidate spatial positions to a limited range or interval of candidate spatial positions defined by the tolerance interval (147), (wherein, in an example, the at least one approximation surface may be contained within the determined object), wherein the tolerance interval (147) has:

a distal end (143' ") defined by at least one approximation surface (142); and

a proximal end (147') defined based on a tolerance interval; and

based on the similarity measure, the most suitable candidate spatial location (143) is retrieved among the acceptable candidate spatial locations of the limited range or interval (147).

Note that: if the user decides not to place an approximation surface within the object, the depth range allowed in equation (1) can be extended as follows:

[D-Δd,D+Δd2] (2)

Δd2 is calculated in the same way as Δd, with the parameter t0 replaced by a second parameter t1. The value Δd2 can then also be used in the procedure described in section 10.1, lines (25) and (35). In other words, line (25) is replaced with:

DepthSet=DepthSet+[max(lastDepth,depth_min),depth+Δd2],

and line (35) is replaced with:

DepthSet=DepthSet+[max(getLastDepth(),min_depth),depth+Δd2]

10.3 user-assisted depth estimation

In the following, we will describe more detailed examples of how interactive depth map estimation and refinement can be performed:

1. identifying erroneous depth regions of a pre-computed depth map (34) (see section 14)

2. Creating an approximate surface for regions where depth values are difficult to estimate

3. Manually or automatically creating containment volumes (preferred), or setting the threshold t0 to infinity

4. A depth map confidence (reliability) threshold C0 is set, and all depth values of the pre-computed depth map (34) having confidence values less than C0 and affected by an approximation surface (i.e., the ray intersects the approximation surface) are eliminated (a small sketch is given after this list). The threshold C0 is set in such a manner that all false depth values in all areas covered by the approximation surfaces disappear. It is noted that C0 may be defined differently for each approximation surface (see, e.g., section 16.1).

5. For all approximation surfaces not surrounded by a containing volume, lowering the threshold t0 (which may be defined for each approximation surface) until all depth values covered by the approximation surface are correct. If this is not possible, refining the approximation surface.

6. For all approximation surfaces that are surrounded by a containing volume, the remaining wrong depth values are identified and the approximation surfaces and containing volumes are refined accordingly.

7. Identifying false depth values in an occlusion

8. Containment volumes are created for them. If this is not sufficient, additional approximation surfaces are created for them.

9. If this results in new depth errors in the 3D objects covered by the approximation surfaces, the containing volume for the occlusion is refined into a precise approximation surface.
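By way of illustration only, step 4 of the workflow above may be sketched as follows; the array names (depth, confidence, hits_surface) and the invalidation sentinel are assumptions of this sketch:

import numpy as np

INVALID = np.nan   # sentinel for eliminated depth values (depth assumed float-valued)

def eliminate_unreliable_depths(depth, confidence, hits_surface, C0):
    """Invalidate pre-computed depth values whose confidence is below C0 and
    whose pixel ray intersects an approximation surface (step 4 above)."""
    depth = depth.copy()
    mask = (confidence < C0) & hits_surface   # boolean per-pixel maps
    depth[mask] = INVALID
    return depth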

10.4 Multi-Camera consistency

The first few sections discuss how to limit the acceptable depth values for a certain pixel of a certain camera view. To this end, ray intersections (e.g., 93''', 96''', 103'', 143''', I0-I13, etc.) determine the range associated with the considered pixel of the considered camera. Although this approach may already significantly reduce the number of acceptable depth values, thereby improving the quality of the resulting depth map, it may be further improved in view of the fact that the correspondence determination relies not only on a single camera but on multiple cameras, and that the depth maps of the different cameras need to be consistent. In the following, we describe how this multi-camera analysis further improves the depth map computation.

10.4.1 Multi-Camera consistency based on Pre-computed depth values

Fig. 11 illustrates a case in which the pre-calculated depth of camera 2 (114) may simplify the depth calculation of camera 1 (94). For this purpose, consider ray 2 (115) in fig. 11. Based only on the approximation surface and the containing volume, it may depict either object 91 or object 111.

Let us now assume that the depth value of ray 114b has been determined using one of the methods described above or below (or using another technique). For example, after step 37 or 353, the spatial location obtained for ray 114b is point 93. It has to be noted that the point 93 is not located on the approximation surface 92, but on the real object 91, and is therefore at a distance from the approximation surface 92.

In the next step, we consider ray 2 (115) of camera 1 (94) and aim to calculate its depth as well. Without knowledge of the depth computed for ray 114b, there are two acceptable limited ranges of candidate spatial locations (112', 96a). Let us assume that the automatic determination (e.g., at 37, 353, 363) fails to recognize that ray 2 (115) intersects the real object (91), and that a point in the range 112' is erroneously obtained as the most suitable spatial location. This means, however, that the pixel or 2D representation element associated with ray 2 (115) has been assigned two different spatial positions, namely 93 (from ray 114b) and a point in 112' (from ray 115), because the position 93 (from ray 114b) and the position in 112' (from ray 115) are meant to be imaged by the same 2D representation element (incompatible positions). However, one 2D representation element cannot be associated with two different spatial positions. Thus, a failure (invalidation) of the depth calculation may be automatically recognized (e.g., by invalidator 383).

To address this situation, different techniques may be employed. One technique may include selecting the closest spatial location 93 for ray 2 (115). In other words, the automatically computed depth values for ray 2(115) for camera 1(94) may be overwritten with the spatial location of ray 114b for camera 2 (114).

Instead of searching the entire depth range (115) when calculating the spatial position for ray 2 (115), it is also possible to search only the range between camera 1 (94) and point 93, provided that the spatial candidate position for ray 114b has been reliably determined before. In this way, a significant amount of computing time can be saved.

Fig. 41 shows a slightly different situation. Assume that the depth for ray 114b has been calculated to match spatial candidate location 93. Let us now consider ray 3 (415). A priori, the situation is the same as in fig. 11. Based only on the containing volume 96 and the approximation surface 92, ray 3 (415) may depict the real object 91, or may depict the object 111. Therefore, it is possible (at step 37 or 353) to erroneously position the pixel at spatial position A' (in fact, ray 3 (415) does not intersect the real object 91 and thus depicts object 111).

Knowing (e.g., based on prior processing, e.g., based on method 30 or 350 performed on ray 114b and camera 2 (114)) that ray 114b depicts object element 93 (and no element at spatial location A'), it may be automatically inferred that, for ray 3 (415), spatial location A' is unacceptable, because an assumed element at spatial location A' would occlude object element 93 for camera 2. In other words, spatial position A' may be excluded for ray 3 (415) of camera 1 (94) as long as camera 2 (114) sees spatial position A' at a depth that is smaller than the depth value of point 93 by at least some predefined threshold.
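By way of illustration only, the consistency check described above may be sketched as follows; cam2.project, the per-pixel maps depth2 and reliability2, and the tolerance eps are assumptions of this sketch:

import numpy as np

def candidate_is_acceptable(P, cam2, depth2, reliability2, C0, eps=1e-3):
    """Reject a candidate spatial position P (3D point) for camera 1 when it
    would occlude an object element already reliably located by camera 2.
    cam2.project(P) -> (x, y, depth seen from camera 2) is an assumed helper."""
    x, y, d2 = cam2.project(P)
    xi, yi = int(round(x)), int(round(y))
    h, w = depth2.shape
    if not (0 <= xi < w and 0 <= yi < h):
        return True                     # P is not visible in camera 2: no constraint
    if reliability2[yi, xi] < C0:
        return True                     # no sufficiently reliable depth for that pixel
    return d2 >= depth2[yi, xi] - eps   # P must not lie in front of the located element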

10.4.2 Multi-camera consistency based on acceptable spatial candidate positions

The previous section describes how knowledge of depth values computed from a limited range of spatial positions (e.g., 96b) reduces the range of acceptable candidate spatial positions for another camera (e.g., 94).

However, even without such knowledge of calculated depth values, the mere presence of a containing volume applied for one camera may limit the acceptable candidate spatial locations for another camera.

This is depicted in fig. 42. In some examples, it may be considered that there are two containing volumes and one approximate surface. Now consider that ray 425b from camera 2(424b) is directed toward the approximation surface 422. Assume that a reliable depth (or other type of location) for ray 425b has not been pre-computed. Then for each ray that intersects the approximation surface 422, acceptable spatial candidate locations are located within the intersecting containing volumes 426a and 426 b.

However, this means that no object can be located within the area indicated by the letters A, B and C. The reason is as follows:

if an object is present in one of regions A, B or C, the object will be visible in camera 2(424 b). At the same time, the object will be located outside the containing volume 1(426a) and the containing volume 2(426 b). By definition, this is not allowed because we point out that reliable depths cannot be calculated, and therefore all objects must be located within the containment volume. Thus, any object in regions A, B and C may be automatically excluded. This fact can then be used for any other camera, for example camera 1(424 a). Basically, an exclusion volume (in which unacceptable candidate positions are excluded from the limited range for the camera 1) is obtained from the areas A, B and C.

When an object is located in region D (behind containing volume 2 (426b), even though in the foreground relative to containing volume 1 (426a)), the object does not cause any contradiction, because it is not visible in camera 2 (424b). Hence, it is possible that the user did not draw a containing volume for such an object. A policy may therefore be defined that allows the presence of objects in region D. In particular, a method may be performed comprising, upon defining a first, proximal containing volume (426b) or approximation surface and a second, distal containing volume or approximation surface (422, 426a), automatically defining:

a first excluding volume (C) between the first containing volume (426b) or the approximation surface and the position of the at least one camera (424 b); and

a second excluded volume (A, B) between the second containing volume (426a) or the approximation surface (422) and the position of the at least one camera (424b), wherein a non-excluded region (volume D) remains between the first excluded volume (C) and the second excluded volume (A, B).

Basically, for camera 2 (424b), volume D is occluded by containing volume 426b; it can therefore actually hold objects and is thus not treated as an excluded volume. Region D may be formed by positions of a range or interval (e.g., a ray) of candidate positions that are farther from camera 2 (424b) than containing volume 2 (426b), but closer to camera 2 (424b) than containing volume 1 (426a) or the approximation surface 422. The positions of region D are thus intermediate positions between the constraints.

Fig. 45 may provide another example illustrating a method 450 that may be related to the example of fig. 42 in some cases.

Method 450 (which may be an example of one of methods 30 and 350) may include a first operation (451) in which position parameters associated with the second camera position (424b) are obtained (without strictly requiring that an image actually be acquired). We note that the containment volume (e.g., 426a, 426b) may have been defined (e.g., by the user).

It is then intended to perform a method (e.g., 30, 350) for obtaining a localization of object elements of a first 2D image, e.g., a first 2D image acquired by camera 1 (424a), camera 1 (424a) being in a predetermined positional relationship with camera 2 (424b). To this end, a second operation 452 may be used (which may be understood as implementing the method 350). The second operation 452 may thus include, for example, steps 351 to 353.

For each 2D representation element (e.g., pixel) of the first 2D image, a ray (range of candidate spatial positions) is defined. In fig. 42, rays 425c, 425d, 425e are shown, each ray being associated with a particular pixel (which may each be identified as (x0, y0)) of the 2D image acquired by camera 1 (424a).

We see from fig. 42 that the locations in the line segment 425d' of ray 425d (those locations within the volumes A, B, C), e.g., location 425d'', would occlude the containing volumes 426a and 426b in the second 2D image. Thus, even if no constraint is predefined there, it is still possible to exclude the positions in 425d' from the limited range of acceptable spatial candidates.

We also see from fig. 42 that the locations in the line segment 425e' of ray 425e (those locations within volume D), e.g., location 425e''', would be occluded by containing volume 426b in the second 2D image. Thus, even without a constraint being predefined in this step (not shown in fig. 42), the positions in 425e' can still be kept within the limited range of acceptable spatial candidates.

This may be obtained by the second operation 452 of the method 450. In step 453 (which may implement step 351), a corresponding ray (which may be any of rays 425c-425e) is associated with the generic pixel (x0, y0).

At step 454 (which may implement at least one sub-step of step 352), the ray is limited to a limited range or interval of acceptable candidate spatial locations.

Nonetheless, a technique is now sought for further limiting the positions on the ray, so that only acceptable candidate positions are subsequently processed (at 37 or 353). Thus, by considering the containing volumes 426a and 426b that have been provided (e.g., by a user) with respect to the second camera position, spatial positions can be excluded from the limited range of acceptable spatial positions.

The loop between steps 456 and 459 (e.g., embodying step 352) may be iterated. Here, for each ray 425c-425e, the candidate spatial locations (here denoted by their depth d) are scanned from a position proximal to camera 1 (424a) towards a distal position (e.g., infinity).

In step 456, the first candidate d in the acceptable range or interval of acceptable spatial candidate positions is selected.

At step 457a, based on the location parameters obtained at the first operation (451), it is analyzed whether the candidate spatial location at depth d (e.g., location 425d'' in ray 425d, or location 425e''' in ray 425e) is occluded by at least one containing volume in the second 2D image, so as to determine a possible occlusion (457a'). For example, for camera 2 (424b), location 425e''' is behind containing volume 426b (in other words, the ray emanating from camera 2 (424b) intersects containing volume 426b before reaching location 425e''', or, for the ray emanating from camera 2 (424b) and associated with location 425e''', containing volume 426b lies between camera 2 (424b) and location 425e'''). In this case (transition 457a'), at step 458, the location 425e''' (or its associated depth d) is kept within the limited range of acceptable candidate spatial locations (the similarity metric will thus actually be evaluated for location 425e''' at the retrieval step 353 or 459d).

If, at step 457a, it is identified based on the positional parameters obtained at the first operation (451) that the candidate spatial position at depth d cannot be occluded by any containing volume (transition 457a''), then it is analyzed (at step 457b), again based on the positional parameters obtained at the first operation (451), whether the candidate spatial position (depth d) would occlude at least one containing volume (426b) in the second 2D image. This may occur at candidate spatial location 425d'', which (if subsequently identified as the most suitable candidate spatial location at step 37 or 353) would occlude containing volume 426b (in other words, the ray emanating from camera 2 (424b) intersects containing volume 426b after passing location 425d'', or, for the ray emanating from camera 2 (424b) and associated with location 425d'', candidate location 425d'' lies between the position of camera 2 (424b) and containing volume 426b). In this case, through transition 457b' and step 457c, the candidate location (e.g., 425d'') is excluded from the limited range of acceptable spatial locations, and the similarity metric is then not evaluated for it (at step 353 or 459d).

At step 459, the candidate location is updated (e.g., a new d is selected that is more distal than, even if close to, the previous one), and a new iteration begins.

When no further candidate position within the limited range remains to be examined (at 459), e.g., when d approaches "infinity" or reaches a maximum threshold or an approximation surface, a final localization is performed at 459d (which may be understood as embodying step 353). Here, only those candidate locations kept within the limited range at step 458 are considered, and the similarity measure is evaluated for them.
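By way of illustration only, the filtering loop of steps 456-459 may be sketched as follows; cam2.ray_to and vol.intersect are assumed helpers (returning the ray from camera 2 towards a candidate position together with its depth, and the entry/exit depths of a containing volume along that ray, respectively):

def filter_candidates(candidates, cam2, containing_volumes):
    """Keep a candidate position if, seen from camera 2, it is occluded by a
    containing volume (457a); drop it if it would itself occlude a containing
    volume (457b/457c). Otherwise it is kept (step 458)."""
    kept = []
    for P in candidates:                       # steps 456 / 459: scan the depths d
        ray, dP = cam2.ray_to(P)               # ray from camera 2 towards P
        occluded = would_occlude = False
        for vol in containing_volumes:
            hit = vol.intersect(ray)           # None or (d_near, d_far)
            if hit is None:
                continue
            d_near, d_far = hit
            if d_far < dP:
                occluded = True                # volume lies in front of P (457a)
            elif dP < d_near:
                would_occlude = True           # P lies in front of the volume (457b)
        if occluded or not would_occlude:
            kept.append(P)                     # step 458
    return kept                                # retrieval (353 / 459d) runs on these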

10.4.3 procedure for Multi-Camera consistency aware depth calculation

The following procedure describes in more detail how multi-camera consistency is used when calculating the depth values for a given camera. In the following, without loss of generality, the camera is referred to as "camera 1".

To be able to handle all the concepts set forth in sections 10.4.1 and 10.4.2, the procedure iterates through all depth candidates d in increasing order. In this way, the acceptance of further depth candidates can be stopped as soon as the first case described in section 10.4.1 occurs, namely when an already located object would occlude the object considered for camera 1 (94).

Next, the program may essentially convert the candidate spatial position defined by the depth candidate d for camera 1(94, 424a) to a depth candidate d' for the second camera c. Furthermore, it calculates pixel coordinates (x ', y') in which the candidate spatial position is visible in the second camera c (camera c may be a camera acquiring a 2D image on which localization has been performed; examples of camera c may be, for example, camera 2(114) in FIG. 11 and camera 2(424b) in FIG. 42).

Then, based on the pixel coordinates (x', y') available for camera c, it is checked whether a depth candidate D' has previously been calculated for camera c (424b, 114) and pixel (x', y') that is reliable enough to influence the depth calculation for camera 1 (94, 424a). Different heuristics may be employed for this purpose. In the simplest case, a depth candidate D' is accepted only when its confidence or reliability value is sufficiently large, or when its unreliability value is sufficiently small. However, it is also possible to check whether fewer depth candidates remain to be examined for pixel (x', y') in camera c than for pixel (x0, y0) in camera 1 (94, 424a). In that case, the depth candidate D' for (x', y') can be considered more reliable than the depth candidate d for (x0, y0) in camera 1 (94, 424a). The program may then allow the depth candidate D' for pixel (x', y') to influence the depth-candidate selection for pixel (x0, y0).

Then, based on this decision whether or not a reliable depth candidate D' exists, the program considers the scenarios discussed in section 10.4.1 or section 10.4.2. Lines (28)-(34) consider the case where a new object for camera 1 (94, 424a), defined by pixel (x0, y0) and depth d, would occlude an existing object for camera c (424b, 114), defined by (x', y') and depth D'. Since this is not allowed, the depth candidate is rejected. On the other hand, lines (42)-(50) consider the case where a new object in camera 1, defined by pixel (x0, y0) and depth d, would be occluded by a known object in camera 2 (424b, 114). This is also not allowed, and therefore all larger depth candidates are rejected.

Line 55ff is finally related to section 10.4.2.

It is noted that, in the above procedure, retrieval and limiting are performed in an interleaved manner. Specifically, after the first limiting step in lines (3)-(5), the remaining lines perform additional limiting in lines 34, 49 and 67 and calculate the similarity measure in lines 40, 61 and 71. It must be understood that this does not affect the claimed subject matter. In other words, whether limiting and retrieval are performed in order, interleaved, or even in parallel, the same result covered by the claimed subject matter is obtained.

However, for the sake of clarity, the following algorithm shows the same concept with only the additional limiting steps being performed.

Fig. 43 provides another example (see also fig. 11 and 41). FIG. 43 shows a method 430, comprising:

as a first operation (431), a plurality of 2D representation elements of the second 2D image (e.g., acquired by camera 2 (114) in fig. 11 or fig. 41, or by camera c in the code above) are localized (e.g., using any method, including method 30 or 350),

as a second subsequent operation (432):

for a first 2D image (e.g., acquired by camera 1 (94) in figs. 11 and 41, or by "camera 1" in the above code), performing the derivation step (351, 433) and the limiting step (352, 434) for a 2D representation element (e.g., the pixel associated with ray 115 or 415) to obtain at least one limited range or interval of acceptable candidate spatial locations (fig. 11: line segments 96a and 112'; fig. 41: the segment of ray 3 (415) delimited by its intersections with the containing volume 96);

finding (352, 435), among the previously localized 2D representation elements of the second 2D image, an element ((x', y') in the code) that corresponds to a candidate spatial position of the determined 2D representation element ((x0, y0) in the code) of the first 2D image (e.g., a position within the restricted ranges 96a, 112', etc.);

further limiting (352, 436) the limited range or interval of acceptable candidate spatial locations (in FIG. 11: by excluding line segment 112 'and/or by stopping at location 93; in FIG. 41: by further excluding location A' from the limited range or interval of acceptable candidate spatial locations);

retrieving (353, 437) the most suitable candidate spatial position for the determined 2D representation element (e.g., (x0, y0) in the code) of the first determined 2D image within the further limited range or interval of acceptable candidate spatial positions.

Referring to the example of fig. 11: after the localization of the second 2D image acquired by camera 2 (114) has been performed as the first operation (431), and the correct position of object element 93 (associated with the particular pixel (x', y') and ray 114b) has been retrieved, it is now time to perform the second operation (432) for localizing the positions associated with the pixels of the first determined 2D image acquired by camera 1 (94). Consider ray 2 (115), associated with pixel (x0, y0). First, the derivation step (351, 433) and the limiting step (352, 434) are performed to obtain a limited range or interval of candidate spatial locations formed by the line segments 96a and 112'. A plurality of depths d is scanned (e.g., from the proximal end 96' to the distal end 96b, and subsequently over the segment 112'). When the position of object element 93 is reached, however, it is searched in step 435 whether a pixel of the second 2D image is associated with the position of element 93. The pixel (x', y') of the second 2D image (acquired by camera 2 (114)) is found to correspond to the position of object element 93. Since pixel (x', y') is found to correspond to location 93 (e.g., within a predetermined tolerance threshold), it is concluded that pixel (x0, y0) of the first image (camera 1 (94)) is also associated with the same location (93). Thus, at 436, the limited range or interval of acceptable spatial positions is effectively further limited (because, relative to camera 1 (94), positions more distal than the position of object element 93 are excluded from the limited range or interval of acceptable spatial positions), and at 437, after the respective retrieval step, the position of object element 93 may be associated with pixel (x0, y0) of the first image.

Referring to the example of fig. 41: after the localization of the second 2D image acquired by camera 2 (114) has been performed as the first operation (431), and the correct position of object element 93 (associated with the particular pixel (x', y') and ray 114b) has been retrieved, it is now time to localize the pixels of the first 2D image acquired by camera 1 (94) in the second operation (432). At least one limited range or interval of candidate spatial locations is restricted (352, 434) to the segment of ray 3 (415) delimited by its intersections with the containing volume 96. For the pixel (x0, y0) associated with ray 3 (415), several depths d are considered, for example by scanning from the proximal end 415' to the distal end 415''. When the depth d is associated with the position A', it is searched in step 435 whether a pixel (x', y') is associated with position A'. The pixel (x', y') of the second 2D image (acquired by camera 2 (114)) is found to correspond to position A'. However, at the first operation 431, pixel (x', y') has already been associated with position 93, which is more distant than position A' within the range of candidate spatial positions (ray 114b) associated with pixel (x', y'). Thus, at step 436 (e.g., 352), it can be automatically recognized that location A' is unacceptable for pixel (x0, y0) (associated with ray 3 (415)). Accordingly, at step 436, location A' is excluded from the limited range of acceptable candidate spatial locations, and the similarity measure will not be evaluated for location A' at step 437 (e.g., 37, 353).

11. Limiting of acceptable surface normals for depth estimation

Although section 10 addresses limiting the acceptable depth values for a given pixel, an approximation surface also allows constraints to be imposed on the surface normal. This constraint can be used in addition to the range limitations discussed in section 10, which is the preferred approach. However, it is also possible to impose constraints on the normal without limiting the possible depth values.

11.1 Problem formulation

The computation of depth requires the computation of some matching cost and/or similarity measure for different depth candidates (e.g., at step 37). However, due to noise in the image, it is not sufficient to consider the matching cost of only a single pixel; rather, the matching cost of an entire region around the pixel of interest must be considered. This region is also referred to as an integration window, because the matching costs of all pixels located within the region or integration window are aggregated or summed in order to calculate the similarity metric for the pixel of interest. In many cases, when calculating the aggregated matching cost, it is assumed that all pixels in the region have the same depth.

Fig. 15 illustrates this approach, assuming that the depth of the cross-marked pixel 155 in the left image 154a should be calculated. To do so, its pixel value is compared to each of the possible corresponding candidates in the right image 154b. However, comparing individual pixel values results in a very noisy depth map, since the pixel color is affected by various noise sources. As a remedy, it is generally assumed that the neighboring pixels 155b have similar depths. Thus, each neighboring pixel in the left image is compared to the corresponding neighboring pixel in the right image. The matching costs of all pixels are then aggregated (added) and assigned as the matching cost of the cross-marked pixel in the left image. The number of pixels used for the aggregated matching cost is defined by the size of the aggregation window. This approach can provide good results if the surfaces of the objects in the scene are approximately fronto-parallel to the camera. If not, the assumption that all neighboring pixels have approximately the same depth is inaccurate and contaminates the minimum matching cost: to achieve the minimum matching cost, the corresponding pixels need to be determined based on their true depth values. Otherwise, a pixel is compared with a wrong counterpart, increasing the resulting matching cost. The latter increases the risk of selecting a wrong minimum in the depth calculation.
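By way of illustration only, a minimal Python sketch of the fronto-parallel aggregation just described, for a rectified stereo pair in which a depth candidate corresponds to an integer disparity; the array and parameter names are assumptions of this sketch:

import numpy as np

def aggregated_cost(left, right, x0, y0, disparity, half_win=3):
    """Sum of absolute differences over an integration window, assuming all
    pixels in the window share the same disparity (fronto-parallel surface)."""
    h, w = left.shape
    cost = 0.0
    for y in range(y0 - half_win, y0 + half_win + 1):
        for x in range(x0 - half_win, x0 + half_win + 1):
            xr = x - disparity               # corresponding pixel in the right image
            if 0 <= y < h and 0 <= x < w and 0 <= xr < w:
                cost += abs(float(left[y, x]) - float(right[y, xr]))
    return cost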

This situation can be improved by approximating the surface of the object by a plane whose normal can have any orientation [9]. This is particularly beneficial for surfaces that are strongly inclined with respect to the optical axis of the camera, as shown in fig. 16. In this case, by considering both location and normal, a more accurate corresponding pixel in the right image (e.g., 154b) may be determined for each pixel in the left aggregation window (e.g., 154a). In other words, given a depth candidate (or other candidate location, e.g., as processed in steps 37, 353, etc. and/or by block 363) and an associated normal vector for the pixel (or other 2D representation element) for which a depth value should be calculated, it can be computed, for all other pixels in the integration window, where they would be located in 3D space under the assumed surface plane. Having this 3D position for each pixel then allows the correct correspondence to be calculated in the right image. On the one hand, this approach reduces the achievable minimum matching cost, resulting in excellent depth map quality; on the other hand, it greatly increases the search space per pixel. This may require applying a technique to avoid searching all possible normal and depth value combinations. Instead, only selected depth and normal combinations are evaluated according to their matching costs. From all evaluated depth and normal combinations, the combination that results in the smallest local or global matching cost is selected as the depth value and surface normal for the given pixel. However, such heuristics may fail, resulting in erroneous depth values.

11.2 Normal-aware depth estimation using user-supplied constraints

One approach to overcome the difficulties described in section 11.1 is to reduce the search space by using the information given by the user-drawn approximation surface. In more detail, the normal of the approximation surface may be considered as an estimate of the normal of the object surface. Thus, it is not necessary to investigate all possible normal vectors, but only normal vectors close to the normal estimate. As a result, the problem of correspondence determination is disambiguated and results in excellent depth map quality.

To achieve these benefits, we perform the following steps:

1. The user draws (or otherwise defines or selects) a rough approximation surface. For example, such an approximation surface may be composed of a mesh so as to be compatible with existing 3D graphics software. In this way, each point on the approximation surface has an associated normal vector. If the approximation surface is a mesh, the normal vectors may, for example, simply be the normal vectors of the corresponding triangles of the mesh. Since a normal vector essentially defines the orientation of a plane, the normal vectors n and -n are equivalent. Thus, the orientation of the normal vector may be defined based on some other constraint, such as those imposed in sections 9.4 and 10.

2. The user specifies (or otherwise inputs) a tolerance by which the actual normal vector of the real object surface may deviate from the normal estimate derived from the approximation surface. The tolerance may be expressed as a maximum tilt angle θmax relative to a coordinate system in which the estimated normal vector represents the z-axis (depth axis). Furthermore, the tolerance angles of the approximation surfaces and of the containing volumes may differ.

3. For each pixel of the camera view for which depth should be estimated, the ray is intersected with all approximation surfaces and containing volumes provided by the user. If such an intersection exists, the normal vector of the approximation surface or containing volume at the intersection is considered an estimate of the normal vector of the object surface.

4. Preferably, the range of acceptable depth values may be limited, as described in sections 10 and 13.

5. The automatic depth estimation program then considers the normal estimation, e.g., as described in section 11.3.

To avoid imposing false normals, the approximation surface may be constrained to lie only inside the object and not extend beyond it. The resulting competing constraints can be mitigated by defining containing volumes in the same manner as discussed in section 9.3. If a ray intersects both an approximation surface and a containing volume, multiple normal candidates may be considered.

This can be seen for ray 2 (115) in fig. 11. As described in the previous section, ray 2 may depict an object element of object 111 or an object element of object 91. For these two possibilities, the normal vectors are expected to be very different. If the object element belongs to object 111, the normal vector would be orthogonal to the approximation surface 112. If the object element belongs to object 91, the normal vector would be orthogonal to the surface of the sphere (91) at point 93. Thus, for each restricted range of candidate spatial locations defined by a respective containing volume, a different normal vector may be selected and considered.

For a planar approximation surface (see section 9.4), the constraint on the normal vector should only be applied when its dot product with the intersecting ray is negative.

11.3 Normal information cost aggregation and use of plane assumptions

A method of locating an object element by using a similarity measure is explained here. In this method, the normal vector of the approximation surface (or containing volume) at the intersection is used.

When aggregating the matching costs of several pixels, a normal estimate for the pixel area may be used. Let (x0, y0) be the pixel in camera 1 for which the depth value should be calculated. The aggregation of matching costs may then be performed in the following manner:

C(x_0, y_0, d, \vec{n}) = \sum_{(x,y) \in N(x_0, y_0)} c\left(x, y, D(x_0, y_0, x, y, d, \vec{n})\right)    (3)

d is a depth candidate and n is a normal vector candidate for which the similarity measure (matching cost) should be calculated. The vector n is typically derived from the normal vector of the approximation surface or containing volume (96) at its intersection (96') with the range of candidate spatial positions 115 of the considered pixel (x0, y0) (see section 11.4). In the retrieval step, several values of d and n are tested.

N(x0, y0) contains all pixels whose matching costs are to be aggregated in order to compute the depth of pixel (x0, y0). c(x, y, d) represents the matching cost for pixel (x, y) and depth candidate d. The sum symbol in equation (3) may represent a plain sum, or a more general aggregation function.

D(x0, y0, x, y, d, n) is a function that computes the depth candidate for pixel (x, y) from the depth candidate d of pixel (x0, y0), under the assumption of a planar surface whose orientation is given by the normal vector n. For this purpose, a plane located in 3D space is considered. Such a plane can be described by the following equation:

\vec{n}^{T} \cdot (X, Y, Z)^{T} = n_X X + n_Y Y + n_Z Z = b    (4)

Without loss of generality, let n be represented in the coordinate system of the camera for which depth values should be calculated. Then there is a simple relationship between a 3D point (X, Y, Z) and the corresponding pixel (x, y), both expressed in the camera coordinate system, as shown in fig. 17:

(x, y, 1)^{T} \sim K \cdot (X, Y, Z)^{T}, \qquad K = \begin{pmatrix} f/s & 0 & pp_x \\ 0 & f/s & pp_y \\ 0 & 0 & 1 \end{pmatrix}    (5)

K is the so-called intrinsic camera matrix (which may be comprised in the camera parameters), f is the focal length of the camera, pp_x and pp_y are the coordinates of the principal point, and s is the pixel scaling factor.

Combining equations (4) and (5), we obtain

Z(x, y) = \frac{b}{\;\frac{s}{f} n_X (x - pp_x) + \frac{s}{f} n_Y (y - pp_y) + n_Z\;}    (6)

Given Z = D(x0, y0, x, y, d, n) and D(x0, y0, x0, y0, d, n) = d, b can be calculated as follows:

b = d \cdot \left( \frac{s}{f} n_X (x_0 - pp_x) + \frac{s}{f} n_Y (y_0 - pp_y) + n_Z \right)

thus, the depth candidate for pixel (x, y) may be determined by:

in other words, the disparity, which is proportional to 1 in depth, is a linear function in x and y.

In this way, we can compute the depth candidate D for each pixel (x, y) in the first camera, each depth candidate d and each normal candidate n. Having such a depth candidate allows calculating the corresponding matched pixel (x', y') in the second camera. The matching cost or similarity metric may then be updated by comparing the value of pixel (x, y) in the first camera with the value of pixel (x', y') in the second camera. Instead of the raw pixel values, derived quantities such as a census transform may be used. Since (x', y') may not have integer coordinates, interpolation may be performed prior to the comparison.

11.4 Using normal information in depth estimation with plane assumptions

Based on the relationships of section 11.3, there are two possible ways to include the user-provided normal information in order to disambiguate the correspondence determination and obtain a higher depth map quality. In the simple case, the normal vector derived from the user-generated approximation surface is assumed to be correct. In this case, by setting n to the normal vector of the user-provided approximation surface, the depth candidate for each pixel (x, y) is calculated directly from equation (6). It is not recommended to use this method for containing volumes, because the normal vector at the intersection between the pixel ray and the containing volume may be completely different from the normal vector at the intersection between the pixel ray and the real object.

In the more advanced case, the depth estimation process (e.g., at step 37 or 353) starts from the normal vector of the approximation surface or containing volume and searches for the best possible normal vector within a range defined by some tolerance value. This tolerance value may be expressed as a maximum tilt angle θmax. The angle may vary depending on which approximation surface or which containing volume the ray of the pixel intersects. Let n0 be the normal vector defined by the approximation surface or containing volume. Then the set of acceptable normal vectors is defined as follows:

\vec{n}'(\theta, \varphi) = (\sin\theta\,\cos\varphi,\;\; \sin\theta\,\sin\varphi,\;\; \cos\theta)^{T}, \qquad \theta \in [0, \theta_{max}],\; \varphi \in [0, 2\pi]

θ is the tilt angle relative to the normal vector n0 (i.e., the angle between the candidate normal and n0), and θmax is a predetermined threshold (the maximum tilt angle relative to n0). φ is the azimuth angle; its possible values span [0, 2π] so as to cover all normal vectors that deviate from the normal vector of the approximation surface by the angle θ. The obtained vector n'(θ, φ) is expressed with respect to an orthogonal coordinate system whose third axis (z) is parallel to n0 and whose two other axes (x, y) are orthogonal to n0. The transformation of n'(θ, φ) into the camera coordinate system can be obtained by the following matrix-vector multiplication:

\vec{n} = \begin{pmatrix} \vec{e}_x & \vec{e}_y & \vec{n}_0 \end{pmatrix} \cdot \vec{n}'(\theta, \varphi)

The vectors e_x, e_y and n0 are column vectors; e_x and e_y may, for example, be calculated as

\vec{e}_x = \frac{\vec{n}_0 \times \vec{a}}{\lVert \vec{n}_0 \times \vec{a} \rVert}, \qquad \vec{e}_y = \vec{n}_0 \times \vec{e}_x,

where a is an arbitrary vector not parallel to n0.

since the set contains an infinite number of vectors, a subset of angles can be tested. The subset may be defined randomly. For example, for each test vectorAnd each depth candidate d to be tested, equation (6) may be used to compute the depth candidates for all pixels in the aggregation window and to compute the matching cost. The depth of each pixel can then be determined by minimizing local or global matching costs using a normal depth calculation program. In a simple case, for each pixel, the normal vector with the smallest matching cost is selectedAnd depth candidates d (winner all win strategy). An alternative global optimization strategy can be applied, penalizing depth discontinuities.

In this context, it is important to consider that the containing volumes are only very rough approximations of the underlying object, except when they are automatically calculated from the approximation surfaces (see section 12). If the containing volume is only a very rough approximation, its tolerance angle should be set to a much larger value than the tolerance angle used for the approximation surfaces.

This can be seen again in fig. 11. Although the containing volume 96 is a fairly accurate approximation of the sphere (real object 91), the normal vector at the intersection (93) between ray 2 (115) and the real object (91) is very different from the normal vector at the intersection (96') between ray 2 (115) and the containing volume (96). Unfortunately, only the latter is known, and it is assigned to the vector n0 in the limiting step. Therefore, in order to include the normal vector of the surface of the real object (91, 93) in the normal candidate set, a considerable tolerance angle around the determined normal vector n0 is needed. In other words, at the intersection between ray 2 (115) and the containing volume (96), a large set of acceptable candidate normal vectors should be allowed.

In other words, each interval of spatial candidate locations may have its own associated normal vector n0. If a ray intersects an approximation surface, then, within the interval of spatial candidate locations defined by this approximation surface, the normal vector of the approximation surface at its intersection with the ray may be considered as the candidate normal vector n0. For example, referring to fig. 48, for all candidate spatial locations located between points 486a and 486b, the normal vector n0 is selected to correspond to vector 485a, because the approximation surface is typically a fairly accurate representation of the surface of the object. On the other hand, the containing volume 482b is only a very rough approximation of the contained object 487. Furthermore, the interval of candidate spatial locations between 486c and 486d does not contain any approximation surface. Therefore, the normal vector can only be estimated very roughly, which suggests the use of a larger tolerance angle θmax. In some examples, θmax may even be set to 180°, which means that the normal vector is not restricted at all. In other examples, different techniques may be used to interpolate the best possible normal estimate. For example, in the case of fig. 48, for each candidate spatial location p between 486d and 486c, the associated normal vector n0(p) can be estimated as follows:

\vec{n}_0(\vec{p}) = \frac{\lVert \vec{p} - \vec{p}_1 \rVert \, \vec{n}_2 + \lVert \vec{p} - \vec{p}_2 \rVert \, \vec{n}_1}{\lVert \vec{p} - \vec{p}_1 \rVert + \lVert \vec{p} - \vec{p}_2 \rVert}

wherein

· p is the candidate spatial position under consideration;
· n0(p) is the normal vector associated with the candidate spatial position p under consideration;
· p1 is the intersection point 486d;
· p2 is the intersection point 486c;
· n2 is the normal vector 485c of the containing volume at the intersection point 486c;
· n1 is the normal vector 485d of the containing volume at the intersection point 486d.

(The interpolation weights above are one possible choice; they make the estimate coincide with n1 at 486d and with n2 at 486c, and the result may additionally be normalized to unit length.)

If the ray does not intersect any approximation surface, no normal candidates are available and the depth estimator behaves as usual.

The method 440 is shown in fig. 44. At step 441, the pixel (x0, y0) (e.g., associated with ray 2 (145)) of the first image is considered. At step 442, the deriving step 351 and the limiting step 352 allow limiting the range or interval of candidate spatial locations (initially ray 2 (145)) to a limited range or interval of acceptable candidate spatial locations 149 (e.g., between a first, proximal end (i.e., the camera location) and a second, distal end (i.e., the intersection between ray 2 (145) and the approximation surface 142)). A new candidate depth d is then selected at 443 within the range or interval of acceptable candidate spatial locations 149. A vector n0 perpendicular to the approximation surface 142 is determined. At 444, a candidate vector n is selected from the candidate vectors (i.e., a vector forming an angle with n0 within a predetermined angular tolerance). Then, at 445, the depth candidates D for the pixels of the aggregation window are obtained (e.g., using equation (7)). At 446, the aggregated matching cost C(x0, y0, d, n) is updated. At 447, it is verified whether there are further candidate vectors n to be processed (if so, a new candidate vector is selected at 444). At 448, it is verified whether there are further candidate depths d to process (if so, a new candidate depth is selected at 443). Finally, at 449, the aggregated matching costs may be compared so as to select the depth d and normal n associated with the smallest cost C(x0, y0, d, n).

12. Automatic creation of containing volumes from approximation surfaces

This section gives some examples of how to derive a containing volume from an approximation surface in order to resolve competing approximation surfaces and to define an acceptable depth range. It should be noted that all presented methods are examples only, and other methods are possible.

12.1 computing the contained volume by scaling the closed volume approximation surface

Referring to fig. 18, a containing volume 186 (e.g., at least some of the containing volumes discussed above and below) may be generated from an approximation surface by scaling the approximation surface with respect to a scaling center 182a. For each control point (vertex) of the mesh, the vector between the control point and the scaling center is calculated and extended by a constant factor to compute the new position of the control point:

\vec{p}_{new} = \vec{c} + \lambda \cdot (\vec{p} - \vec{c}), \qquad \lambda > 1

where p is the original control point, c is the scaling center 182a, and λ is the scaling factor.

although very simple, a feature of this method is that the distance between corresponding surface elements is not constant, but depends on the distance to the zoom center 182 a. For very complex approximation surfaces, the resulting scaled volume intersects the approximation surface, which is generally undesirable.

12.2 calculation of the contained volume from the planar approximation surface

The simple method described in section 12.1 is particularly effective when the approximation surface 186 is a closed volume. A closed volume may be understood as a mesh or structure in which each edge of the mesh elements (e.g., triangles) is shared by an even number of adjacent mesh elements.

If this property is not satisfied, the method needs to be slightly modified: for all edges connected to only one mesh element, additional mesh elements need to be inserted that connect them with the corresponding original edges, as shown in fig. 19.

12.3 Mesh movement and insertion of new control points and mesh elements

While the methods discussed in sections 12.1 and 12.2 are easy to implement, they are not universally applicable due to inherent limitations. Thus, a more advanced approach is described below.

Assume a surface mesh 200 as shown in fig. 20. The surface mesh 200 is a structure comprising vertices (control points) 208, edges 200bc, 200cb, 200bb, and surface elements 200a-200i. Each edge connects two vertices, each surface element is surrounded by at least three edges, and there is a connecting path of edges from each vertex to any other vertex of the structure 200. Each edge may be connected to an even number of surface elements; typically, each edge is connected to two surface elements. The structure may form a closed structure without a boundary (structure 200, which is not closed, may be understood as a reduced portion of a closed structure). Even though fig. 20 appears planar, it should be understood that the structure 200 extends in 3D (i.e., the vertices 208 are not necessarily all coplanar). In particular, the normal vector of one surface element may differ from the normal vector of any other surface element.

The normal vectors defining the surface mesh 200 all point to the outside of the corresponding 3D object. The outer surface of the mesh 200 may be understood as the approximation surface 202 and may embody one of the approximation surfaces discussed above (e.g., the approximation surface 92).

The containment volume may then be generated by moving all mesh elements (e.g., triangles) or surface elements 200a-200i by a user-defined value along their normal vector.

For example, in the transition from fig. 20 to fig. 21, the surface element 200b has been translated along its normal by the user-defined value r. The normal vectors in figs. 20 and 21 are three-dimensional and therefore also point, to some extent, out of the paper plane. For example, edge 200bc, which is connected to edge 200cb in fig. 20, is now separated from edge 200cb in fig. 21. In this step, all previously connected edges may be reconnected by corresponding mesh elements, as shown in fig. 22. For example, point 211c is connected to point 211bc by a segment 211' so as to form two new elements 210cb and 210bc. These new elements interconnect the previously connected edges 200bc and 200cb (fig. 20), which were disconnected by the movement along the normal (fig. 21). We also note that in fig. 20 a control point 210 is defined which is connected to more than two mesh elements (200a-200i). Next, as shown in fig. 23, for each previous control point (e.g., 210) connected to more than two mesh elements, a duplicated control point (e.g., 220) may be inserted, e.g., at the average of all new positions of the previous control point 210. (In fig. 22, the new elements are shown in full color, while the old elements are not colored.)

Finally, as shown in fig. 24, the duplicated control point 220 may be connected, by additional mesh elements (e.g., 220bc), to each edge that reconnects control points originating from the same source control point. For example, duplicated control point 220 is connected to edge 241 between control points 211c and 211d, because these both originate from the same source point 210 (fig. 20).

In the case where the original approximation surface 200 is not closed, for all edges connected to an odd number of mesh elements, additional mesh elements connecting them with the original edges may be inserted.

The above procedure may result in grid elements crossing each other. These intersections can be resolved by some form of mesh cleaning in which all triangles that intersect another triangle are cut along the intersection lines into up to three new triangles, as shown in fig. 25 and 26. All sub-volumes that are completely contained in another volume of the created grid may then be removed.

The volume enclosed by the outer surface of the mesh 200 so modified may serve as the containing volume 206 and may embody one of the containing volumes discussed above (e.g., in fig. 9, the containing volume 96, generated from the approximation surface 92).

In general, at least a portion of a containing volume (e.g., 96, 206) can be formed starting from a structure (e.g., 200), wherein the at least one containing volume (e.g., 96, 206) is obtained by:

moving at least some elements (e.g., 200a-200i) along their normals, thereby decomposing them (e.g., from fig. 20 to fig. 21);

reconnecting (e.g., from fig. 21 to fig. 22) edges (200bc, 200cb) of elements (e.g., 200b, 200c) that were connected before the move and disconnected due to the move by generating additional elements (e.g., 210bc, 210 cb); and/or

Inserting a new control point (e.g., 220) within a decomposed region (e.g., 200') of each control point (210) in the original structure that has been connected to more than two grid elements (e.g., from FIG. 22 to FIG. 23);

reconnecting the new control point (e.g., 220) with the decomposed elements (e.g., 210bc) to form a further element (e.g., 220bc) (e.g., from fig. 23 to fig. 24), by constructing a triangular mesh element (220bc) that connects the new control point (e.g., 220) with two control points (211d, 211c) that originate from the same source control point (210) and whose respective connecting edges (210bc, 210cb) are neighbors.

13. Excluded volume constraints

13.1 principle

The methods discussed in sections 9 to 12 directly refine the depth map by providing a coarse model of the relevant objects. This approach is very intuitive for the user and close to today's workflows, where 3D scenes are typically reconstructed by 3D technicians.

However, it is sometimes more intuitive to exclude depth values by defining areas without 3D points or object elements. Such constraints may be defined by exclusion volumes.

Fig. 27 shows a corresponding example. It illustrates a scene containing multiple objects 271a-271d that can be observed by different cameras 1 and 2 (274a and 274b) in order to perform localization processes such as depth estimation and 3D reconstruction. If the depth (or other positioning) estimate is erroneous, the corresponding point is located erroneously in 3D space. Through scene understanding, users can often quickly identify such erroneous points. The user may then draw or otherwise define a volume 279 (a so-called excluded volume or exclusion volume) in 3D space, which indicates positions where no object can be placed. The depth estimator may, for example, use this information to avoid ambiguities in the depth estimation.

13.2 unambiguous user-specified exclusion zone by enclosing volume

To define regions where no 3D points or object elements should exist, a user may draw or otherwise define an enclosed 3D volume (e.g., using constraint definer 364). A closed volume is intuitively understood as a volume into which water cannot penetrate when the volume is placed under water.

Such a volume may have a simple geometric shape, such as a cuboid or a cylinder, but more complex shapes are also possible. To simplify the calculations, the surfaces of these volumes may be represented by triangular meshes (e.g., as shown in figs. 20-24). Independent of the actual representation, each point on the surface of such an excluded volume may have an associated normal vector n. To simplify later calculations, the normal vectors may be oriented in such a way that they all point either to the outside of the volume or to the inside of the volume. In the following, we assume that the vectors n point toward the exterior of volume 279, as shown in fig. 28. The calculation of the depth range resulting from an excluded volume is addressed below.

Excluded volumes allow limiting the possible depth range of each pixel of a camera view. In this way, ambiguity in the depth estimation may, for example, be reduced. Fig. 29 depicts a corresponding example with multiple excluded volumes 299a-299e. To calculate the possible depth values for each pixel of the image, a ray 295 (or another range or interval of candidate locations) may be cast from the pixel through the nodal point of the camera 294. This ray 295 may then be intersected with the surfaces of the volumes (e.g., in a manner similar to figs. 8-14). The intersection points I0'-I9' may (at least initially) be determined in both directions of the ray 295, although light enters the camera only from the right side (so that points I0'-I2' are excluded). The intersection points I0'-I2' behind the camera are at a negative distance from the camera 294. All found intersections I0'-I9' (the ends of multiple restricted ranges or intervals of acceptable candidate positions) are then sorted by increasing distance.

Let N be the number of found intersections (e.g., 9 in the case of fig. 29), and r a vector describing the ray direction from the camera entrance pupil. Let depth_min be a global parameter that defines the minimum depth that an object can have from the camera. Let depth_max be a global parameter that defines the maximum depth (which may be infinite) that an object can have from the camera.

The process of calculating the set of possible depth values for a given pixel (which set forms a limited range or interval of acceptable candidate locations) is given below.
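The corresponding listing is not reproduced here; by way of illustration only, a possible Python sketch is given below, assuming that the intersections have already been classified as entering/exiting (via the sign of the dot product between the ray direction r and the outward normal) and that intersections at negative distance have been discarded (camera assumed to lie outside all excluded volumes):

def allowed_depth_intervals(intersections, depth_min, depth_max):
    """Derive the acceptable depth intervals along a pixel ray from its sorted
    intersections with the excluded volumes. 'intersections' is a list of
    (distance, entering) pairs, entering=True when the ray enters a volume."""
    intervals, start, inside = [], depth_min, 0
    for dist, entering in sorted(intersections):
        if entering:
            if inside == 0 and dist > start:
                intervals.append((start, min(dist, depth_max)))
            inside += 1                  # ray enters an excluded volume
        else:
            inside -= 1                  # ray exits an excluded volume
            if inside == 0:
                start = dist
    if inside == 0 and depth_max > start:
        intervals.append((start, depth_max))
    return intervals                     # the acceptable candidate intervals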

It must be noted that, similarly to section 10.1, the excluded volumes may be ignored for those pixels whose depth has been determined by the automated program with a reliability or confidence greater than or equal to the user-provided threshold C0. In other words, if a depth value with sufficient reliability has already been calculated for a given pixel, its depth value does not change, and the foregoing procedure is not performed for that pixel.

13.4 use of excluded volumes in the depth estimation procedure

The depth estimator (e.g., at step 37 or 353, or via block 363) may calculate a matching cost for each depth (see section 5). Thus, by setting the cost of impermissible depths to infinity, the excluded volumes can easily be taken into account in the depth estimation (or other localization).

13.5 definition of exclusion volume

Excluded volumes may be defined by 3D-graphics-software-based methods. They may be represented in the form of meshes, such as those shown in figs. 20-24, which may be based on triangular structures.

Alternatively, instead of drawing the closed volume directly in 3D graphics software, the closed volume may also be derived from a 3D surface. The surface can then be extruded into a volume by any suitable method.

13.6 Specification of excluded regions by surface meshes

In addition to specifying excluded regions by volumes, they may also be specified by surfaces. However, such surfaces are only valid for a subset of the available cameras. This is shown in fig. 30. Assume that the user places (e.g., using constraint definer 364) surface 309 and declares that it bounds a region free of objects (thus an exclusion surface). While this statement is certainly valid for camera 1 (304a), it is false for camera 2 (304b). This problem can be solved by indicating that surface 309 is only to be considered for camera 1 (304a). Since this is more complicated, it is not the preferred method.

13.7 Combination of excluded volumes with approximation surfaces and containing volumes

The excluded volumes may be combined with the approximation surfaces and the containing volumes to further limit the acceptable depth values.

13.8 automatic creation of excluded volumes for multi-camera consistency

Section 10.4 introduces a method of how to further refine the constraints based on multiple cameras. This method essentially considers two cases. First, in case a reliable depth can be calculated for a first object in a first camera, it is avoided that a depth value calculated in a second camera places a second object in such a way that the second object would occlude the first object in the first camera. Second, if there is no such reliable depth for the first object in the first camera, the depth values of the first object in the first camera are constrained by the containing volumes and approximation surfaces associated with the first object in the first camera. This means that the second camera is not allowed to calculate the depth value of a second object in such a way that the second object would occlude all the containing volumes and approximation surfaces of the first object in the first camera.

The concept of section 10.4 can be expressed in an alternative way by automatically creating excluded volumes. This is shown in fig. 37. The basic idea consists in creating an excluded volume (e.g., 379) to prevent objects from being placed (at step 37) between the camera 374a and the first containing volume 376 intersected by the pixel rays. Furthermore, it may even be prevented that objects are placed between containing volumes. However, this approach would require the user to have surrounded each object with a containing volume, which is typically not the case. Thus, although possible (see also section 10.4.2), this is not considered in detail below.

The following describes a procedure for creating excluded volumes that prevent objects from appearing, for each camera ray, between the camera and the first intersected containing volume. The containing volumes are assumed to be described by triangular or planar mesh elements.

The core idea of this procedure is to find, for each pixel (or other 2D representation element), the first containing volume that intersects the ray defined by the considered pixel and the camera entrance pupil or nodal point. All spatial positions between the camera and this intersected containing volume are then unacceptable and can be excluded, provided that the ray also intersects the approximation surface. Otherwise, the containing volume may be ignored, as described in section 10.1.

Based on this basic idea, the challenge now lies in grouping all the different pixel rays into a compact excluded volume described in the form of a mesh. To this end, the procedure first creates two pixel maps, I_1 and I_2. The pixel map I_1 defines, for each pixel, the identifier of the nearest containing volume intersected by the corresponding ray; a value of zero means that no containing volume has been intersected. The pixel map I_2 defines, for each pixel, the identifier of the nearest approximation surface intersected by the corresponding ray; a value of zero means that no approximation surface has been intersected. The combined pixel map I_3 finally defines, for each pixel, the identifier of the nearest containing volume intersected by the corresponding ray, provided that an approximation surface has also been intersected; otherwise the pixel map value is zero, meaning that no relevant containing volume has been intersected.
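As a schematic illustration (not the numbered program listing referred to below), the three pixel maps could be built as follows; the ray-casting helpers passed in are hypothetical placeholders for whatever intersection tests the 3D software provides:

import numpy as np

def build_pixel_maps(width, height, ray_for_pixel,
                     first_containing_volume_id, first_approximation_surface_id):
    # I_1: id of the nearest containing volume intersected by the pixel ray (0 = none)
    # I_2: id of the nearest approximation surface intersected by the pixel ray (0 = none)
    # I_3: id of the nearest containing volume, but only if an approximation surface
    #      was intersected as well (0 = no relevant containing volume)
    I1 = np.zeros((height, width), dtype=np.int32)
    I2 = np.zeros((height, width), dtype=np.int32)
    I3 = np.zeros((height, width), dtype=np.int32)
    for y in range(height):
        for x in range(width):
            ray = ray_for_pixel(x, y)                     # ray through the camera entrance pupil / node
            I1[y, x] = first_containing_volume_id(ray)
            I2[y, x] = first_approximation_surface_id(ray)
            I3[y, x] = I1[y, x] if I2[y, x] != 0 else 0
    return I1, I2, I3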

Lines 14-25 are then responsible for creating a copy of the containing volume in which only those portions visible in camera C_i are retained. In other words, only those portions of the containing volume are kept whose projections into camera C_i coincide with the pixels in I_3 whose map values equal the identifier of that containing volume. Lines 26 through 37 finally form the excluded volumes from those remaining meshes by connecting the relevant edges to the node (entrance pupil) of the camera. The relevant edges are the edges located at the boundaries of the copied containing-volume meshes.

More generally, in the example of fig. 37 (leaving aside inclusion volume 376b, which is not of interest here), inclusion volume 376 has been defined for object 371 (not shown), e.g., manually by a user, with reference to the two cameras 1 and 2 (374a and 374b). Further, an approximation surface 372b has been defined (e.g., manually by a user) for object 371b (not shown). It is then possible to automatically create the excluded volume 379. In fact, because the containing volume 376 is imaged by camera 1 (374a), it can be inferred a priori that no object is located between containing volume 2 (372) and camera 1 (374a). Such a conclusion may depend on whether the corresponding ray intersects the approximation surface: this limitation follows from enumeration 2 of section 10.1, which suggests ignoring the containing volume if the ray (or other candidate location range) does not intersect the approximation surface. It must be noted that this created excluded volume may be particularly relevant (and in some cases only relevant) when the reliability of the automatically pre-computed disparity value (34, 383a) is less than the user-provided threshold C0. Thus, when limiting the range or interval of candidate spatial positions (which may be ray 375 emerging from camera 2 (374b)), the limited range of acceptable candidate spatial positions may comprise two intervals (e.g., two unconnected intervals):

a first proximal interval 375' from the location of the camera 374b to the exclusion volume 379; and

a second distal interval 375'' from the excluded volume 379 towards infinity.

Thus, locations within the excluded volume 379 are avoided (and not processed) when retrieving the most suitable position among the limited range of acceptable spatial locations (e.g., at 37 or 353, by block 363, or by other techniques).

In other examples, the exclusion volume may be generated manually by a user.

In general, the limiting (e.g., at 35, 36, 352, by block 362, etc.) may include finding the intersection between the range or interval of candidate locations and at least one excluded volume. The limiting may also include determining an end of the range or interval of candidate locations from at least one of an inclusion volume, an exclusion volume, or an approximation surface.
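Purely as an illustration, the following sketch shows such an interval-based limiting along a single pixel ray; the depth intervals in which the ray traverses inclusion or exclusion volumes are assumed to be already known (e.g., from ray casting), and all names are illustrative:

def restrict_candidates(candidate, inclusion_intervals=None, exclusion_intervals=None):
    # candidate:           (near, far) depth interval along the pixel ray
    # inclusion_intervals: depth intervals in which the ray traverses containing volumes;
    #                      if given, acceptable candidates must lie inside one of them
    # exclusion_intervals: depth intervals in which the ray traverses excluded volumes;
    #                      these are cut out (an interval may split into two parts)
    acceptable = [candidate]
    if inclusion_intervals:
        acceptable = [(max(a, lo), min(b, hi))
                      for (a, b) in acceptable
                      for (lo, hi) in inclusion_intervals
                      if max(a, lo) < min(b, hi)]
    for (lo, hi) in (exclusion_intervals or []):
        updated = []
        for (a, b) in acceptable:
            if hi <= a or lo >= b:        # no overlap with this excluded volume
                updated.append((a, b))
                continue
            if a < lo:                    # keep the proximal part (cf. interval 375')
                updated.append((a, lo))
            if hi < b:                    # keep the distal part (cf. interval 375'')
                updated.append((hi, b))
        acceptable = updated
    return acceptable

# toy usage: ray from 0.5 m to infinity, excluded volume between 2 m and 4 m
print(restrict_candidates((0.5, float("inf")), exclusion_intervals=[(2.0, 4.0)]))
# prints [(0.5, 2.0), (4.0, inf)]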

14 Method for detecting depth map errors

In order to be able to place 3D objects at the best possible locations to improve the depth map, the user needs to be able to analyze the locations where depth map errors occur. For this purpose, different methods can be used:

The easiest case occurs when the depth map contains holes (missing depth values). To correct these artifacts, the user needs to draw 3D objects that help the depth estimation software fill in the holes. Note that, by applying different kinds of consistency checks, erroneous depth values may be converted into holes.

Large depth map errors can also be detected by looking at the depth or disparity images themselves.

Finally, depth map errors can be identified by placing the virtual camera in a different location and performing view rendering or display based on the light field program. This will be explained in more detail in the next section.

14.1 coarse depth map error detection based on View rendering or display

FIG. 39 shows a process 390 (see also FIG. 38) implementing a procedure for determining error locations based on view rendering or display. To this end, the user may perform an automatic depth map calculation (e.g., at step 34). The generated depth map 391' (e.g., 383a) is used at 392 to create a new virtual camera view 392' using view rendering or display. Based on the synthesis result, at 393 the user may identify (e.g., visually) which depth values contribute to artifacts visible in the synthesis result 392' (in other examples, this may be performed automatically).

Accordingly, the user may enter (e.g., at 384a) a constraint 364', such as excluding a partial range or interval of candidate spatial locations, with the intent of ruling out obviously invalid locations. A depth map computation 394 may then be performed with the constraints 364' in place, so as to obtain a limited range of acceptable spatial locations. The relevant object can then be processed by the method 380 described above (see in particular the previous section). Subsequently, a new view rendering or display 395 may be performed to confirm whether the artifact has been effectively eliminated or whether the user constraints need to be improved.

FIG. 31 illustrates an example of image-based view rendering or display. Each rectangle corresponds to a camera view (e.g., a previously processed 2D image, such as the first or second 2D image 363 or 363b, which may have been acquired by a camera, such as cameras 94, 114, 124a, 134, 144, 274a, 274b, 294, 304a, 304b). The solid rectangles 313a-313f represent images acquired by real cameras (e.g., 94, 114, 124a, 134, 144, 274a, 274b, 294, 304a, 304b, etc.) in a predetermined positional relationship to one another, while the dashed rectangle 314 represents the virtual camera image to be synthesized (using the determined depth maps, or the positioning of object elements, in the images 313a-313f). Arrows 315a and 315e-315f indicate the cameras (here 313a and 313e-313f) used for rendering or displaying the virtual target view 314. It can be seen that not all camera views need to be selected (this may follow from a selection by the user or by an automatic rendering or display program).

As shown in fig. 32, when the user finds (e.g., at step 38 or 351) a view rendering or display artifact, he may mark the corresponding region 316 with a marking tool. The user should mark the region such that it contains the erroneous pixels. To simplify the user's operation, the marked region 316 may also contain some pixels (or other 2D representation elements) that are rendered correctly but happen to fall within the marked region because of the rough marking made by the user. In other words, the marking need not be very precise; a rough marking of the artifact area is sufficient. However, the number of correctly rendered pixels should not be too large, otherwise the accuracy of the analysis will be reduced.

The view rendering or display program may then mark all source pixels (or other 2D representation elements) that contribute to the marked region, as shown in fig. 32, essentially identifying the regions 316d, 316e, 316f' and 316f'' associated with the marked error region 316. This is possible because the view rendering or display program has basically moved each pixel of the source camera views (313a, 313d-313f) to a position in the virtual target view (314) based on the depth value of the source pixel. All pixels that are occluded by other pixels in 314 are then removed. In other words, if multiple source pixels from the images 313a-313f are rendered to the same pixel in the target view 314, only those with the smallest distance to the virtual camera remain. If several pixels have the same distance, they are merged or blended, for example by calculating the average of all pixels having the same minimum distance.

Thus, by simply tracking which pixels contribute to the error region, the view rendering or display program can identify the source pixels. The source regions 316d, 316e, 316f' and 316f'' need not be connected, even though the marked error region is connected. Based on a semantic understanding of the scene, the user can easily identify which source pixels should not contribute to the marked region and take action to correct their depth values.
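A simplified sketch of this contribution tracking is given below; the warp_to_target function is a hypothetical placeholder that projects a source pixel with its depth into the virtual target view and returns integer pixel coordinates and a distance, and the blending of equally distant pixels is omitted for brevity:

import numpy as np

def find_contributing_pixels(source_depth_maps, marked_region, warp_to_target, target_shape):
    # source_depth_maps: list of depth maps, one per source camera view
    # marked_region:     boolean mask in the virtual target view (the user's rough marking)
    # warp_to_target(view_idx, x, y, depth) -> (u, v, distance) in the target view, or None
    h, w = target_shape
    zbuffer = np.full((h, w), np.inf)
    winner = np.full((h, w, 3), -1, dtype=np.int64)     # (view, x, y) of the surviving source pixel
    for vi, depth in enumerate(source_depth_maps):
        for y in range(depth.shape[0]):
            for x in range(depth.shape[1]):
                hit = warp_to_target(vi, x, y, depth[y, x])
                if hit is None:
                    continue
                u, v, dist = hit
                if 0 <= v < h and 0 <= u < w and dist < zbuffer[v, u]:
                    zbuffer[v, u] = dist                 # occluded contributions are discarded
                    winner[v, u] = (vi, x, y)
    masks = [np.zeros(d.shape, dtype=bool) for d in source_depth_maps]
    for v, u in zip(*np.nonzero(marked_region)):
        vi, x, y = winner[v, u]
        if vi >= 0:
            masks[vi][y, x] = True                       # this source pixel fed the marked region
    return masks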

14.2 Small depth map error detection based on View rendering or display

The error detection described above works well when some depth values differ substantially from their correct values. However, if the depth values deviate only slightly, the marked source regions will be correct, although the view rendering or display may become blurred or show small artifacts.

Nevertheless, a method similar to the one described above may be applied. The fact that the source regions are correct means that the depth values are approximately correct but require refinement to obtain a sharper view rendering or display with fewer artifacts. The user therefore needs to take measures to improve the depth map in the indicated source regions.

15 method for creating a position constraint in a 3D space

This section describes some general strategies for user interfaces and how users create approximation surfaces in 3D editing software.

Note that figs. 4-14 and 29, 30, 37, 41, 49, etc. may be understood as images (e.g., 2D images) displayed to the user (e.g., via a GUI associated with constraint definer 364) based on the previous coarse localization obtained, for example, at step 34. The user may view these images and decide to refine them using one of the methods described above and/or below (e.g., the user may graphically select constraints such as approximation surfaces and/or containing volumes).

15.1 problem expression

To be able to interactively refine the depth map, the user may create 3D geometric constraints (e.g., using constraint definer 363) that match the captured footage as closely as possible.

The 3D geometry may be created in a variety of ways, preferably using any existing 3D editing software. Since this approach is well known, it is not described in detail below. Instead, we consider methods for drawing 3D geometry based on one, two or more 2D images.

15.2 basic concept

Since multiple photos 316a-316f of the same scene are available, the user can select two or more of them to create constraints in 3D space. Each of these images may then be displayed in a window of the graphical user interface (e.g., operated by constraint definer 364). The user then positions the 2D projection of a 3D geometric primitive in the two 2D images by overlaying it with the portion of the 2D image that the primitive should model. Once the user has positioned the primitive's projection in both images, the primitive is also located in 3D space.

To ensure that the drawn 3D geometric primitives never extend beyond the object and remain entirely inside it, they may be shifted slightly backwards along the optical axis of the camera. Again, a variety of methods can be used to achieve this. In the following, we introduce extensions of this concept that simplify the creation and editing process.

15.3 Single Camera editing in constant coordinate projection mode

In this drawing mode, the user defines a 3D reference coordinate system that can be placed arbitrarily. The user then selects the coordinate that should have a fixed value. When drawing a 2D object in a single camera view window, a 3D object is created whose projection corresponds to the 2D object drawn in the camera window and whose fixed coordinate takes the defined value. Note that with this approach, the position of the 3D object in 3D space is uniquely defined.

Let us assume that the coordinates of the projected 2D image are denoted by u and v. The calculation of the 3D coordinates in the reference coordinate system can then be achieved by inverting a projection relationship of the following (standard pinhole) form:

s · [u, v, 1]^T = K · (R · [X, Y, Z]^T + t)

K is the camera intrinsic matrix, R is the camera rotation matrix relative to the 3D reference coordinate system, t is the corresponding camera translation, and s is an unknown scale factor. Since one of the coordinates X, Y, Z is fixed and u and v are also given, three equations remain with three unknown variables (the scale factor and the two free coordinates), resulting in a unique solution.
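As an illustration, a small numeric sketch of this inversion is given below, assuming for concreteness that Z is the fixed coordinate and using made-up camera parameters:

import numpy as np

def backproject_with_fixed_z(u, v, z_fixed, K, R, t):
    # Solves  s * [u, v, 1]^T = K (R [X, Y, Z]^T + t)  for X, Y and the scale s,
    # with Z fixed to z_fixed (any other coordinate could be fixed analogously).
    M = K @ R
    b = K @ t
    uv1 = np.array([u, v, 1.0])
    A = np.column_stack((M[:, 0], M[:, 1], -uv1))   # unknowns: X, Y, s
    c = -(M[:, 2] * z_fixed + b)
    X, Y, s = np.linalg.solve(A, c)
    return np.array([X, Y, z_fixed]), s

# toy usage with made-up camera parameters
K = np.array([[800.0, 0.0, 320.0],
              [0.0, 800.0, 240.0],
              [0.0, 0.0, 1.0]])
R = np.eye(3)
t = np.array([0.0, 0.0, 2.0])
point, scale = backproject_with_fixed_z(u=400.0, v=260.0, z_fixed=0.0, K=K, R=R, t=t)
print(point)   # 3D point with Z = 0 whose projection is (400, 260)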

15.4 Multi-camera strict epipolar editing mode

The strict epipolar editing mode is a very powerful mode for placing 3D objects in 3D space in accordance with the captured camera views. To do so, the user needs to open two camera view windows and select two different cameras. Both cameras should show the object that is to be at least partially modeled in 3D space to aid depth estimation.

Referring to fig. 33, the user is allowed to select and move a control point 330 (e.g., a corner point of a triangle) of a grid, line or spline in the pixel coordinate system (also referred to as the uv coordinate system in image space) of the selected camera view. In other words, the selected control point is projected into the selected camera view, and the user is then allowed to move this projected control point in the image coordinate system. The position of the control point in 3D space is then adjusted such that its projection onto the camera view corresponds to the new position selected by the user. However, by assuming that the position of the same control point (330') in the other camera view (335') does not change, the possible movement in the image coordinate system is limited. Thus, the user is only allowed to move the control point in camera 1 along the epipolar line 331 of the selected camera pair. In other words, given a control point of the 3D grid depicted by point 330 in camera view 1 (335) and by point 330' in camera view 2 (335'), assume that the control point position 330' already lies on the correct pixel showing the desired object element in camera view 2 (335'). The user then moves the control point position 330 in camera view 1 such that its position matches the same physical object element as in camera view 2. In this way, the user can accurately set the depth of the modified control point.

Accordingly, a method may be defined comprising:

selecting a first 2D image (e.g., 335) of the space and a second 2D image (e.g., 335') of the space, wherein the first 2D image and the second 2D image have been acquired at camera positions in a predetermined positional relationship to each other;

displaying at least a first 2D image (e.g., 335),

guiding a user to select a control point in the first 2D image (e.g., 335), wherein the selected control point (e.g., 330) is a control point (e.g., 210) of an element (e.g., 200a-200i) of a structure (e.g., 200) forming an approximation surface, an exclusion volume or an inclusion volume;

directing the user to translate the selected point (e.g., 330) in the first 2D image (e.g., 335) while constraining its movement to the epipolar line (e.g., 331) associated with the corresponding point (e.g., 330') in the second 2D image (e.g., 335'), wherein that point (e.g., 330') corresponds to the same control point (e.g., 210) of the elements (e.g., 200a-200i) of the same structure (e.g., 200) as the point (e.g., 330),

to define the movement of elements (e.g., 200a-200i) of a structure (e.g., 200) in 3D space.
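As an illustrative sketch of the underlying geometry: once the dragged position in camera view 1 and the unchanged position in camera view 2 are consistent, the 3D position of the control point can be recovered by standard linear triangulation, assuming the 3x4 projection matrices of both cameras are known from the predetermined positional relationship (all names below are illustrative):

import numpy as np

def triangulate_control_point(p1, p2, P1, P2):
    # p1, p2: pixel positions (u, v) of the control point in camera view 1 and camera view 2
    # P1, P2: 3x4 projection matrices of the two cameras
    def rows(p, P):
        u, v = p
        return np.vstack((u * P[2] - P[0],
                          v * P[2] - P[1]))
    A = np.vstack((rows(p1, P1), rows(p2, P2)))     # homogeneous system A X = 0
    _, _, Vt = np.linalg.svd(A)
    X = Vt[-1]
    return X[:3] / X[3]                             # 3D control point in the reference frame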

15.5 Multi-camera free epipolar editing mode

In this example, the user selects two different camera views displayed in two windows. The two cameras have a predetermined spatial relationship with each other. These two images show an object that should be modeled at least partially in 3D space to aid in depth estimation.

Similar to the previous section, with reference to fig. 34, consider a grid control point imaged at location 340 in camera view 1 (345) and at location 340' in camera view 2. The user is then allowed to select and move the control point 340 of a grid, line or spline in camera view 1 (345) in image space (the image coordinate system). While the strict epipolar editing mode of the previous example assumes a fixed position of the control point 340' in the second camera view, the free epipolar editing mode of the present example assumes that the control point 340' in the second camera view moves as little as possible. In other words, the user may freely move the control point in camera view 1, e.g., from 340 to 341. The depth value of the control point is then calculated in such a way that the control point in camera view 2 (345') needs to move as little as possible.

Technically, this can be achieved by calculating the epipolar line 342 given by the new position 341 of the control point in the camera view 1. The control point in camera view 2 is then adjusted to lie on the epipolar line 342, but at a minimum distance from its original position.
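A small sketch of this minimal-movement adjustment is given below, assuming a fundamental matrix F relating the two calibrated views, with the convention x2^T F x1 = 0 for corresponding points (names are illustrative):

import numpy as np

def adjust_second_view_point(p_cam1_new, p_cam2_old, F):
    # p_cam1_new: new control-point position (u, v) chosen freely in camera view 1
    # p_cam2_old: current control-point position (u, v) in camera view 2
    # F:          fundamental matrix with  x2^T F x1 = 0  for corresponding points
    x1 = np.array([p_cam1_new[0], p_cam1_new[1], 1.0])
    a, b, c = F @ x1                 # epipolar line a*u + b*v + c = 0 in camera view 2
    u, v = p_cam2_old
    d = (a * u + b * v + c) / (a * a + b * b)
    return np.array([u - a * d, v - b * d])   # closest point on the line to the old position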

Accordingly, a method may be defined comprising:

selecting a first 2D image (e.g., 345) of the space and a second 2D image (e.g., 345') of the space, wherein the first 2D image and the second 2D image have been acquired at camera positions in a predefined positional relationship to each other;

displaying at least the first 2D image (e.g., 345),

guiding a user to select a first control point (e.g., 340) in the first 2D image (e.g., 345), wherein the first control point (340) corresponds to a control point (e.g., 210) of an element (e.g., 200a-200i) of a structure (e.g., 200) forming an approximation surface, an exclusion volume or an inclusion volume;

obtaining, from a user, a selection associated with a new location (e.g., 341) of a first control point in a first 2D image (e.g., 345);

restricting the new location in space of the control point (e.g., 210) to locations whose projection lies on the epipolar line (e.g., 342) in the second 2D image (e.g., 345') associated with the new location (e.g., 341) of the first control point in the first 2D image (e.g., 345), and determining the new location in space from the position (e.g., 341') of the second control point on the epipolar line (e.g., 342) that is closest to its initial location (e.g., 340') in the second 2D image (e.g., 345'),

wherein the second control point (e.g., 340') corresponds to the same control point (e.g., 210) of the elements (e.g., 200a-200i) of the same structure (e.g., 200) as the first control point (e.g., 340),

to define the movement of elements (e.g., 200a-200i) of a structure (e.g., 200).

15.6 general techniques for interacting with a user

With reference to the above example, the following process may be performed:

first, a coarse localization is performed (e.g. at step 34) from a plurality of views (e.g. from the 2D images 313a-313e);

then, at least one image (e.g., the virtual image 314 and/or at least one of the images 313a-313f) is visualized or rendered;

the user performs a visual inspection to identify erroneous regions (e.g., the marked region 316 in the virtual image 314, or any of the regions 316d, 316e, 316f', 316f'' in the images 313a-313f) (alternatively, the inspection may be performed automatically);

for example, the user understands that region 316f'' erroneously contributes to the rendering or display of region 316, by understanding from the image semantics that region 316f'' should never contribute to the object displayed in region 316 (which may result, for example, from one of the problems associated with occluding objects shown in figs. 5 and 6);

to cope with this, the user (e.g. at step 35 or 351) sets some constraints, such as:

at least one approximation surface (e.g., 92, 32, 142, 182, 372b, etc.), for example for the object shown in region 316f'' (in some examples, the approximation surface may automatically generate a containing volume, as in the examples of figs. 18 and 20-24; in some examples, the approximation surface may be associated with other values, such as the tolerance value t0 in the example of fig. 14); and/or

at least one inclusion volume (e.g., 86, 96, 136a-136f, 376b, etc.), for example for the object shown in region 316f'' (in some examples, as shown in fig. 37, the definition of the inclusion volume 376 for one camera may result in the automatic creation of an exclusion volume 379); and/or

at least one exclusion volume (e.g., 299a-299e, 379, etc.), for example between the object shown in region 316f'' and another object;

for example, the user may manually create an inclusion volume corresponding to regions 316f' and 316f'' (e.g., around the objects in regions 316f' and 316f''), or an exclusion volume between regions 316f' and 316f'', or an approximation surface corresponding to regions 316f' and 316f'' (e.g., inside the objects in regions 316f' and 316f'');

after the constraints have been set, the range of candidate spatial locations is limited (e.g., at step 351) to the limited range of acceptable candidate spatial locations (e.g., the region between regions 316f' and 316f'' may be excluded from the limited range of acceptable candidate spatial locations);

subsequently, the similarity measures may be evaluated only within the limited range of acceptable candidate spatial positions in order to locate more correctly the object elements in regions 316f' and 316f'' (and in other regions associated with the marked region 316) (alternatively, they may have been stored in a previous run and may now be reused);

Thus, the depth maps of images 313f and 314 are updated;

the procedure can be restarted or, if the user is satisfied, the obtained depth map and/or localization can be stored.

16 advantages and further examples of the proposed solution

- See section 6.

- Applicable to a broad range of depth estimation procedures.

- Derivation of multiple constraint types (depth range, normal range) from the same user input.

- User constraints are generated using standard 3D graphics software. In this way, the user gets a very powerful tool set for drawing user constraints.

16.1 other examples

In general, the strategies discussed above may be implemented to refine previously obtained depth maps (or at least previously located object elements).

For example, the following iterative procedure may be implemented:

- providing a depth map, or at least one localization of an object element (e.g. 93), according to the previous steps;

- measuring reliability (or confidence) using the depth map or the located object elements (e.g. as described in [17]);

if the reliability is below a predetermined threshold C0 (the positioning is not satisfactory), a new iteration is performed, for example by:

deriving a range or interval of candidate spatial positions (e.g., 95) of the imaging spatial element (e.g., 93) based on the predefined positional relationship;

Restricting (e.g., 35, 36) the range or interval of candidate spatial locations to at least one restricted range or interval (e.g., 93a) of acceptable candidate spatial locations, wherein the restricting includes at least one of the following constraints:

■ use at least one containment volume (e.g., 96) surrounding at least one determined object (e.g., 91) to limit the range or interval of candidate spatial locations (e.g., the containment volume may be manually defined by a user); and/or

■ use at least one exclusion volume (e.g., manually defined or as defined in FIG. 37) that includes unacceptable candidate spatial locations to limit the range or interval of candidate spatial locations; and/or

■ defining (e.g., manually by a user) at least one approximation surface (e.g., 142) and a tolerance interval (e.g., t0) so as to limit at least one range or interval of candidate spatial positions to a limited range or interval of candidate spatial positions (e.g., 147) defined by the tolerance interval, having:

a distal end (e.g., 143''') defined by the at least one approximation surface (e.g., 142); and

a proximal end (e.g., 147') defined based on the tolerance interval;

Based on the similarity metric, the most suitable candidate spatial location (e.g., 93) is retrieved (e.g., 37) among the acceptable candidate spatial locations of the restricted range or interval (e.g., 93a).

The method may be repeated (as in fig. 3) to increase reliability. Thus, a previous coarse depth map may be refined (e.g., even if obtained using another method). The user may create constraints (e.g., approximation surfaces and/or inclusion volumes and/or exclusion volumes) and may, after visual inspection, remedy errors by indicating incorrect regions of the previously obtained depth map that need to be processed with greater reliability. The user can easily insert constraints that are then automatically processed in the manner described above. Thus, it is not necessary to refine the entire depth map; the similarity analysis can simply be limited to some parts of the depth map, thereby reducing the computational effort for refining it.
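A schematic sketch of such a refinement loop is given below; the estimation, reliability and constraint-gathering routines are placeholders standing for the components described above, not part of any specific implementation:

def refine_localization(estimate_depth, measure_reliability, ask_user_for_constraints,
                        C0, max_iterations=5):
    # estimate_depth(constraints):            depth estimation restricted by the given constraints
    # measure_reliability(depth):             confidence measure, e.g., as in [17]
    # ask_user_for_constraints(depth, prev):  user adds containing volumes, excluded volumes
    #                                         and/or approximation surfaces for the erroneous regions
    constraints = None
    depth = estimate_depth(constraints)        # initial coarse localization (cf. step 34)
    for _ in range(max_iterations):
        if measure_reliability(depth) >= C0:   # localization is satisfactory
            break
        constraints = ask_user_for_constraints(depth, constraints)
        depth = estimate_depth(constraints)    # re-run only within the restricted ranges
    return depth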

Thus, the above method is an example of a method for locating, in a space containing at least one determined object, an object element associated with a particular 2D representation element in a determined 2D image of the space, the method comprising:

Obtaining a spatial position of an imaging space element;

obtaining a reliability value or an unreliability value for the spatial position of the imaging space element;

performing the method of any of the preceding claims in case the reliability value does not reach a predefined minimum reliability or the unreliability value exceeds a predefined maximum unreliability.

The iterative method can also be implemented based on the example of fig. 14. For example, a user may select an approximation surface 142 for object 141 and define a predetermined tolerance value t0, which then determines a limited range or interval 147 of acceptable candidate locations. As can be understood from fig. 14, the interval 147 (and the tolerance value t0) should be large enough to encompass the real surface of object 141. However, there is a risk that the interval 147 becomes so large that it contains another, unintended object. Thus, if the results are not satisfactory, an iterative solution may be applied in which the tolerance value is reduced (or, alternatively, increased). In some examples, the reliability may be verified (e.g., as in [17]); if it does not reach a predefined minimum reliability or exceeds a predefined maximum unreliability, a different tolerance value is used (e.g., the tolerance value t0 is increased or decreased).

In general, the above also provides a method for refining, in a space containing at least one determined object, the localization of an object element associated with a particular 2D representation element in a determined 2D image of the space, the method comprising the method according to any of the preceding claims, wherein at least one of the at least one containing volume, the at least one approximation surface and the at least one excluded volume is selected by a user during the limiting.

17 further examples

In general, examples can be implemented as a computer program product having program instructions operable to perform one of the methods when the computer program product is run on a computer. The program instructions may be stored on a machine-readable medium, for example.

Other examples include a computer program stored on a machine-readable carrier for performing one of the methods described herein. In other words, an example of a method is therefore a computer program with program instructions for carrying out one of the methods described herein, when the computer program runs on a computer.

Thus, another example of a method is a data carrier medium (or digital storage medium, or computer readable medium) comprising a computer program recorded thereon for performing one of the methods described herein. The data carrier medium, digital storage medium or recording medium is tangible and/or non-transitory, rather than an intangible and transitory signal.

Thus, another example of the method is a data stream or a signal sequence representing a computer program for performing one of the methods described herein. The data stream or the signal sequence may be transmitted, for example, via a data communication connection, for example via the internet.

Another example includes a processing apparatus, such as a computer, or a programmable logic device performing one of the methods described herein.

Another example includes a computer having installed thereon a computer program for performing one of the methods described herein.

Another example includes an apparatus or system that transmits (e.g., electronically or optically) a computer program for performing one of the methods described herein to a receiver. For example, the receiver may be a computer, mobile device, storage device, or the like. For example, the apparatus or system may comprise a file server for transmitting the computer program to the receiver.

In some examples, a programmable logic device (e.g., a field programmable gate array) may be used to perform some or all of the functions of the methods described herein. In some examples, a field programmable gate array may cooperate with a microprocessor to perform one of the methods described herein. In general, the methods may be performed by any suitable hardware device.

The above examples are merely illustrative of the principles discussed above. It is to be understood that modifications and variations of the arrangements and details described herein will be apparent. It is therefore intended that the scope be limited only by the scope of the following patent claims and not by the specific details presented by way of description and explanation of the examples herein.

The same or equivalent elements or elements having the same or equivalent functions are denoted by the same or equivalent reference numerals in the following description even if they appear in different drawings.

References

[1] H. Yuan, S. Wu, P. An, C. Tong, Y. Zheng, S. Bao, and Y. Zhang, "Robust Semiautomatic 2D-to-3D Conversion with Welsch M-Estimator for Data Fidelity," Mathematical Problems in Engineering, vol. 2018, p. 15, 2018.

[2] D. Donatsch, N. Farber, and M. Zwicker, "3D conversion using vanishing points and image warping," in 3DTV-Conference: The True Vision - Capture, Transmission and Display of 3D Video (3DTV-CON), 2013, pp. 1–4.

[3] X. Cao, Z. Li, and Q. Dai, "Semi-automatic 2D-to-3D conversion using disparity propagation," IEEE Transactions on Broadcasting, vol. 57, no. 2, pp. 491–499, 2011.

[4] S. Knorr, M. Hudon, J. Cabrera, T. Sikora, and A. Smolic, "DeepStereoBrush: Interactive Depth Map Creation," International Conference on 3D Immersion, Brussels, Belgium, 2018.

[5] C. Lin, C. Varekamp, K. Hinnen, and G. de Haan, "Interactive disparity map post-processing," in Second International Conference on 3D Imaging, Modeling, Processing, Visualization and Transmission (3DIMPVT), 2012, pp. 448–455.

[6] M. O. Wildeboer, N. Fukushima, T. Yendo, M. P. Tehrani, T. Fujii, and M. Tanimoto, "A semi-automatic depth estimation method for FTV," Special Issue Image Processing/Coding and Applications, vol. 64, no. 11, pp. 1678–1684, 2010.

[7] S. D. Cohen, B. L. Price, and C. Zhang, "Stereo correspondence smoothness tool," US 9,208,547 B2, 2015.

[8] K.-K. Kim, "Apparatus and Method for correcting disparity map," US 9,208,541 B2, 2015.

[9] Michael Bleyer, Christoph Rhemann, Carsten Rother, "PatchMatch Stereo - Stereo Matching with Slanted Support Windows," BMVC 2011, http://dx.doi.org/10.5244/C.25.14.

[10] https://github.com/alicevision/MeshroomMaya

[11] Heiko Hirschmüller and Daniel Scharstein, "Evaluation of Stereo Matching Costs on Images with Radiometric Differences," IEEE Transactions on Pattern Analysis and Machine Intelligence, August 2008.

[12] Johannes L. Schönberger, Jan-Michael Frahm, "Structure-from-Motion Revisited," Conference on Computer Vision and Pattern Recognition (CVPR), 2016.

[13] Johannes L. Schönberger, Enliang Zheng, Marc Pollefeys, Jan-Michael Frahm, "Pixelwise View Selection for Unstructured Multi-View Stereo," ECCV 2016: Computer Vision - ECCV 2016, pp. 501-518.

[14] Blender, "Blender Renderer Passes," https://docs.blender.org/manual/en/latest/render/blender_render/settings/passes.html, accessed 18.12.2018.

[15] Glyph, The Glyph Mattepainting Toolkit, http://www.glyphfx.com/mptk.html, accessed 18.12.2018.

[16] Enwaii, "Photogrammetry software for VFX - Enwasculpt," http://www.banzai-pipeline.com/product_enwasculpt.html, accessed 18.12.2018.

[17] Ron Op het Veld, Joachim Keinert, "Concept for determining a confidence/uncertainty measure for disparity measurement," EP18155897.4.
