Method, device and system for estimating sub-pixel positions of extreme points in an image

Document No.: 1738162 · Publication date: 2019-12-20

Note: This technique, "Method, device and system for estimating sub-pixel positions of extreme points in an image", was created by M. Nilsson on 2019-06-06. Abstract: The invention discloses a method, a device and a system for estimating the sub-pixel position of an extreme point in an image. In particular, a method of estimating the sub-pixel position of an extreme point in an image using a parametric function is provided. The parametric function is locally fitted to a group of neighboring pixels in the image, and the spatial position of the extreme point of the parametric function is identified. If the extreme point of the parametric function is of a different type than the extreme point in the image, or the position of the extreme point of the parametric function lies outside the area defined by the pixel positions of the group of neighboring pixels, a pixel is deleted from the group and the fitting of the parametric function is repeated. Finally, the sub-pixel position is estimated as the position of the extreme point of the parametric function. The uncertainty level of the estimated sub-pixel position is given by the number of repetitions needed before the estimate of the sub-pixel position is reached.

1. A method performed in a device (100) for estimating a sub-pixel position of an extreme point in an image (400) using a parametric function (404) in the presence of noise, the type of the extreme point in the image (400) being a maximum or a minimum, the method comprising:

a) selecting (S02) a group of neighboring pixels (402) in the image (400), wherein the number of pixels in the group of neighboring pixels (402) is larger than the number of parameters for defining the parametric function (404);

b) fitting (S04) the parametric function (404) to pixel values (403) of the group of neighboring pixels (402), wherein the parametric function (404) approximates the pixel values (403) of the group of neighboring pixels (402) as a function of spatial position;

c) identifying (S06) a spatial position (405b, 405d) of an extreme point (406) of the parametric function (404), the type of the extreme point (406) of the parametric function (404) being a maximum, a minimum or a saddle point;

d) checking (S08a, S08b) whether the extreme point (406) of the parametric function (404) is of the same or different type as the extreme point in the image (400) and whether the location (405b, 405d) of the extreme point of the parametric function (404) is located inside or outside an area (407) defined by pixel locations of the group (402) of neighboring pixels in the image (400), and

if the extreme point (406) of the parametric function (404) is of a different type than the extreme point in the image (400), or the location (405b, 405d) of the extreme point (406) of the parametric function (404) is outside the area (407):

deleting (S08c) pixels from the group of neighboring pixels (402), and

repeating steps b), c) and d) if the number of pixels in the group of neighboring pixels (402) is still greater than or equal to the number of parameters defining the parametric function (404);

e) estimating (S10) the sub-pixel position of the extreme point in the image (400) as the spatial position (405d) of the extreme point (406) of the parametric function (404); and

f) associating (S12) the estimated sub-pixel position with an uncertainty level corresponding to the number of iterations of steps b), c) and d).

2. The method of claim 1, wherein,

selecting the group of neighboring pixels (402) to include a pixel (402a) of the image (400) having a higher pixel value (403a) than that of each of its neighboring pixels in the image (400) if the extreme point in the image (400) is a maximum, and

selecting the group of neighboring pixels (402) to include a pixel of the image (400) having a lower pixel value than that of each of its neighboring pixels in the image if the extreme point in the image (400) is a minimum.

3. The method of claim 1, wherein,

step b) comprises solving a system of equations to find the parameters defining the parametric function (404), wherein solving the system of equations comprises forming linear combinations of the pixel values (403) of the group of neighboring pixels (402), and wherein coefficients of the linear combinations are pre-stored in the device (100).

4. The method of claim 1, further comprising:

processing the estimated sub-pixel locations, wherein an uncertainty level associated with the estimated sub-pixel locations is used to weight the estimated sub-pixel locations during processing, wherein a higher uncertainty level corresponds to a lower weight than a lower uncertainty level.

5. The method of claim 1, wherein,

the image (400) corresponds to a correlation map (207) generated by matching pixel values in a neighborhood of pixels in a first image (203) of a stereoscopic image pair with pixel values in a second image (205) of the stereoscopic image pair such that estimated sub-pixel positions of the extreme points in the images correspond to sub-pixel positions (504a, 504b, 504c, 504d) in the second image (205) that give a best match to pixels (502a, 502b, 502c, 502d) in the first image (203).

6. The method of claim 5, further comprising: for each of a plurality of pixels (502a, 502b, 502c, 502d) in the first image (203) of the stereoscopic image pair:

generating a correlation map (207) corresponding to the pixel (502a, 502b, 502c, 502d) by matching pixel values in a neighborhood of the pixel with pixel values in the second image (205),

performing steps a) to f) for the correlation map (207) corresponding to the pixel (502a, 502b, 502c, 502d) in order to estimate a sub-pixel position (504a, 504b, 504c, 504d) in the second image (205) that gives the best match to the pixel (502a, 502b, 502c, 502d), the sub-pixel position (504a, 504b, 504c, 504d) in the second image (205) being associated with an uncertainty level.

7. The method of claim 6, further comprising:

processing estimated sub-pixel locations (504a, 504b, 504c, 504d) corresponding to a plurality of pixels (502a, 502b, 502c, 502d) in the first image (203) of the stereoscopic image pair, wherein an uncertainty level associated with the estimated sub-pixel locations (504a, 504b, 504c, 504d) is used as a weight during processing, wherein a quantity calculated from sub-pixel locations associated with a higher uncertainty level is given a lower weight than a quantity calculated from sub-pixel locations associated with a lower uncertainty level.

8. The method of claim 7, wherein,

the quantity calculated from a sub-pixel position (504a, 504b, 504c, 504d) comprises a disparity value, which is calculated as the difference between the sub-pixel position (504a, 504b, 504c, 504d) and the position of the corresponding pixel (502a, 502b, 502c, 502d) in the first image.

9. The method of claim 7, wherein,

the quantity calculated from a sub-pixel position (504a, 504b, 504c, 504d) comprises a depth value calculated on the basis of the sub-pixel position (504a, 504b, 504c, 504d) and the position of the corresponding pixel (502a, 502b, 502c, 502d) in the first image, wherein the depth value corresponds to a distance to an object in a scene depicted by the pixel in the first image.

10. The method of claim 9, wherein the processing further comprises:

computing a weighted average of depth values corresponding to the plurality of pixels (502a, 502b, 502c, 502d) in the first image of the stereoscopic image pair, wherein depth values computed from sub-pixel locations (504a, 504b, 504c, 504d) having a higher level of uncertainty are given a lower weight than depth values computed from sub-pixel locations (504a, 504b, 504c, 504d) having a lower level of uncertainty.

11. The method of claim 8, wherein,

the quantity calculated from a sub-pixel position (504a, 504b, 504c, 504d) comprises a point (506a, 506b, 506c, 506d) in three-dimensional space, wherein the point (506a, 506b, 506c, 506d) in the three-dimensional space is calculated based on the sub-pixel position (504a, 504b, 504c, 504d) and the position of the corresponding pixel (502a, 502b, 502c, 502d) in the first image.

12. The method of claim 11, wherein the plurality of pixels (502a, 502b, 502c, 502d) in the first image of the stereoscopic image pair depict a same object (500) in the scene, the processing further comprising:

computing a plurality of points (506a, 506b, 506c, 506d) in three-dimensional space corresponding to the plurality of pixels (502a, 502b, 502c, 502d) in the first image of the stereoscopic image pair, each point (506a, 506b, 506c, 506d) in three-dimensional space being computed using the location of the corresponding pixel (502a, 502b, 502c, 502d) in the first image and the sub-pixel location (504a, 504b, 504c, 504d) in the second image that gives the best match to that pixel in the first image,

fitting a three-dimensional object template (508) to the plurality of points (506a, 506b, 506c, 506d) in three-dimensional space, the three-dimensional object template defining contours of objects of the same type as the objects in the scene,

wherein in the step of fitting the three-dimensional object template (508), points (506a, 506b, 506c, 506d) in three-dimensional space calculated from sub-pixel positions (504a, 504b, 504c, 504d) having a higher uncertainty level are given a lower weight than points (506a, 506b, 506c, 506d) in three-dimensional space calculated from sub-pixel positions (504a, 504b, 504c, 504d) having a lower uncertainty level.

13. A device (100) for estimating a sub-pixel position of an extreme point in an image (400) using a parametric function (404) in the presence of noise, the type of the extreme point in the image (400) being a maximum or a minimum, the device (100) comprising a processor (102), the processor (102) being configured to:

a) select a group of neighboring pixels (402) in the image (400), wherein the number of pixels in the group of neighboring pixels (402) is larger than the number of parameters for defining the parametric function (404);

b) fit the parametric function (404) to pixel values (403) of the group of neighboring pixels (402);

c) identify a location (405b, 405d) of an extreme point (406) of the parametric function (404), the type of the extreme point of the parametric function (404) being a maximum, a minimum or a saddle point;

d) check whether the extreme point (406) of the parametric function (404) is of the same or a different type as the extreme point in the image (400) and whether the location (405b, 405d) of the extreme point of the parametric function (404) is inside or outside an area (407) defined by pixel positions of the group (402) of neighboring pixels in the image (400), and,

if the extreme point (406) of the parametric function (404) is of a different type than the extreme point in the image (400) or the location (405b, 405d) of the extreme point (406) of the parametric function (404) is outside the area (407):

delete pixels from the group of neighboring pixels (402), and

repeat steps b), c) and d) if the number of pixels in the group of neighboring pixels (402) is still greater than or equal to the number of parameters defining the parametric function (404);

e) estimate the sub-pixel position of the extreme point in the image (400) as the location (405d) of the extreme point (406) of the parametric function (404); and

f) associate the estimated sub-pixel position with an uncertainty level corresponding to the number of iterations of steps b), c) and d).

14. A stereo camera system (200), comprising:

a first image sensor (202) configured to capture a first image (203) of a stereoscopic image pair;

a second image sensor (205) configured to capture a second image (205) of the stereoscopic image pair;

the device (100) of claim 13; and

a processor configured to:

generate a correlation map (207) from the stereoscopic image pair by matching pixel values in a neighborhood of a pixel in the first image (203) with pixel values in the second image (205) of the stereoscopic image pair, and

provide the correlation map (207) as an input to the device (100), such that the device (100) uses a parametric function to estimate sub-pixel positions of extreme points in the correlation map (207) in the presence of noise.

15. A non-transitory computer readable medium (104) comprising computer code instructions adapted to perform the method of claim 1 when executed by a device (100) having processing capabilities.

Technical Field

The present invention relates to the field of estimating the sub-pixel position of extreme points in an image. In particular, the present invention relates to a method of estimating the sub-pixel position of an extreme point in an image using a parametric function and to an associated device and system.

Background

Digital images are made up of a finite set of digital values called pixels, the smallest addressable elements of the image. Spatial locations in a digital image can be measured with pixel accuracy. In some applications, however, this is not sufficient, and it is desirable to measure spatial positions in an image with sub-pixel accuracy. For example, in many applications it is of great interest to estimate the location of intensity maxima or minima in an image with sub-pixel accuracy. Such applications include optical flow, object localization from e.g. satellite images or microscope images, and stereo camera setups for estimating depth in a scene.

A known method for estimating the location of a maximum or minimum in an image is to locally fit a parametric function to the pixel values of the image. A maximum or minimum of the fitted parametric function may then be identified, and the spatial location of that maximum or minimum may be taken as the sub-pixel location of the maximum or minimum in the image. A disadvantage of this method is that it is sensitive to noise in the image. Nor does it provide a measure of the reliability of the estimate. There is thus room for improvement.

Disclosure of Invention

In view of the above, it is therefore an object of the present invention to provide an improved estimation of the sub-pixel position of the maximum or minimum value in an image.

According to a first aspect, the above object is achieved by a method performed in a device for estimating a sub-pixel position of an extreme point in an image using a parametric function in the presence of noise, the type of extreme point in the image being a maximum or a minimum, the method comprising:

a) selecting a group of neighboring pixels in the image, wherein the number of pixels in the group of neighboring pixels is greater than the number of parameters used to define the parametric function;

b) fitting a parametric function to the pixel values of the group of neighboring pixels, wherein the parametric function approximates the pixel values of the group of neighboring pixels as a function of spatial location;

c) identifying the spatial position of an extreme point of the parametric function, wherein the type of the extreme point of the parametric function is a maximum, a minimum or a saddle point;

d) checking whether the extreme point of the parametric function is of the same or a different type as the extreme point in the image and whether the position of the extreme point of the parametric function is inside or outside the area defined by the pixel positions of the group of neighboring pixels in the image, and,

if the extreme point of the parametric function is of a different type than the extreme point in the image, or the location of the extreme point of the parametric function is outside the area:

deleting pixels from the group of neighboring pixels, and

repeating steps b), c) and d) if the number of pixels in the group of neighboring pixels is still greater than or equal to the number of parameters used to define the parametric function;

e) estimating the sub-pixel position of the extreme point in the image as the spatial position of the extreme point of the parametric function; and

f) associating the estimated sub-pixel position with an uncertainty level corresponding to the number of iterations of steps b), c) and d).

Thus, according to the method, a parametric function is locally fitted to the pixel values of a group of neighboring pixels of the image, and extreme points of the parametric function are identified. However, before accepting the spatial position of the extreme point of the identified parametric function as an estimate of the sub-pixel position of the extreme point in the image, two checks are performed.

In a first check, it is checked whether the identified extreme point of the parametric function is of the same or a different type as the extreme point in the image that is sought. For example, if the sub-pixel location of a maximum in the image is sought, it is checked whether the identified extreme point of the parametric function is a maximum; similarly for a minimum. The reason for performing the first check is that noise in the image may cause anomalous pixel values, which in turn may lead to a poor fit of the parametric function. The result may be that the parametric function has a minimum even though the method searches for a maximum, or vice versa.

In a second check, it is checked whether the identified extreme point of the parametric function lies inside or outside the area defined by the pixel positions of the group of neighboring pixels in the image. Again, anomalous pixel values may result in a poor fit of the parametric function, such that the identified extreme point lies outside the local neighborhood in which the method searches for a maximum or minimum.

Either check may thus fail: the first if the identified extreme point of the parametric function is of a different type than the extreme point in the image, the second if it lies outside the area defined by the group of neighboring pixels. If this occurs, the method deletes a pixel from the group of neighboring pixels and fits the parametric function anew to the pixel values of the group. This process is repeated until both checks pass, or until there are not enough pixel values remaining in the group to allow the parametric function to be fitted. The method thereby iteratively deletes potentially anomalous pixel values until an acceptable fit is obtained, which makes it more robust to noise in the image.

The number of iterations required is typically related to the noise level in the image: the noisier the image, the more iterations are typically required to achieve an acceptable fit. Furthermore, with each iteration the fitting of the parametric function is based on fewer pixel values. The uncertainty level of the estimated sub-pixel position therefore tends to increase with the number of iterations. The method accordingly uses the number of iterations as a measure of the uncertainty level of the estimated sub-pixel position, i.e. as a measure of the reliability of the estimate.
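The iterative procedure of steps b) to d), together with the final estimation and uncertainty level of steps e) and f), can be sketched as follows. This is a non-authoritative illustration in Python/NumPy, assuming the two-dimensional quadratic parametric function described later in this disclosure and a simplified outlier rule (delete the pixel deviating most from the group mean); all function names are hypothetical.

```python
import numpy as np

def fit_quadratic(positions, values):
    """Least-squares fit of f(x,y) = Ax^2 + By^2 + Cxy + Dx + Ey + F."""
    x, y = positions[:, 0], positions[:, 1]
    M = np.column_stack([x**2, y**2, x * y, x, y, np.ones_like(x)])
    params, *_ = np.linalg.lstsq(M, values, rcond=None)
    return params  # A, B, C, D, E, F

def extreme_point(params):
    """Stationary point of the quadratic and its type ('max'/'min'/'saddle').
    Assumes a non-degenerate quadratic (invertible Hessian)."""
    A, B, C, D, E, _ = params
    # grad f = 0  =>  [2A C; C 2B] [x y]^T = -[D E]^T
    H = np.array([[2 * A, C], [C, 2 * B]])
    pos = np.linalg.solve(H, -np.array([D, E]))
    det = 4 * A * B - C**2
    kind = 'saddle' if det < 0 else ('min' if A > 0 else 'max')
    return pos, kind

def estimate_subpixel(positions, values, looking_for='max'):
    """Steps b)-f): fit, check type and region, delete outliers, repeat."""
    positions = np.asarray(positions, float)
    values = np.asarray(values, float)
    iterations = 0
    while len(values) >= 6:  # the quadratic has 6 parameters
        iterations += 1
        params = fit_quadratic(positions, values)
        pos, kind = extreme_point(params)
        inside = (positions.min(0) <= pos).all() and (pos <= positions.max(0)).all()
        if kind == looking_for and inside:
            return pos, iterations  # uncertainty level = iteration count
        # delete the pixel deviating most from the mean (the full method
        # may instead omit the center pixel when forming the mean)
        dev = np.abs(values - values.mean())
        keep = np.ones(len(values), bool)
        keep[dev.argmax()] = False
        positions, values = positions[keep], values[keep]
    return None, iterations
```

On a noise-free 3 × 3 group sampled from a quadratic with a maximum inside the group, the first fit passes both checks and the returned uncertainty level is 1.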

An image generally refers to any type of spatially organized signal values. The image may be an image captured by a sensor, such as a visible light image, an infrared image, or a thermal image. More generally, however, the image may be any measured or calculated signal value provided on a two-dimensional grid. These signal values may be spatially related, e.g. spatially correlated.

The image includes pixels. Each pixel is associated with a location that corresponds to the location of the pixel on the two-dimensional grid. Each pixel is also associated with a pixel value that corresponds to the signal value of the pixel.

Thus, the pixel locations define discrete locations arranged in a two-dimensional grid. Spatial locations in two-dimensional space that are not limited to these discrete locations are referred to herein as sub-pixel locations.

Extreme points in an image generally refer to maxima or minima in the image. The maximum may be a local maximum. The minimum may be a local minimum.

An extreme point of a parametric function generally refers to a stationary point of the parametric function, i.e. a point where all partial derivatives (equivalently, the gradient) of the parametric function are zero. The extreme point of the parametric function may be a maximum, a minimum, or a saddle point.

The uncertainty level of an estimate generally refers to a measure of how reliable the estimate is. A lower uncertainty level indicates a more reliable estimate than a higher uncertainty level. The uncertainty level also indicates the variance of the estimate. A higher uncertainty level indicates a higher variance than a lower uncertainty level.

The group of neighboring pixels may be selected based on pixel values of the image. For example, a region may be identified in the image in which the pixel values indicate the presence of a local maximum (if the type of the extreme point is a maximum) or a local minimum (if the type of the extreme point is a minimum), and the group of neighboring pixels may be selected to include that region. In this way, a rough estimate of the position of the extreme point is first made using the pixel values of the image, and the estimated location is then refined to sub-pixel accuracy using the method described above. A pixel in the image having a higher pixel value than each of its neighboring pixels indicates the presence of a local maximum; similarly, a pixel having a lower pixel value than each of its neighboring pixels indicates the presence of a local minimum. Thus, if the extreme point in the image is a maximum, the group of neighboring pixels may be selected to include a pixel having a higher pixel value than each of its neighboring pixels, and if the extreme point in the image is a minimum, the group may be selected to include a pixel having a lower pixel value than each of its neighboring pixels. The group of neighboring pixels may be selected to be centered on the pixel with the highest pixel value (if a maximum is sought) or the pixel with the lowest pixel value (if a minimum is sought). For example, the group of neighboring pixels may comprise a 3 × 3 neighborhood of pixels centered on that pixel.
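A minimal sketch of this rough localization step, assuming a 3 × 3 group centered on a strict local maximum or minimum; `select_neighborhood` is a hypothetical helper that only scans interior pixels.

```python
import numpy as np

def select_neighborhood(image, extreme_type='max'):
    """Find a pixel larger (or smaller) than all eight of its neighbors
    and return the 3x3 patch centered on it, plus its (row, col)."""
    best = None
    h, w = image.shape
    for r in range(1, h - 1):
        for c in range(1, w - 1):
            patch = image[r - 1:r + 2, c - 1:c + 2]
            center = image[r, c]
            others = np.delete(patch.ravel(), 4)  # the 8 neighbors
            if extreme_type == 'max' and (center > others).all():
                best = (r, c)
            if extreme_type == 'min' and (center < others).all():
                best = (r, c)
    if best is None:
        return None
    r, c = best
    return image[r - 1:r + 2, c - 1:c + 2], best
```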

The parametric function may be a two-dimensional quadratic function. Such a function can be written in parametric form as:

f(x, y) = Ax² + By² + Cxy + Dx + Ey + F

The two-dimensional quadratic function is described by six parameters. Thus, the parametric function may be fitted to the pixel values of a group of neighboring pixels as long as there are at least six pixels in the group. An advantage of this parametric function is that it can be fitted to the pixel values of the group of neighboring pixels using a closed-form expression, making it a computationally efficient choice. However, it should be understood that other parametric functions may be used while still achieving this advantage.
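Setting the gradient of the quadratic to zero gives a 2 × 2 linear system for the stationary point, and the sign of the Hessian determinant 4AB − C² distinguishes extrema from saddle points. A sketch (hypothetical helper name, assuming a non-degenerate quadratic):

```python
import numpy as np

def quadratic_extreme(A, B, C, D, E, F):
    """Stationary point of f(x,y) = Ax^2 + By^2 + Cxy + Dx + Ey + F.
    grad f = (2Ax + Cy + D, Cx + 2By + E) = 0 is a 2x2 linear system."""
    H = np.array([[2 * A, C], [C, 2 * B]])  # Hessian of f
    x, y = np.linalg.solve(H, [-D, -E])
    det = 4 * A * B - C ** 2                # Hessian determinant
    if det < 0:
        kind = 'saddle'
    elif A < 0:
        kind = 'max'
    else:
        kind = 'min'
    return (x, y), kind
```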

A least-squares method may be used to fit the parametric function to the pixel values of the group of neighboring pixels. This involves minimizing the sum of squared differences between the pixel values of the group and the parametric function evaluated at the location of each pixel in the group. This is a computationally efficient way of fitting the parametric function, even when the number of pixels in the group exceeds the number of parameters of the parametric function.
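Such a least-squares fit of the six parameters can be sketched with a design matrix containing one row per pixel; `fit_quadratic_lstsq` is a hypothetical name.

```python
import numpy as np

def fit_quadratic_lstsq(xs, ys, values):
    """Least-squares estimate of (A, B, C, D, E, F) from pixel values
    sampled at positions (xs, ys)."""
    xs, ys, values = (np.asarray(a, float) for a in (xs, ys, values))
    # One row per pixel: [x^2, y^2, xy, x, y, 1]
    M = np.column_stack([xs**2, ys**2, xs * ys, xs, ys, np.ones_like(xs)])
    params, residuals, rank, _ = np.linalg.lstsq(M, values, rcond=None)
    return params
```

With nine pixels and six parameters the system is over-determined, which is exactly the situation required by step a) of the method.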

The fitting of the parametric function to the pixel values of the group of neighboring pixels may involve solving a system of equations to find the parameters defining the parametric function. For each pixel in the group, an equation may be defined by equating the value of the pixel with the value of the parametric function evaluated at the location of the pixel. A solution to the system of equations can be found, for example, by using the least-squares method described previously. Solving the system of equations may include forming linear combinations of the pixel values of the group of neighboring pixels to find the parameters that define the parametric function. To make the method computationally efficient, the coefficients of the linear combinations may be pre-stored in the device. In this way, the coefficients are pre-computed and need not be evaluated each time the method is performed.
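For a fixed neighborhood geometry, the least-squares solution reduces to fixed linear combinations of the pixel values, so the coefficient matrix can be computed once and stored, as the description suggests. A sketch for a 3 × 3 group (the grid ordering and variable names are assumptions):

```python
import numpy as np

# Fixed 3x3 geometry: pixel positions relative to the center pixel.
coords = np.array([(x, y) for y in (-1, 0, 1) for x in (-1, 0, 1)], float)
x, y = coords[:, 0], coords[:, 1]
M = np.column_stack([x**2, y**2, x * y, x, y, np.ones(9)])

# Pre-stored coefficients: each parameter A..F is a fixed linear
# combination of the nine pixel values. pinv(M) = (M^T M)^-1 M^T here.
P = np.linalg.pinv(M)  # shape (6, 9), computed once

def fit_precomputed(values):
    """Least-squares parameters via the pre-stored coefficient matrix."""
    return P @ np.asarray(values, float)
```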

As described above, the method iteratively deletes pixels from the group of neighboring pixels until an acceptable fit of the parametric function is achieved. Specifically, pixels that may be considered outliers may be deleted, thereby reducing the effect of noise on the fit. This can be done in different ways. For example, in step d), the pixel whose pixel value deviates most from an average formed from the pixel values of the group of neighboring pixels may be deleted. In some cases, the average may be calculated by omitting the pixel value of the center pixel of the group. This is motivated by the fact that the center pixel is usually selected as the pixel with the highest (for a maximum) or lowest (for a minimum) pixel value; in this way, the average reflects the average of the pixels surrounding the center pixel.
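This deletion rule might be sketched as follows, with the mean omitting the center pixel; protecting the center pixel itself from deletion is an added assumption not stated in the text.

```python
import numpy as np

def delete_outlier(values, center_index=4):
    """Drop the pixel whose value deviates most from the mean of the
    group, where the mean omits the center pixel (typically the extreme
    candidate itself)."""
    values = np.asarray(values, float)
    mean = np.delete(values, center_index).mean()
    dev = np.abs(values - mean)
    dev[center_index] = -np.inf  # assumption: never delete the center pixel
    keep = np.ones(len(values), bool)
    keep[dev.argmax()] = False
    return values[keep]
```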

The proposed method thus provides not only an estimate of the sub-pixel position but also an uncertainty level of the estimate, given as the number of iterations required to pass the two checks defined in step d). The uncertainty level reflects the reliability of the estimate: the lower the uncertainty level, the more the estimate can be trusted. This information can be used in further processing of the estimated sub-pixel positions. More specifically, the method may further comprise processing the estimated sub-pixel positions, wherein the uncertainty level associated with an estimated sub-pixel position is used during the processing to weight the estimated sub-pixel position, or quantities computed from it, a higher uncertainty level corresponding to a lower weight. In this way, the estimated sub-pixel positions, or the quantities calculated from them, are weighted according to their reliability, and the influence of noise is reduced during processing.

The proposed method can be used for various applications including object localization from e.g. satellite images or microscope images. In such applications, the image input to the method may be captured by a sensor. For example, the image may be a visible light image, an infrared image, or a thermal image. However, in another set of applications, the image instead corresponds to computed signal values provided on a two-dimensional grid.

One example of such an application relates to object detection. In such an application, the image signal values may correspond to scores output by an object detector, where a score reflects the probability that an object is present at a pixel location in the image. By applying the proposed method to an image of scores from an object detector, the position of an object in the image can be determined with sub-pixel accuracy. The processing of the determined sub-pixel location may correspond to smoothing the location using the uncertainty level as a weight.

Another example of such an application relates to stereo cameras. In a stereo camera, a first sensor and a second sensor each capture an image of a scene, but from slightly different perspectives. By finding matching features in both images, it is possible, for example, to calculate the depth in the scene, i.e. the distance to objects in the scene. The proposed method may be used in a process of matching features between a first image and a second image in a stereo image pair. In particular, it can be used to find the location of matching features and associated uncertainty levels with sub-pixel accuracy.

In a stereo application, the image input to the proposed method may correspond to a correlation map. A correlation map may be generated by matching (e.g., correlating) pixel values in a neighborhood of a pixel in a first image of a stereoscopic image pair with pixel values in a second image of the stereoscopic image pair. The correlation map is thus defined on a two-dimensional grid corresponding to the pixel locations in the second image, and its signal values indicate how well the neighborhood of the specific pixel in the first image matches the pixel values of the second image. When the proposed method takes the correlation map as input, the sub-pixel position of the extreme point in the image therefore corresponds to the sub-pixel position in the second image that gives the best match to the pixel in the first image.
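One plausible way to build such a correlation map, using negative sum of squared differences as the matching score (an assumption; the text only says "matching, e.g., correlating"), so that the sought extreme point is a maximum:

```python
import numpy as np

def correlation_map(first, second, r, c, half=1):
    """Slide the (2*half+1)^2 neighborhood of pixel (r, c) of the first
    image over the second image; higher score means a better match.
    Border positions are left at -inf."""
    patch = first[r - half:r + half + 1, c - half:c + half + 1]
    h, w = second.shape
    out = np.full((h, w), -np.inf)
    for rr in range(half, h - half):
        for cc in range(half, w - half):
            win = second[rr - half:rr + half + 1, cc - half:cc + half + 1]
            out[rr, cc] = -np.sum((win - patch) ** 2)  # negative SSD
    return out
```

The proposed method would then be applied to `out` with the extreme-point type set to a maximum.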

Thus, the method estimates the sub-pixel location in the second image that gives the best match to a particular pixel in the first image of the stereo pair, together with an associated uncertainty level. This process may be repeated for a plurality of pixels in the first image. More specifically, the method may further comprise, for each of a plurality of pixels in the first image of the stereoscopic image pair: generating a correlation map corresponding to the pixel by matching pixel values in a neighborhood of the pixel with pixel values in the second image, and performing steps a) to f) on that correlation map to estimate the sub-pixel position in the second image that gives the best match to the pixel, the sub-pixel position in the second image being associated with an uncertainty level. In this manner, a sub-pixel location in the second image and an associated uncertainty level are estimated for each of the plurality of pixels in the first image.

The plurality of pixels may correspond to all pixels in the first image. The plurality of pixels may correspond to a designated area in the first image. Alternatively or additionally, multiple pixels in the first image of a stereoscopic image pair may depict the same object in the scene.

The method may further include processing estimated sub-pixel locations corresponding to a plurality of pixels in the first image of the stereoscopic image pair. The uncertainty level associated with each estimated sub-pixel position may be used as a weight during processing, wherein quantities calculated from sub-pixel positions associated with higher uncertainty levels are given lower weights than quantities calculated from sub-pixel positions associated with lower uncertainty levels. In this way, less reliable matches between the first and second images are given lower weight than more reliable matches, reducing the influence of noise.

The quantity calculated from the sub-pixel positions may comprise a disparity value, the disparity value being calculated as the difference between the sub-pixel position and the position of the corresponding pixel in the first image. Therefore, different disparity values may be given different weights according to their reliability. This will ultimately reduce the effect of noise. For example, the processing may include calculating a weighted average of disparity values calculated from the estimated sub-pixel locations, where disparity values calculated from sub-pixel locations associated with higher levels of uncertainty are given lower weight than disparity values calculated from sub-pixel locations associated with lower levels of uncertainty. This may be useful, for example, for smoothing a disparity map.
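As an illustrative sketch (not taken from the text), the weighting could map an uncertainty level u to a weight w = 1/(1 + u); the specific mapping is an assumption, since the method only requires that higher uncertainty levels yield lower weights:

```python
import numpy as np

def weighted_disparity_mean(disparities, uncertainty_levels):
    # Illustrative weight function: higher uncertainty -> lower weight.
    # The exact mapping w = 1 / (1 + u) is an assumption, not prescribed here.
    d = np.asarray(disparities, dtype=float)
    u = np.asarray(uncertainty_levels, dtype=float)
    w = 1.0 / (1.0 + u)
    return float(np.sum(w * d) / np.sum(w))

# A disparity estimated with uncertainty level 3 contributes less than
# disparities estimated with uncertainty level 1.
print(weighted_disparity_mean([10.0, 40.0, 12.0], [1, 3, 1]))  # 16.8
```

The same weighting idea carries over to depth values and points in three-dimensional space discussed below.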

The quantity computed from the sub-pixel locations may comprise depth values computed based on the sub-pixel locations and the locations of corresponding pixels in the first image, wherein the depth values correspond to distances to objects in the scene depicted by the pixels in the first image. Thus, different depth values may be given different weights according to their reliability. This will ultimately reduce the effect of noise.
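For a rectified stereo pair with focal length f (in pixels) and baseline b, depth follows from disparity as Z = f·b/d. This standard pinhole-camera relation is assumed here for illustration; the text itself does not fix a particular camera model:

```python
def depth_from_disparity(focal_px, baseline_m, x_first, x_best_match):
    # Disparity: horizontal offset between the pixel in the first image and
    # the (sub-pixel) best-match position in the second image.
    d = x_first - x_best_match
    if d <= 0:
        raise ValueError("non-positive disparity")
    return focal_px * baseline_m / d  # depth in the same unit as the baseline

# Example: f = 700 px, b = 0.1 m, disparity of 10 px -> depth of 7 m.
print(depth_from_disparity(700.0, 0.1, 110.0, 100.0))
```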

For example, processing estimated sub-pixel locations corresponding to a plurality of pixels in a first image of a stereoscopic image pair may include: for each of a plurality of pixels in a first image of a stereoscopic image pair: a depth value for the pixel is calculated based on the location of the pixel in the first image and the sub-pixel location in the second image that gives the best match with the pixel, and the depth value is associated with the level of uncertainty associated with the sub-pixel location in the second image. In this way, the computed depth values may be associated with uncertainty levels, providing a measure of the degree of reliability of the depth values. When processing depth values, the uncertainty level associated with the depth value may be used. For example, depth values with a lower level of uncertainty may be given a higher weight than depth values with a higher level of uncertainty.

The processing may further include: a weighted average of depth values corresponding to a plurality of pixels in a first image of the stereoscopic image pair is calculated, wherein depth values calculated from sub-pixel locations having a higher level of uncertainty are given a lower weight than depth values calculated from sub-pixel locations having a lower level of uncertainty. In this way, the depth values are weighted according to their reliability. In this way, the effect of noise on the depth values is reduced. This may be used, for example, in calculating the depth to a specified region of the image. This may also be useful for the purpose of smoothing the depth map. Thus, a more reliable depth value has a greater impact on the final result than a less reliable depth value.

A weighted average of depth values may be calculated by applying a spatial filter to depth values corresponding to a plurality of pixels in a first image of a stereoscopic image pair. The spatial filter may be a smoothing filter that smoothes the depth values. The parameters of the spatial filter may be set according to the uncertainty level.

The quantity calculated from the sub-pixel positions may comprise a point in three-dimensional space, wherein the point in three-dimensional space is calculated based on the sub-pixel position and the position of the corresponding pixel in the first image. Thus, different points in three-dimensional space may be given different weights depending on their reliability. This will ultimately reduce the effect of noise.

In some applications, it is of interest to estimate the shape of a three-dimensional object in a scene based on a stereoscopic image pair depicting the scene. This can be achieved by: identifying a plurality of pixels depicting the object in a first image of the stereoscopic image pair; calculating points in three-dimensional space corresponding to the plurality of pixels based on the result of stereo matching with a second image of the stereoscopic image pair; and fitting an object template to the calculated points in three-dimensional space. By using the proposed method, the estimated uncertainty levels can be used as weights in the fitting of the object template, making the fitting less sensitive to noise. In more detail, the plurality of pixels in the first image of the stereoscopic image pair may depict the same object in the scene, and the processing may further include:

calculating a plurality of points in three-dimensional space corresponding to the plurality of pixels in the first image of the stereoscopic image pair, each point in three-dimensional space being calculated using the position of the corresponding pixel in the first image and the sub-pixel position in the second image that gives the best match to that pixel in the first image,

fitting a three-dimensional object template to the plurality of points in three-dimensional space, the three-dimensional object template defining contours of objects of the same type as the object in the scene,

wherein, in the step of fitting the three-dimensional object template, points in three-dimensional space calculated from sub-pixel positions having a higher level of uncertainty are given a lower weight than points in three-dimensional space calculated from sub-pixel positions having a lower level of uncertainty.

According to a second aspect, the above object is achieved by an apparatus for estimating a sub-pixel position of an extreme point in an image using a parametric function in the presence of noise, the type of extreme point in the image being a maximum or a minimum, the apparatus comprising a processor configured to:

a) selecting a group of neighboring pixels in the image, wherein the number of pixels in the group of neighboring pixels is greater than the number of parameters used to define the parametric function;

b) fitting a parametric function to pixel values of the group of neighboring pixels;

c) identifying the position of an extreme point of the parameter function, wherein the type of the extreme point of the parameter function is a maximum value, a minimum value or a saddle point;

d) checking whether the extreme point of the parameter function is of the same or different type as the extreme point in the image and whether the position of the extreme point of the parameter function is inside or outside the area defined by the pixel positions of the neighboring pixel groups in the image, and

if the extreme point of the parameter function is of a different type than the extreme point in the image, or the location of the extreme point of the parameter function is outside the region:

deleting pixels from the group of adjacent pixels, an

Repeating steps b), c) and d) if the number of pixels in the group of neighboring pixels is still greater than or equal to the number of parameters used to define the parametric function;

e) estimating the sub-pixel position of the extreme point in the image as the position of the extreme point of the parameter function; and

f) associating the estimated sub-pixel position with an uncertainty level corresponding to the number of iterations of steps b), c) and d).
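Steps a) to f) above can be sketched as follows for a 3 × 3 neighborhood and a two-dimensional quadratic function. The least-squares solver and the outlier rule (largest deviation from the mean, center pixel never deleted) follow the detailed description later in this document, but details such as the fallback to the center pixel are illustrative:

```python
import numpy as np

def estimate_subpixel_max(z):
    """Steps a)-f) for a 3x3 neighborhood z (z[1][1] is the local maximum),
    fitting f(x, y) = A*x^2 + B*y^2 + C*x*y + D*x + E*y + F.
    Returns (x, y, uncertainty_level)."""
    ys, xs = np.mgrid[-1:2, -1:2]
    x, y = xs.ravel().astype(float), ys.ravel().astype(float)
    v = np.asarray(z, dtype=float).ravel()
    keep = list(range(9))          # indices of pixels still in the group
    iters = 0
    while len(keep) >= 6:          # at least as many pixels as parameters
        iters += 1
        # b) fit the quadratic in a least-squares sense
        M = np.column_stack([x[keep] ** 2, y[keep] ** 2, x[keep] * y[keep],
                             x[keep], y[keep], np.ones(len(keep))])
        A, B, C, D, E, F = np.linalg.lstsq(M, v[keep], rcond=None)[0]
        det = 4 * A * B - C * C
        # c) + d) identify the extreme point and check its type and location
        if det > 0 and 2 * A < 0 and 2 * B < 0:            # maximum check
            xe, ye = (C * E - 2 * B * D) / det, (C * D - 2 * A * E) / det
            if abs(xe) < 1 and abs(ye) < 1:                # localization check
                return xe, ye, iters                       # e) + f)
        # delete the pixel deviating most from the mean (center never deleted)
        rest = [k for k in keep if k != 4]
        mean = np.mean(v[rest])
        keep.remove(max(rest, key=lambda k: abs(v[k] - mean)))
    # Fallback (illustrative): the center pixel position itself.
    return 0.0, 0.0, iters + 1
```

For noise-free quadratic data the first fit already passes both checks, so the returned uncertainty level is 1.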

According to a third aspect, there is provided a stereoscopic camera system comprising:

a first image sensor configured to capture a first image of a stereoscopic image pair;

a second image sensor configured to capture a second image of the stereoscopic image pair;

the apparatus according to the second aspect; and

a processor configured to:

generating a correlation map from the stereoscopic image pair by matching pixel values in a neighborhood of pixels in a first image of the stereoscopic image pair with pixel values in a second image, and

the correlation map is provided as an input to the apparatus such that the apparatus uses a parametric function to estimate the sub-pixel positions of the extreme points in the correlation map in the presence of noise.

According to a fourth aspect, there is provided a non-transitory computer readable medium comprising computer code instructions adapted to perform the method of the first aspect when executed by a device having processing capabilities.

The second, third and fourth aspects may generally have the same features and advantages as the first aspect. It should also be noted that the present invention relates to all possible combinations of these features, unless explicitly stated otherwise. The steps of any method disclosed herein need not be performed in the exact order disclosed, unless explicitly stated.

Drawings

The above and other objects, features and advantages of the present invention will be better understood by the following illustrative and non-limiting detailed description of embodiments thereof with reference to the accompanying drawings, in which like reference numerals will be used for like elements, and in which:

fig. 1 shows a device for estimating sub-pixel positions of extreme points in an image according to an embodiment.

Fig. 2 shows a stereo camera system according to an embodiment.

Fig. 3 is a flow chart of a method for estimating a sub-pixel position of an extreme point in an image according to an embodiment.

FIG. 4a shows a group of neighboring pixels in an image according to an embodiment.

Fig. 4b-d show a fitting of a parametric function to pixel values of the group of neighboring pixels shown in fig. 4a according to an embodiment.

Fig. 5 schematically shows a stereoscopic image pair depicting an object in a scene according to an embodiment.

Detailed Description

The present invention now will be described more fully hereinafter with reference to the accompanying drawings, in which embodiments of the invention are shown. This invention may, however, be embodied in many different forms and should not be construed as limited to the embodiments set forth herein; rather, these embodiments are provided for the sake of completeness and to fully convey the scope of the invention to the skilled person. The systems and devices disclosed herein will be described as they operate.

Fig. 1 shows an apparatus 100 for estimating the sub-pixel positions of extreme points in an image. The device 100 includes a processor 102. The processor may be of any known type, such as a central processing unit, a microprocessor, or a digital signal processor. The device 100 also includes a memory 104. The memory 104 may be a non-transitory computer readable medium, such as a non-volatile memory. The computer code instructions may be stored in the memory 104. The computer code instructions, when executed by the processor 102, cause the processor 102 to perform any of the methods disclosed herein. In particular, the processor 102 is caused to perform a method for estimating the sub-pixel position of the extreme point in the image input to the device 100.

Fig. 2 shows a system 200 in which the device 100 may be used. The system 200 is a stereo camera system 200. The system 200 comprises a first image sensor 202, a second image sensor 204, a matching component 206 and the apparatus 100 of fig. 1 for estimating a sub-pixel position of an extreme point in an image.

The first image sensor 202 and the second image sensor 204 are arranged to capture images of a scene from different perspectives simultaneously. A first image 203 captured by the first sensor 202 and a second image 205 captured by the second sensor 204 form a stereo image pair. The first image sensor 202 and the second image sensor 204 may be image sensors of any known stereo camera. For example, they may be part of an AXIS P8804 stereo sensor suite.

The first image sensor 202 and the second image sensor 204 are operatively connected to a matching component 206. Specifically, a first image 203 and a second image 205 captured by a first sensor 202 and a second sensor 204 are provided as inputs to a matching component 206. The matching component 206 comprises a processor. The processor may be of any known type, such as a central processing unit, a microprocessor, or a digital signal processor. The matching component 206 can also include a memory, which can be a non-transitory computer readable medium, such as a non-volatile memory. The memory of the matching component 206 may store computer code instructions. These computer code instructions, when executed by the processor of the matching component 206, cause the processor to match pixel values in the first image 203 with pixel values in the second image 205 to generate and output a correlation map 207.

The matching component 206 is operatively connected to the device 100. Specifically, the correlation graph 207 generated by the matching component 206 is input to the device 100. Thus, the device 100 is arranged to estimate the sub-pixel positions of the extreme points in the correlation map.

The matching component 206 may be integrated in the device 100. In particular, the processor 102 and the memory 104 may be configured to both match pixel values in the first image 203 and the second image 205 to generate a correlation map, and then estimate sub-pixel locations of extreme points in the correlation map.

The operation of the device 100 will be explained in more detail below with reference to figs. 1 and 4a-d and the flowchart of fig. 3.

An image is input to the device 100. As will be explained, the apparatus 100 processes an image to provide an estimate of the sub-pixel locations of extreme points in the image and a level of uncertainty in the estimate. Hereinafter, it is assumed that the extreme point in the image is the maximum value. However, it should be understood that the extreme points in the image may equally be minimum values.

In step S02, the processor 102 selects a group of neighboring pixels in the image. This is further illustrated in fig. 4a, which shows an image 400 and a selected group of neighboring pixels 402. The illustrated group of neighboring pixels 402 includes 3 × 3 pixels, but it should be understood that larger groups of neighboring pixels may be selected, such as 5 × 5 pixels or 7 × 7 pixels. To select the group of neighboring pixels 402, the processor 102 may identify one or more local maxima in the image 400. A local maximum may be identified as a pixel in the image 400 having a pixel value greater than the pixel value of each of its neighboring pixels. Such a local maximum is shown in the right part of fig. 4a, which shows that pixel 402a has a pixel value 403a that exceeds the pixel values of its 8 neighbors. The group of neighboring pixels 402 may be selected to include the pixel identified as a local maximum. For example, the group of neighboring pixels 402 may be selected such that the center pixel of the group corresponds to the pixel 402a identified as the local maximum. In the example shown, the center pixel of the 3 × 3 pixel group 402 corresponds to the pixel 402a identified as the local maximum, while the other 8 pixels correspond to the 8 neighbors of the pixel 402a. In this example, the center value of the group of neighboring pixels 402 is thus the largest. It is assumed that the pixel values 403 in the group of neighboring pixels 402 are measurements of a peak in the image, and that the true peak is located between these measurements, i.e., at a sub-pixel location within the group of neighboring pixels 402. For the following description, it is assumed without loss of generality that the center pixel of the group 402 is located at (0,0) and that the other pixels are located within one unit of the center. However, it should be understood that other conventions are possible. The pixel positions in the group of neighboring pixels 402 are thus assumed to be:

(-1,-1) (0,-1) (1,-1)
(-1,0) (0,0) (1,0)
(-1,1) (0,1) (1,1)

The pixel locations in the group of neighboring pixels 402 define a region 407. Region 407 spans the pixel locations of the pixels in the group. In other words, region 407 includes all spatial locations that fall between the pixel locations of the group of neighboring pixels 402. Thus, in this case, region 407 includes all spatial locations (x, y) with |x| ≤ 1 and |y| ≤ 1.

Further, in the following description, the pixel values 403 corresponding to the pixel positions in the group 402 are represented by:

z1 z2 z3
z4 z5 z6
z7 z8 z9

Where more than one local maximum is identified, the group of neighboring pixels 402 may be selected to include the pixel identified as the global maximum (i.e., the pixel having the greatest value in the image). Alternatively or additionally, several groups of neighboring pixels may be selected, each group corresponding to an identified local maximum. The steps described below may then be repeated independently for each selected group of neighboring pixels.

If the extreme point in the image is instead a minimum, the processor 102 may instead identify one or more local minima in the image 400 and select the set of neighboring pixels to include the pixel identified as the local minimum. The local minimum may be identified as a pixel in the image having a lower pixel value than the pixel value of each neighboring pixel.
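A minimal sketch of the local-maximum identification (strictly greater than all 8 neighbors; border pixels are skipped for simplicity, which is an implementation choice not mandated by the text):

```python
import numpy as np

def find_local_maxima(img):
    """Return (row, col) of interior pixels whose value exceeds all 8 neighbors."""
    img = np.asarray(img, dtype=float)
    maxima = []
    for r in range(1, img.shape[0] - 1):
        for c in range(1, img.shape[1] - 1):
            neigh = img[r - 1:r + 2, c - 1:c + 2].ravel().tolist()
            neigh.pop(4)  # remove the center pixel itself
            if img[r, c] > max(neigh):
                maxima.append((r, c))
    return maxima

img = np.zeros((5, 5))
img[2, 2] = 5.0
print(find_local_maxima(img))  # [(2, 2)]
```

For minima, the comparison is simply reversed.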

In step S04, the processor 102 fits a parametric function to the pixel values 403 of the group of neighboring pixels 402. The fitting comprises estimating the parameters of the parametric function such that the resulting parametric function approximates the pixel values 403 of the group of neighboring pixels 402 at their spatial positions as closely as possible (e.g., in a least-squares sense). Fig. 4b shows a parametric function 404 that has been fitted to the pixel values 403 of the group of neighboring pixels 402. The parametric function may be a two-dimensional quadratic function, but other parametric functions may also be used. Such a function can be written in parametric form using 6 parameters A, B, C, D, E, F as:

f(x,y) = Ax² + By² + Cxy + Dx + Ey + F

since the two-dimensional quadratic function is described by 6 parameters, it can be fitted to the pixel values 403 of the neighboring pixel group 402 as long as there are at least 6 pixels in the neighboring pixel group 402.

The processor 102 may estimate the parameters of the parametric function by solving a system of equations expressed in terms of the unknown parameters and the pixel values 403 of the group of neighboring pixels 402. For example, using all 9 available samples (z1, z2, z3, z4, z5, z6, z7, z8, z9) of the 3 × 3 group of neighboring pixels 402 and inserting them into the above expression for the two-dimensional quadratic function, the following system of 9 equations can be formulated:

A+B+C-D-E+F=z1

B-E+F=z2

A+B-C+D-E+F=z3

A-D+F=z4

F=z5

A+D+F=z6

A+B-C-D+E+F=z7

B+E+F=z8

A+B+C+D+E+F=z9

this system of equations is overdetermined because there are more equations than unknowns. To find a solution, a least squares method may be used. This includes minimizing the sum of squared differences between the pixel values 403 of the neighboring pixel group 402 and the parametric function 404 estimated at the location of each pixel of the neighboring pixel group 402. In other words, a solution in the least squares sense can be found by minimizing the least squares target:

Θ(A,B,C,D,E,F) = (A+B+C-D-E+F-z1)²
+ (B-E+F-z2)²
+ (A+B-C+D-E+F-z3)²
+ (A-D+F-z4)²
+ (F-z5)²
+ (A+D+F-z6)²
+ (A+B-C-D+E+F-z7)²
+ (B+E+F-z8)²
+ (A+B+C+D+E+F-z9)²

By computing the partial derivatives ∂Θ/∂A, ∂Θ/∂B, ∂Θ/∂C, ∂Θ/∂D, ∂Θ/∂E, ∂Θ/∂F and setting each of them to zero, a system of equations is obtained that can be solved explicitly. In this case, the solution is:

A = (z1 + z3 + z4 + z6 + z7 + z9)/6 - (z2 + z5 + z8)/3

B = (z1 + z2 + z3 + z7 + z8 + z9)/6 - (z4 + z5 + z6)/3

C = (z1 - z3 - z7 + z9)/4

D = (-z1 + z3 - z4 + z6 - z7 + z9)/6

E = (-z1 - z2 - z3 + z7 + z8 + z9)/6

F = (-z1 + 2z2 - z3 + 2z4 + 5z5 + 2z6 - z7 + 2z8 - z9)/9

Thus, solving the system of equations to find the parameters amounts to forming linear combinations of the pixel values (z1, z2, z3, z4, z5, z6, z7, z8, z9) of the group of neighboring pixels 402. The coefficients of the linear combinations are preferably pre-calculated and stored in the device 100, for example in the memory 104.

After fitting the parametric function 404, the processor 102 proceeds to identify a spatial location 405b (represented by an "X" in fig. 4b) of an extreme point 406 of the parametric function 404. The extreme point of the parametric function 404 is the point where the partial derivatives ∂f/∂x and ∂f/∂y are both equal to zero. In other words, the extreme point is the stationary point of the parametric function 404. In the case where the parametric function 404 is the two-dimensional quadratic function described above, the spatial position (x*, y*) of its extreme point is equal to:

x* = (CE - 2BD)/(4AB - C²), y* = (CD - 2AE)/(4AB - C²)

as is well known, the stagnation point of a function of two variables may be a maximum, a minimum or a saddle point. If the method aims at estimating the sub-pixel position of the maximum, the extreme point of the parametric function that is expected to fit is also the maximum. Similarly, if the method instead aims to estimate the sub-pixel position of the minimum, it is desirable that the extreme point of the fitted parametric function is the minimum. Therefore, in step S08a, the processor 102 checks whether the extreme point 406 of the parameter function 404 is of the same type as the extreme point in the image 400. In other words, if the processor 102 intends to estimate the sub-pixel position of the maximum value in the image 400, the processor 102 checks in step S08a whether the spatial position 405 of the extreme point 406 of the parameter function 404 is the maximum value. This check is referred to herein as a maximum check.

A function of two variables has a maximum at a point where its gradient is zero and fxx·fyy - fxy² > 0, fxx < 0 and fyy < 0. For the two-dimensional quadratic function, fxx = 2A, fyy = 2B and fxy = C. Thus, for this two-dimensional quadratic function, the processor 102 may check whether the extreme point 406 is a maximum by checking the following conditions:

4AB - C² > 0

2A < 0

2B < 0

the corresponding condition for the minimum isfxx> 0 and fyy> 0, which becomes for this two-dimensional quadratic function:

4AB-C2<0

2A>0

2B>0
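A small sketch of the extreme-point location and type classification for the quadratic (the helper names are illustrative; the classification follows the standard second-derivative test):

```python
def stationary_point(A, B, C, D, E):
    # Solve 2A*x + C*y + D = 0 and C*x + 2B*y + E = 0.
    det = 4 * A * B - C * C
    return (C * E - 2 * B * D) / det, (C * D - 2 * A * E) / det

def classify(A, B, C):
    # Second-derivative test with fxx = 2A, fyy = 2B, fxy = C.
    det = 4 * A * B - C * C
    if det > 0 and 2 * A < 0 and 2 * B < 0:
        return "maximum"
    if det > 0 and 2 * A > 0 and 2 * B > 0:
        return "minimum"
    if det < 0:
        return "saddle"
    return "degenerate"

print(classify(-1.0, -1.0, 0.0))                      # maximum
print(stationary_point(-1.0, -1.0, 0.0, 0.4, -0.2))   # (0.2, -0.1)
```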

in the example of fig. 4b, the processor 102 finds that the extreme point 406 is the maximum and is therefore of the same type as the extreme point in the image 400. In the case where the parameter function 404 may have several local extrema, the processor may also check whether the extrema 406 is a single global extrema of the parameter function of the same type as the extrema in the image.

In step S08b, the processor 102 also checks whether the spatial position 405b of the extreme point 406 of the parametric function 404 is inside or outside the region 407 defined by the pixel positions of the group 402 of neighboring pixels in the image 400. This check is referred to herein as the localization check. It should be appreciated that steps S08a and S08b may be performed in any order. As described above, the range [-1, 1] is used for the group of neighboring pixels 402, and so the processor 102 may check whether the conditions |x*| < 1 and |y*| < 1 are satisfied.

If both the maximum check of step S08a and the localization check of step S08b pass (i.e., if the extreme point 406 of the parametric function 404 is of the same type as the extreme point in the image 400 and the spatial location 405b of the extreme point 406 of the parametric function 404 is inside the region 407), the processor 102 proceeds to step S10.

However, if either of the maximum check of step S08a and the localization check of step S08b fails (i.e., if the extreme point 406 of the parametric function 404 is of a different type than the extreme point in the image 400, and/or the spatial location 405b of the extreme point 406 of the parametric function 404 is outside of the region 407), the processor 102 instead proceeds to step S08 c. This is the case in the example of fig. 4b, because the spatial position 405b of the extreme point 406 is outside the region 407, and therefore the localization check fails. In the event that the extreme point 406 of the parameter function 404 is found not to be a single global maximum (or minimum), the processor 102 may also proceed to step S08 c.

In step S08c, the processor 102 deletes a pixel from the group of neighboring pixels. The idea behind the deletion is to remove an outlier and make a new attempt at fitting the parametric function. There are many ways to identify an outlier pixel. For example, the processor 102 may delete the pixel whose pixel value deviates most from an average formed from the pixel values of the group of neighboring pixels. When forming the average, the pixel value z5 at the center of the group of neighboring pixels 402 may be omitted. More specifically, the array K = [1, 2, 3, 4, 6, 7, 8, 9] of length k = 8 may be considered as the possible indices of the pixel values z from which an outlier may be removed. The index k*(i) found in iteration i, e.g., as the index in K whose pixel value deviates most from the average, identifies the pixel to be deleted from the group of neighboring pixels 402.

Applied to the example of fig. 4b, the processor would arrive at k*(i) = 4 and would therefore proceed to delete the pixel with pixel value z4, labeled 403b in fig. 4b. The processor 102 may also delete the index k*(i) from the array, thereby updating the array for the next iteration. Thus, with k*(i) = 4, the new array would be K = [1, 2, 3, 6, 7, 8, 9], of length k = 7.
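The deletion rule can be sketched as follows (the mean is formed over the remaining eligible values with the center z5 excluded; 1-based indices as in the text):

```python
def delete_outlier(K, z):
    """K: 1-based indices still eligible for deletion; z: the values z1..z9
    as a 0-indexed list. Returns the deleted index k* and the updated array."""
    mean = sum(z[k - 1] for k in K) / len(K)
    k_star = max(K, key=lambda k: abs(z[k - 1] - mean))
    return k_star, [k for k in K if k != k_star]

# z4 is the clear outlier in this neighborhood, so k* = 4 as in the example.
z = [5.0, 6.0, 5.0, 0.0, 9.0, 6.0, 5.0, 6.0, 5.0]
k_star, K = delete_outlier([1, 2, 3, 4, 6, 7, 8, 9], z)
print(k_star, K)  # 4 [1, 2, 3, 6, 7, 8, 9]
```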

In step S08d, the processor 102 then checks whether the number of remaining pixels in the group of neighboring pixels 402 is still greater than or equal to the number of parameters used to define the parametric function 404.

If this condition is not met, the method terminates because the parametric function cannot be fitted. In this case, the method may output the coordinates of the center pixel of the group of neighboring pixels 402 as an estimate of the sub-pixel position of the extreme point, i.e., coordinates (0,0) in this example. The associated uncertainty level of the estimated sub-pixel position may then be set to correspond to the number of iterations of steps S04, S06, S08a-d plus 1.

However, if this condition is satisfied, the processor 102 proceeds to repeat the above-described steps S04, S06, and S08 a-d.

In the example of fig. 4b, the number of remaining pixels after deleting pixel 403b is equal to 8. A two-dimensional quadratic function with 6 parameters can therefore still be fitted. Thus, the processor 102 proceeds to repeat steps S04, S06 and S08a-d a second time, although the pixel value 403b has now been deleted. This means that one of the equations in the system of equations is deleted, namely the equation that comprises z_k*(i). For example, with k*(i) = 4, the least-squares objective to be minimized now becomes:

Θ(A,B,C,D,E,F) = (A+B+C-D-E+F-z1)²
+ (B-E+F-z2)²
+ (A+B-C+D-E+F-z3)²
+ (F-z5)²
+ (A+D+F-z6)²
+ (A+B-C-D+E+F-z7)²
+ (B+E+F-z8)²
+ (A+B+C+D+E+F-z9)²

Again, the system of equations can be solved explicitly in the same way as previously described. At this stage, with one pixel deleted, 8 pixels remain in the group of neighboring pixels 402. Thus, there are (8 choose 1) = 8 possible systems of equations to solve, depending on which pixel was deleted. The solution of each of these 8 possible systems of equations is preferably pre-stored in the device 100.

The result of the second fitting of the parametric function 404 is shown in fig. 4c. As is evident from the figure, the parametric function 404 now has an extreme point that is a minimum. Therefore, the maximum check of step S08a fails, and the processor 102 again proceeds to step S08c to delete a pixel from the group of neighboring pixels 402 as described above. This time, the pixel having pixel value z6, labeled 403c in fig. 4c, is deleted from the group. After the deletion, 7 pixels remain in the group of neighboring pixels. Since the number of pixels remaining in the group is still greater than the number of parameters of the parametric function, the condition of step S08d is met. Therefore, the processor 102 proceeds to step S04 again and fits the parametric function to the pixel values of the group of neighboring pixels 402 a third time by solving a system of equations. At this stage, with two pixels deleted, there are (8 choose 2) = 28 possible systems of equations to solve, depending on which pixels were deleted. Again, the solutions of these 28 systems of equations are preferably pre-stored in the device 100. If the method were to delete another pixel, there would be (8 choose 3) = 56 possible systems of equations at the next fitting stage, the solutions of which are preferably also pre-stored in the device 100. Thus, in this example, the device 100 preferably pre-stores solutions to 1 + 8 + 28 + 56 = 93 systems of equations.
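The count of pre-stored systems can be checked directly by counting which of the 8 non-center pixels may have been deleted:

```python
from math import comb

# 0, 1, 2 or 3 deletions among the 8 non-center pixels of the 3x3 group.
counts = [comb(8, d) for d in range(4)]
print(counts, sum(counts))  # [1, 8, 28, 56] 93
```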

The result of the third fitting of the parametric function 404 is shown in fig. 4 d. This time, the extreme point 406 is the maximum, i.e., it is the same type as the extreme point in the image 400. Furthermore, the spatial position 405d of the extreme point 406 of the parametric function 404 is located inside the region 407. Therefore, both the maximum value check of step S08a and the positioning check of step S08b pass. As a result, the processor 102 proceeds to step S10.

In step S10, the processor 102 estimates the sub-pixel position of the extreme point in the image 400 as the spatial position 405d of the extreme point of the parametric function 404 from the last iteration. Further, in step S12, the processor 102 associates the estimated sub-pixel position with an uncertainty level corresponding to the number of iterations of steps S04, S06 and S08a-d. In the example of figs. 4a-d, the uncertainty level associated with the estimated sub-pixel position will be equal to 3, since 3 iterations were required before the estimate of the sub-pixel position was found.

The estimated sub-pixel position and its associated uncertainty level may be output by the device 100. However, in general, the estimated sub-pixel locations and associated uncertainty levels may be stored in a memory of the device for further processing. The processing of the estimated sub-pixel locations may vary depending on the application at hand. Common to various applications, however, is that the processor 102 may use the uncertainty level associated with the estimate as a weight during processing. In particular, a higher uncertainty level may correspond to a lower weight than a lower uncertainty level.

A particular application of the above method is stereoscopic imaging. In particular, as explained above in connection with fig. 2, the apparatus 100 may be used in a stereo camera system 200 to estimate the sub-pixel positions of extreme points in a correlation map. The operation of the system 200 will now be explained in more detail with reference to fig. 2 and 5.

The first image sensor 202 captures a first image 203 and the second image sensor 204 captures a second image 205. The first image 203 and the second image 205 form a stereoscopic image pair. The pair of stereoscopic images 203, 205 is input to the matching component 206. The matching component 206 matches pixel values in the first image 203 with pixel values in the second image 205. The matching may be performed using any known local stereo matching algorithm, such as an algorithm using sum of squared differences, sum of absolute differences, or normalized cross-correlation. These algorithms have in common that they compare a portion of the first image 203 with different portions of the second image 205 and determine how similar the portion of the first image 203 is to each of the different portions of the second image 205. The portion in the first image 203 may be a neighborhood of a pixel in the first image 203. The portions in the second image 205 may be neighborhoods of different pixels in the second image 205. The different pixels in the second image 205 may include all pixels in the second image 205 or a subset of pixels in the second image 205. Here, the neighborhood of a pixel refers to the pixel itself and one or more pixels around it. The matching component 206 may store the result of the matching in the correlation map 207. Thus, the correlation map 207 includes correlation values indicating how well each portion in the second image 205 matches a particular portion in the first image 203. In particular, the correlation map 207 may include a correlation value for each pixel or for a subset of the pixels in the second image 205. Each correlation value indicates how well the neighborhood of the pixel in the second image 205 matches the neighborhood of a particular pixel in the first image 203.
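A minimal sketch of such a matching step using (negated) sum of squared differences, so that the best match appears as a maximum in the correlation map; the border handling and the choice of SSD over the other named algorithms are illustrative:

```python
import numpy as np

def correlation_map_ssd(first, second, r, c, half=1):
    """Negated SSD map for the (2*half+1)-square neighborhood of pixel (r, c)
    in `first`, slid over `second`. Higher values mean better matches, so the
    best match is a maximum, as assumed by the sub-pixel estimation method."""
    first = np.asarray(first, dtype=float)
    second = np.asarray(second, dtype=float)
    tpl = first[r - half:r + half + 1, c - half:c + half + 1]
    h, w = second.shape
    out = np.full((h, w), -np.inf)  # borders left at -inf (never best)
    for rr in range(half, h - half):
        for cc in range(half, w - half):
            win = second[rr - half:rr + half + 1, cc - half:cc + half + 1]
            out[rr, cc] = -np.sum((win - tpl) ** 2)
    return out
```

Matching an image against itself places the maximum of the correlation map at the template's own position, which is a quick sanity check of the convention.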

The matching component 206 may generate such a correlation map 207 for one or more pixels 502a, 502b, 502c, 502d in the first image 203. The one or more pixels 502a, 502b, 502c, 502d in the first image 203 may correspond to a designated area in the first image 203. For example, the one or more pixels 502a, 502b, 502c, 502d in the first image 203 may be a group of adjacent pixels in the first image 203. In some applications, the one or more pixels 502a, 502b, 502c, 502d in the first image 203 may depict the same object in the scene. This is the case in the example of fig. 5. In fig. 5, the first image 203 and the second image 205 depict an object 500 in a scene, here in the form of a car. Here, the one or more pixels 502a, 502b, 502c, 502d in the first image 203 each depict the car. The matching component 206 may generate a correlation map 207 for each of the one or more pixels 502a, 502b, 502c, 502d that depict the object 500, as described above. Thus, each of the one or more pixels 502a, 502b, 502c, 502d is associated with a respective correlation map 207.

A correlation map 207 corresponding to each of the one or more pixels 502a, 502b, 502c, 502d in the first image 203 may be provided as an input to the device 100. Thus, the image 400 described in connection with fig. 4a will in this case be the correlation map 207. The apparatus 100 processes each of the correlation maps 207 according to the method described with respect to fig. 3. Thus, the apparatus 100 estimates the sub-pixel position of the maximum value in each correlation map 207 and the level of uncertainty of the estimated sub-pixel position. In other words, the device 100 estimates, for each of the one or more pixels in the first image 203, the sub-pixel position in the second image 205 that gives the best match to that pixel. Turning to the example of fig. 5, a sub-pixel location 504a, 504b, 504c, 504d is estimated for each of the one or more pixels 502a, 502b, 502c, 502d. In this example, the associated uncertainty levels of the sub-pixel locations 504a, 504b, 504c, 504d are assumed to be 1, 3, 2, 1, respectively.
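The estimation performed by the apparatus 100 follows the method of fig. 3, which iteratively fits a parametric function and removes pixels until a valid maximum is found, using the number of repetitions as the uncertainty level. As a much simplified stand-in, the sketch below refines the integer maximum of a correlation map by 1-D parabolic interpolation along each axis; it illustrates only the basic idea of locating the extremum of a locally fitted function, not the iterative pixel-removal or uncertainty mechanism:

```python
import numpy as np

def subpixel_peak(corr):
    """Refine the integer maximum of a correlation map by fitting a parabola
    through the peak and its two neighbors along each axis (the peak is
    assumed to lie in the interior of the map)."""
    i, j = np.unravel_index(np.argmax(corr), corr.shape)

    def offset(lo, mid, hi):
        # Vertex of the parabola through (-1, lo), (0, mid), (1, hi).
        denom = lo - 2 * mid + hi
        return 0.0 if denom == 0 else 0.5 * (lo - hi) / denom

    di = offset(corr[i - 1, j], corr[i, j], corr[i + 1, j])
    dj = offset(corr[i, j - 1], corr[i, j], corr[i, j + 1])
    return i + di, j + dj
```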

The processor 102 of the device 100 may then proceed to process the estimated sub-pixel locations 504a, 504b, 504c, 504d corresponding to the one or more pixels 502a, 502b, 502c, 502d. During processing, the processor 102 may use the uncertainty levels associated with the sub-pixel locations 504a, 504b, 504c, 504d to weight the sub-pixel locations or any quantities calculated from them. The processor 102 will weight more certain estimates more heavily than less certain estimates. In the example of fig. 5, sub-pixel locations 504a and 504d (uncertainty level 1), or any parameter derived therefrom, would be given a higher weight than sub-pixel location 504c (uncertainty level 2), or any parameter derived therefrom. Sub-pixel location 504c, or any parameter derived therefrom, will in turn be given a higher weight than sub-pixel location 504b (uncertainty level 3), or any parameter derived therefrom.

According to an example, the quantity derived from the estimated sub-pixel positions 504a, 504b, 504c, 504d is a disparity value. More specifically, the processor 102 may calculate a disparity value for each of the one or more pixels 502a, 502b, 502c, 502d. The disparity value is calculated as the difference between the position of one of the pixels 502a, 502b, 502c, 502d in the first image 203 and the corresponding sub-pixel position 504a, 504b, 504c, 504d in the second image 205. Each disparity value may be associated with the level of uncertainty of the sub-pixel location used in calculating that disparity value. The processor 102 may then smooth the calculated disparity values. This may include calculating a weighted average of the disparity values. When calculating the weighted average, disparity values associated with higher uncertainty levels are given lower weight than disparity values associated with lower uncertainty levels.
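The disparity calculation and uncertainty-weighted smoothing described above can be sketched as follows. This is an illustrative sketch only; taking the weight as the inverse of the uncertainty level is one plausible choice, not mandated by the description:

```python
import numpy as np

def disparity(pixel_col, subpixel_col):
    """Disparity for one pixel: the difference between its column in the
    first image and the matched sub-pixel column in the second image
    (a rectified pair is assumed, so the match lies on the same row)."""
    return pixel_col - subpixel_col

def weighted_average(values, uncertainty_levels):
    """Weighted average where the weight decreases with the uncertainty
    level, so more certain values contribute more."""
    w = 1.0 / np.asarray(uncertainty_levels, dtype=float)
    v = np.asarray(values, dtype=float)
    return float(np.sum(w * v) / np.sum(w))
```

With uncertainty levels 1 and 3, the first value receives three times the weight of the second when the disparities are averaged.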

According to an example, the quantity derived from the estimated sub-pixel positions 504a, 504b, 504c, 504d is a depth value. In more detail, the processor 102 may calculate a depth value for each of the one or more pixels 502a, 502b, 502c, 502d. As is known in the art, when the distance between the image sensors (the baseline) and the focal length are known, the depth value may be calculated from the point correspondence between two stereoscopic images. In more detail, the depth may be calculated as the product of the focal length and the distance between the sensors divided by the disparity. The processor 102 may associate each depth value with the level of uncertainty of the sub-pixel location used in calculating that depth value. The processor 102 may smooth the calculated depth values. This may include calculating a weighted average of the depth values. When calculating the weighted average, depth values associated with higher uncertainty levels are given lower weight than depth values associated with lower uncertainty levels.
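The depth formula stated above (depth = focal length × sensor distance / disparity) can be written directly; the units in the example call below (focal length and disparity in pixels, baseline in millimetres) are illustrative assumptions:

```python
def depth_from_disparity(focal_length_px, baseline, disparity_px):
    """Depth as the product of the focal length and the distance between
    the sensors (the baseline), divided by the disparity."""
    return focal_length_px * baseline / disparity_px
```

For example, a focal length of 700 px, a baseline of 100 mm, and a disparity of 35 px give a depth of 2000 mm.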

When one or more pixels 502a, 502b, 502c, 502d correspond to all pixels of the first image 203, the calculated depth values may be considered for forming a depth map. A spatial filter may be used to smooth the depth map. The coefficients of the spatial filter correspond to the weights described above and may be set using the uncertainty level associated with the depth values.
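A minimal sketch of such uncertainty-weighted spatial smoothing of a depth map is shown below: a normalized box filter whose coefficients are the inverse uncertainty levels. The window size and the inverse weighting are illustrative assumptions, not specified by the description:

```python
import numpy as np

def smooth_depth_map(depth, uncertainty, win=1):
    """Smooth a depth map with a sliding window whose coefficients are the
    inverse uncertainty levels, so less certain depths contribute less."""
    h, w = depth.shape
    weights = 1.0 / np.asarray(uncertainty, dtype=float)
    out = np.empty_like(depth, dtype=float)
    for i in range(h):
        for j in range(w):
            i0, i1 = max(0, i - win), min(h, i + win + 1)
            j0, j1 = max(0, j - win), min(w, j + win + 1)
            wb = weights[i0:i1, j0:j1]
            out[i, j] = np.sum(wb * depth[i0:i1, j0:j1]) / np.sum(wb)
    return out
```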

When the one or more pixels 502a, 502b, 502c, 502d correspond to a designated area of the first image 203, a weighted average of the depth values for the designated area may be calculated.

According to an example, the quantities derived from the estimated sub-pixel positions 504a, 504b, 504c, 504d are points in three-dimensional space. More specifically, the processor 102 may use the one or more pixels 502a, 502b, 502c, 502d in the first image 203 and the corresponding sub-pixel locations 504a, 504b, 504c, 504d in the second image 205 to calculate points 506a, 506b, 506c, 506d in three-dimensional space corresponding to the one or more pixels 502a, 502b, 502c, 502d. As is known in the art, the coordinates of the point in three-dimensional space corresponding to a pixel in the first image 203 may be calculated from the position and depth value of that pixel. The processor 102 may further associate each of the calculated points 506a, 506b, 506c, 506d with the level of uncertainty of the sub-pixel position used in calculating the coordinates of that point. If the one or more pixels 502a, 502b, 502c, 502d depict the same object 500, the calculated points 506a, 506b, 506c, 506d will be estimates of points on the object 500 in the scene.
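The calculation of a three-dimensional point from a pixel position and its depth value can be sketched with the standard pinhole camera model. The intrinsic parameters fx, fy, cx, cy are illustrative assumptions; the description does not specify a particular camera model:

```python
def backproject(u, v, depth, fx, fy, cx, cy):
    """3-D point corresponding to pixel (u, v) at the given depth, using the
    pinhole model: X = (u - cx) * Z / fx, Y = (v - cy) * Z / fy, Z = depth."""
    return ((u - cx) * depth / fx, (v - cy) * depth / fy, depth)
```

A pixel at the principal point (cx, cy) back-projects onto the optical axis, i.e. to (0, 0, depth).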

The processor 102 may process the points 506a, 506b, 506c, 506d. During processing, the processor 102 may weight each point 506a, 506b, 506c, 506d using its associated uncertainty level. Points with higher uncertainty levels will be given lower weight than points with lower uncertainty levels. As shown in fig. 5, the processing of the points 506a, 506b, 506c, 506d may include fitting an object template 508 to the points 506a, 506b, 506c, 506d. The object template 508 defines the outline of the same type of object as the real object 500 in the scene. In this case, the object template 508 defines the outline of a car. However, in other applications, the object template 508 may be a geometric plane, a person, or any other kind of object. When fitting the object template 508, the processor 102 may give different weights to the points 506a, 506b, 506c, 506d. The weights may be set depending on the uncertainty levels associated with the points 506a, 506b, 506c, 506d, such that a higher uncertainty level results in a lower weight, and vice versa. For example, the object template 508 may be fitted to the points 506a, 506b, 506c, 506d using a weighted least squares method, wherein a weighted sum of squared distances between the points 506a, 506b, 506c, 506d and the object template 508 is minimized. The terms in the sum may be weighted such that terms corresponding to points with higher uncertainty levels are given lower weights than terms corresponding to points with lower uncertainty levels.
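As an illustration of the weighted least squares fitting described above, the sketch below fits a geometric plane (one of the template types mentioned) to 3-D points, weighting each squared residual by the inverse of the point's uncertainty level. The inverse weighting is an assumed choice; any weights that decrease with uncertainty would fit the description:

```python
import numpy as np

def fit_plane_weighted(points, uncertainty_levels):
    """Fit the plane z = a*x + b*y + c to 3-D points by weighted least
    squares: minimize sum_i w_i * (z_i - a*x_i - b*y_i - c)^2 with
    w_i = 1 / uncertainty_i, so uncertain points pull the fit less."""
    pts = np.asarray(points, dtype=float)
    w = 1.0 / np.asarray(uncertainty_levels, dtype=float)
    A = np.column_stack([pts[:, 0], pts[:, 1], np.ones(len(pts))])
    sw = np.sqrt(w)
    # Scaling rows by sqrt(w) turns the weighted problem into ordinary
    # least squares, solvable directly with lstsq.
    coeffs, *_ = np.linalg.lstsq(sw[:, None] * A, sw * pts[:, 2], rcond=None)
    return coeffs  # (a, b, c)
```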

It will be appreciated that a person skilled in the art may modify the above-described embodiments in a number of ways, while still making use of the advantages of the invention as shown in the above-described embodiments. Accordingly, the present invention should not be limited to the illustrated embodiments, but should be defined only by the following claims. Additionally, the illustrated embodiments may be combined as understood by those skilled in the art.
