Pose optimization method and device

Document No.: 934021    Publication date: 2021-03-05

Reading note: This technique, "Pose optimization method and device" (一种位姿优化方法及装置), was designed and created by Zeng Siyu on 2019-08-31. Its main content is as follows: disclosed are a pose optimization method, a pose optimization apparatus, a computer-readable storage medium, and an electronic device. The method comprises: acquiring a first pose of an image acquisition device when it captures a current frame image; determining a visible space region corresponding to the first pose; determining the spatial coordinates of map points in a vector map that fall within the visible space region; determining feature information of the current frame image; determining the projection error of the map points in the current frame image according to the first pose, the spatial coordinates of the map points, and the feature information; and optimizing the first pose according to the projection error to obtain an optimized first pose. The technical solution of the present disclosure does not need to recover the spatial information of objects in the image, thereby saving storage space on the electronic device and improving pose optimization efficiency.

1. A pose optimization method, comprising:

acquiring a first pose of an image acquisition device when the image acquisition device captures a current frame image;

determining a visible space region corresponding to the first pose;

determining spatial coordinates of map points in a vector map that fall within the visible space region;

determining feature information of the current frame image;

determining a projection error of the map points in the current frame image according to the first pose, the spatial coordinates of the map points, and the feature information;

and optimizing the first pose according to the projection error to obtain an optimized first pose.

2. The method of claim 1, wherein the feature information includes contour information of a first target object;

and wherein the determining of the projection error of the map points in the current frame image according to the first pose, the spatial coordinates of the map points, and the feature information comprises:

projecting the spatial coordinates of the map points into the current frame image through the first pose to determine projected pixel coordinates of the map points;

and determining the projection error according to the projected pixel coordinates and the contour information.

3. The method of claim 2, wherein the feature information further comprises pixel points of a second target object;

and wherein the projecting of the spatial coordinates of the map points into the current frame image through the first pose to determine the projected pixel coordinates of the map points comprises:

projecting the spatial coordinates of the map points into the current frame image through the first pose to obtain projection points;

determining a mask region according to the pixel points of the second target object;

and determining the projected pixel coordinates according to the projection points and the mask region.

4. The method of claim 1, wherein the optimizing of the first pose according to the projection error to obtain an optimized first pose comprises:

determining a weight value of each map point according to the distance between the map point and the image acquisition device;

and optimizing the first pose according to the weight values of the map points and the projection errors of the map points to obtain an optimized first pose.

5. The method of claim 1, wherein the method further comprises:

performing feature matching between first feature points of the current frame image and second feature points of a previous frame image to determine matching feature points, wherein the matching feature points correspond to the same semantic category information;

determining a fundamental matrix according to the first pose, a second pose of the previous frame image, and preset camera intrinsic parameters;

determining an epipolar constraint error according to the fundamental matrix and the matching feature points;

and wherein the optimizing of the first pose according to the projection error to obtain an optimized first pose comprises:

optimizing the first pose according to the projection error and the epipolar constraint error to obtain an optimized first pose.

6. The method of claim 5, wherein the method further comprises:

determining, by an absolute scale sensor, a first translation increment by which the image acquisition device moves between capturing the previous frame image and the current frame image;

determining a second translation increment between the first pose and the second pose;

and wherein the optimizing of the first pose according to the projection error and the epipolar constraint error to obtain an optimized first pose comprises:

optimizing the first pose according to the projection error, the epipolar constraint error, and the difference between the first translation increment and the second translation increment to obtain an optimized first pose.

7. The method of claim 6, wherein the optimizing of the first pose according to the projection error, the epipolar constraint error, and the difference between the first translation increment and the second translation increment to obtain an optimized first pose comprises:

determining an objective function based on the projection error, the epipolar constraint error, and the difference between the first translation increment and the second translation increment;

adjusting the first pose so as to adjust the objective function;

and determining the first pose at which the objective function meets a preset condition as the optimized first pose.

8. A pose optimization apparatus, comprising:

an acquisition module configured to acquire a first pose of an image acquisition device when the image acquisition device captures a current frame image;

a region determination module configured to determine a visible space region corresponding to the first pose;

a map point determination module configured to determine spatial coordinates of map points in a vector map that fall within the visible space region;

an information determination module configured to determine feature information of the current frame image;

an error determination module configured to determine a projection error of the map points in the current frame image according to the first pose, the spatial coordinates of the map points, and the feature information;

and an optimization module configured to optimize the first pose according to the projection error to obtain an optimized first pose.

9. A computer-readable storage medium storing a computer program for executing the pose optimization method according to any one of claims 1 to 7.

10. An electronic device, comprising:

a processor;

a memory for storing instructions executable by the processor;

wherein the processor is configured to read the executable instructions from the memory and execute them to implement the pose optimization method according to any one of claims 1 to 7.

Technical Field

The present application relates to the field of image processing technologies, and in particular, to a pose optimization method and apparatus.

Background

Pose optimization is usually required to improve positioning accuracy.

Existing pose optimization methods mainly recover the spatial information of objects in an image based on the pose, and then optimize the pose according to that recovered spatial information.

However, the recovered spatial information of objects in the image requires a large amount of data storage, which lowers pose optimization efficiency.

Disclosure of Invention

The present application is proposed to solve the above-mentioned technical problems. The embodiments of the present application provide a pose optimization method and apparatus, a computer-readable storage medium, and an electronic device.

According to an aspect of the present application, there is provided a pose optimization method including:

acquiring a first pose of an image acquisition device when the image acquisition device captures a current frame image;

determining a visible space region corresponding to the first pose;

determining spatial coordinates of map points in a vector map that fall within the visible space region;

determining feature information of the current frame image;

determining a projection error of the map points in the current frame image according to the first pose, the spatial coordinates of the map points, and the feature information;

and optimizing the first pose according to the projection error to obtain an optimized first pose.

According to an aspect of the present application, there is provided a pose optimization apparatus including:

an acquisition module configured to acquire a first pose of an image acquisition device when the image acquisition device captures a current frame image;

a region determination module configured to determine a visible space region corresponding to the first pose;

a map point determination module configured to determine spatial coordinates of map points in a vector map that fall within the visible space region;

an information determination module configured to determine feature information of the current frame image;

an error determination module configured to determine a projection error of the map points in the current frame image according to the first pose, the spatial coordinates of the map points, and the feature information;

and an optimization module configured to optimize the first pose according to the projection error to obtain an optimized first pose.

According to a third aspect of the present application, there is provided a computer-readable storage medium storing a computer program for executing the pose optimization method described above.

According to a fourth aspect of the present application, there is provided an electronic device comprising:

a processor;

a memory for storing the processor-executable instructions;

wherein the processor is configured to read the executable instructions from the memory and execute them to implement the pose optimization method described above.

Compared with the prior art, the pose optimization method, apparatus, computer-readable storage medium, and electronic device provided by the present application have at least the following beneficial effects:

On the one hand, full use is made of the small file size of the vector map, and the spatial information of the vector objects in the vector map can be reused, which saves storage space on the electronic device and improves pose optimization efficiency.

On the other hand, by determining the visible space region of the pose, this embodiment determines the spatial coordinates of the map points in the vector map that fall within that region; these spatial coordinates indicate the spatial information of the objects in the image corresponding to the pose. The map points are then projected into the image based on the pose and the feature information of the image, which yields the projection error of the map points in the image. The pose is optimized through this projection error to determine the optimized pose. The spatial information of the objects in the image does not need to be recovered, which saves storage space on the electronic device and improves pose optimization efficiency.

Drawings

The above and other objects, features and advantages of the present application will become more apparent by describing in more detail embodiments of the present application with reference to the attached drawings. The accompanying drawings are included to provide a further understanding of the embodiments of the application and are incorporated in and constitute a part of this specification, illustrate embodiments of the application and together with the description serve to explain the principles of the application. In the drawings, like reference numbers generally represent like parts or steps.

Fig. 1 is a schematic flowchart of a pose optimization method according to an exemplary embodiment of the present application.

Fig. 2 is a schematic diagram of a visible space region in a pose optimization method according to an exemplary embodiment of the present application.

Fig. 3 is a flowchart illustrating step 105 of the pose optimization method according to an exemplary embodiment of the present application.

Fig. 4 is a schematic flowchart of step 1051 in the pose optimization method according to an exemplary embodiment of the present application.

Fig. 5 is a flowchart illustrating step 106 of the pose optimization method according to an exemplary embodiment of the present application.

Fig. 6 is a schematic flowchart of a pose optimization method according to another exemplary embodiment of the present application.

Fig. 7 is a flowchart illustrating a pose optimization method according to still another exemplary embodiment of the present application.

Fig. 8 is a flowchart illustrating step 703 in a pose optimization method according to yet another exemplary embodiment of the present application.

Fig. 9 is a schematic structural diagram of a pose optimization apparatus according to an exemplary embodiment of the present application.

Fig. 10 is a schematic structural diagram of a pose optimization apparatus according to another exemplary embodiment of the present application.

Fig. 11 is a schematic structural diagram of a projection unit 9051 in the pose optimization apparatus according to another exemplary embodiment of the present application.

Fig. 12 is a schematic structural diagram of a pose optimization apparatus according to still another exemplary embodiment of the present application.

Fig. 13 is a schematic structural diagram of a pose optimization apparatus according to still another exemplary embodiment of the present application.

Fig. 14 is a schematic structural diagram of a third optimization module 1303 in the pose optimization apparatus according to still another exemplary embodiment of the present application.

Fig. 15 is a block diagram of an electronic device provided in an exemplary embodiment of the present application.

Detailed Description

Hereinafter, exemplary embodiments according to the present application will be described in detail with reference to the accompanying drawings. It should be understood that the described embodiments are only some embodiments of the present application and not all embodiments of the present application, and that the present application is not limited by the example embodiments described herein.

Summary of the application

In order to track and navigate a movable device, so as to analyze its motion behavior while providing a driving strategy, it is often necessary to determine the pose of the movable device. At present, sensors (for example, a camera and an inertial measurement unit) are usually installed on the movable device to acquire motion data and images during driving; camera calibration is performed from the images (camera calibration here refers to recovering objects in space from the images captured by the camera), and the pose of the movable device is determined from the calibration result and the motion data. However, the precision of camera calibration and motion data is limited: calibration errors appear in the spatial reconstruction, the motion data yields only relative poses, and the error of an earlier frame propagates to all later frames, so the final pose error may be very large, limiting the overall precision of the system. Therefore, it is often necessary to optimize the pose. Existing pose optimization methods usually need to recover the spatial information of objects in the image captured by the camera, and then optimize the pose based on that recovered spatial information.

However, the amount of spatial-information data recovered during camera calibration is huge; it occupies too much storage space on the electronic device and reduces pose optimization efficiency.

Fully considering these shortcomings, the present application obtains the pose of the current frame image, determines the visible space region corresponding to that pose, then determines the map points of the vector map within the visible space region, determines the projection error of projecting those map points into the current frame image based on the pose and the feature information of the current frame image, and finally optimizes the pose with the projection error to obtain the optimized pose.

Exemplary method

Fig. 1 is a schematic flowchart of a pose optimization method according to an exemplary embodiment of the present application.

This embodiment can be applied to electronic devices, in particular to smart devices, servers, or general-purpose computers, where smart devices include, but are not limited to, autonomous vehicles, unmanned aerial vehicles, and intelligent robots.

As shown in fig. 1, a pose optimization method provided in an exemplary embodiment of the present application at least includes the following steps:

Step 101: acquire a first pose of the image acquisition device when it captures the current frame image.

The image acquisition device is fixedly mounted on the movable device, so its pose changes continuously as the movable device moves; based on that pose it continuously photographs the road surface ahead of the movable device during driving, and images are therefore continuously acquired. In sum, each image corresponds to a pose of the image acquisition device, so the current frame image corresponds to one pose of the image acquisition device, and the pose of the image acquisition device can indicate the pose of the movable device.

It should be noted that the movable device includes a plurality of sensors in addition to the image acquisition device. The data they capture reflects the motion and position of the movable device, and an initial pose of the image acquisition device can be roughly estimated from this data. Specifically, when the current frame image is the first frame image, the data captured by the plurality of sensors is acquired and fused to determine a first pose composed of a rotation matrix and a translation matrix. This first pose is usually rough and cannot accurately reflect the actual pose of the image acquisition device, so pose optimization is performed with the first pose corresponding to the first frame image as the initial pose. The translation matrix indicates the distance between the origin of the world coordinate system and the origin of the camera coordinate system. The rotation matrix indicates the combined effect of rotating the world coordinate system to the camera coordinate system about its three coordinate axes, i.e., the product of the rotation matrices about the x-, y-, and z-axes, and thus describes the rotational relationship between the world coordinate system and the camera coordinate system.

It should be noted that different sensors often acquire data at different time points, so the data acquired by each sensor usually needs to be time-synchronized so that the sensors' acquisition time points coincide with the time point at which the image acquisition device captures the image, thereby ensuring the reference value of the sensor data.

In particular, the movable device is an object capable of moving, such as an autonomous vehicle, an unmanned aerial vehicle, or an intelligent robot. The current frame image is the image most recently acquired by the image acquisition device at the current moment. The image acquisition device is a device with a photographing function, such as a monocular camera or a binocular camera. The pose of the image acquisition device is its position and attitude in the world coordinate system: the position is mainly embodied by a translation matrix, and the attitude by a rotation matrix.

Image preprocessing of the acquired image is typically required; considering distortion during imaging, the size of the display area, and the memory space of the electronic device, preprocessing includes, but is not limited to, de-distortion and down-sampling. Obviously, the current frame image referred to here is an image after such preprocessing.
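As an illustrative, non-limiting sketch of this preprocessing step (not part of the claimed method), assuming OpenCV, an intrinsic matrix K, and a distortion vector dist obtained from an offline calibration:

```python
# A minimal preprocessing sketch: undistort, then down-sample.
# K (3x3 intrinsics) and dist (distortion coefficients) are assumed to come
# from an offline calibration; the scale factor is an illustrative placeholder.
import cv2
import numpy as np

def preprocess(frame: np.ndarray, K: np.ndarray, dist: np.ndarray,
               scale: float = 0.5) -> np.ndarray:
    undistorted = cv2.undistort(frame, K, dist)   # remove lens distortion
    h, w = undistorted.shape[:2]
    # down-sample to save memory and computation
    return cv2.resize(undistorted, (int(w * scale), int(h * scale)))
```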

Step 102: determine the visible space region corresponding to the first pose.

The image acquisition device can only acquire information within a specific shooting range, which indicates its visual distance: the farthest distance, in each direction (up, down, left, right, front, and back), at which the device can continuously capture while the movable device drives normally. The visual distance is usually fixed; for example, the visual distance of the image acquisition device may comprise a forward viewing distance of 80 meters, left and right viewing distances of 20 meters each, an upward viewing distance of 15 meters, and a downward viewing distance of 5 meters. The visible space region takes the position of the image acquisition device as its reference point, so it can be determined from the first pose and the visual distances of the device. It is the region formed, in the world coordinate system, by the set of spatial coordinates corresponding to the visual distances; this set includes, but is not limited to, the coordinate values of the two points on the forward axis corresponding to the forward and backward viewing distances, the two points on the lateral axis corresponding to the left and right viewing distances, the two points on the vertical axis corresponding to the upward and downward viewing distances, and the spatial coordinates of the image acquisition device itself. Referring to Fig. 2, the visible space region is a rectangular pyramid formed by the spatial coordinates of the 4 vertices of the plane spanned by the left, right, upward, and downward viewing distances, together with the spatial coordinates of the image acquisition device. Since this plane has a fixed position relative to the device, the spatial coordinates of its 4 vertices can be determined from the forward viewing distance and the first pose, the forward viewing distance being the perpendicular distance between the device and the plane.
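A minimal sketch of how such a rectangular pyramid could be constructed, assuming the first pose is given as a rotation matrix R (world to camera) and a camera centre C in world coordinates, with a camera frame whose axes point right, down, and forward; the viewing distances match the example above:

```python
# Sketch: apex and far-plane corners of the visible space region (Fig. 2).
import numpy as np

def frustum_vertices(R, C, forward=80.0, left=20.0, right=20.0,
                     up=15.0, down=5.0):
    # Camera axes expressed in world coordinates (columns of R transposed).
    x_axis, y_axis, z_axis = R.T[:, 0], R.T[:, 1], R.T[:, 2]
    centre_far = C + forward * z_axis          # centre of the far plane
    corners = np.stack([
        centre_far - left * x_axis - up * y_axis,     # far top-left (y points down)
        centre_far + right * x_axis - up * y_axis,    # far top-right
        centre_far + right * x_axis + down * y_axis,  # far bottom-right
        centre_far - left * x_axis + down * y_axis,   # far bottom-left
    ])
    return C, corners  # apex plus the 4 far-plane corners of the pyramid
```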

Step 103: determine the spatial coordinates of the map points in the vector map that fall within the visible space region.

A vector object is composed of elements such as points, lines, rectangles, polygons, circles, and arcs, all obtained by mathematical formula calculation. Because a vector object can be computed from formulas, its file volume is generally small; a vector map composed of vector objects therefore also has a generally small file volume, which saves storage space on the electronic device and improves its data processing efficiency. The greatest advantage of a vector object is that it remains undistorted no matter whether it is enlarged, reduced, or rotated; a vector map is thus a map that can be arbitrarily enlarged, reduced, or rotated, and its scale can be adjusted as needed to change the level of detail of the displayed content.

Considering the differing complexity of the outlines of the vector objects in the vector map, and in order to simplify calculation and retrieve vector objects quickly, the vector objects themselves are generally not retrieved directly; instead, a minimum bounding box is used as the retrieval object. Once its shape is determined, the minimum bounding box indicates the minimum area boundary of the vector object and reflects the object's area range accurately. Retrieval of vector objects is realized by constructing a spatial index over the minimum bounding boxes, so that the boxes falling within the visible space region can be retrieved quickly, simply, and accurately; the vector objects within the retrieved boxes are then extracted, and the map points are determined from the extracted vector objects. In one possible implementation, several spatial coordinates of the minimum bounding box of a vector object are determined, and the object information of the vector object within the region formed by those coordinates is determined; the object information includes the object category, the object geometry, and the spatial coordinates of the object's key nodes, and is stored as an object element. The vector map is thereby resolved, i.e., split into its individual vector objects. Clearly, the corresponding vector object can be regenerated from the object information in an object element, i.e., the original vector object can be recovered. For example, for a signboard, the object category is "signboard", the object geometry is a cuboid, and the spatial coordinates of the key nodes are the spatial coordinates of the eight vertices of the signboard. Then, an R-tree spatial data index is used to search the spatial coordinates of each minimum bounding box in the vector map and determine the boxes within the visible space region.

Because vector objects are calculated by formulas, the number of key nodes depicting a vector object in the vector map is usually small (for example, a pillar is given only the vertices at its two ends, and a road arrow only its arrow vertices); the number of key nodes depicting the vector objects in the retrieved sub-map is correspondingly small. To increase the number of map points and improve the accuracy of pose optimization, the vector objects in the sub-map need to be up-sampled to obtain more sampling points and determine their spatial coordinates. Up-sampling modes include, but are not limited to, uniform sampling, or denser sampling near the key nodes and sparser sampling far from them; note that the sampling points usually lie on the edge contour of the vector object. In addition, when a minimum bounding box whose spatial coordinates lie only partly within the visible space region is retrieved, the box is still retained and its vector object is added to the sub-map; some vector objects in the sub-map therefore do not fall entirely within the visible space region, i.e., not all sampling points lie within it. The vector map is built in the world coordinate system, so the spatial coordinates of the map points are expressed in the world coordinate system.
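A minimal sketch of the uniform up-sampling described above, assuming each vector object contour is given as a polyline of key-node coordinates; the step size is an illustrative placeholder:

```python
# Sketch: densify a polyline contour to obtain additional map points.
import numpy as np

def upsample_polyline(nodes: np.ndarray, step: float = 0.5) -> np.ndarray:
    """nodes: (N, 3) key-node coordinates; returns densified (M, 3) samples."""
    samples = [nodes[0]]
    for a, b in zip(nodes[:-1], nodes[1:]):
        length = np.linalg.norm(b - a)
        n = max(int(np.ceil(length / step)), 1)       # segments per edge
        for t in np.linspace(0.0, 1.0, n + 1)[1:]:    # skip a, already added
            samples.append(a + t * (b - a))
    return np.asarray(samples)
```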

It should be noted that in a three-dimensional reconstruction process, the spatial information of objects in the image would be continuously recovered; this occupies a large amount of storage space on the electronic device and reduces its data processing efficiency. With a vector map, by contrast, only the map points within the visible space region need to be continuously selected from the map, and the vector map file occupies little storage, so storage space on the electronic device is saved and its data processing efficiency is improved.

Step 104: determine the feature information of the current frame image.

Semantic segmentation is performed on the current frame image, and features are extracted from the segmentation result to determine feature information describing the objects in the image. The feature information includes, but is not limited to, any one or more of color feature information, texture feature information, shape feature information, and spatial-relationship feature information: color and texture features both describe the surface properties of the objects in the image, shape features describe the overall shape or contour shape of the objects, and spatial-relationship features describe the spatial positions or relative directional relationships among multiple objects. In a particular scene, the feature information is determined in combination with the characteristics of the objects in the image.

Step 105: determine the projection error of the map points in the current frame image according to the first pose, the spatial coordinates of the map points, and the feature information.

Specifically, the current frame image corresponds to an image coordinate system, the image acquisition device corresponds to a camera coordinate system, and the spatial coordinates of the map points correspond to the world coordinate system. The world coordinate system is transformed into the camera coordinate system by a rigid-body transformation, and the camera coordinate system into the image coordinate system by a perspective projection transformation. The relation between world coordinates and image coordinates is thus established on the basis of rigid-body transformation and perspective projection. A rigid-body transformation is a rotation and translation of the image acquisition device in three-dimensional space without deformation. Perspective projection projects a shape onto a projection plane by the central projection method (the projection lines converge at a projection center), producing a single-plane projection image closer to the visual effect.

Specifically, the extrinsic parameters of the image acquisition device, which describe the relationship between the camera coordinate system and the world coordinate system, may be a homogeneous matrix composed of a rotation matrix and a translation matrix; they determine the transformation between the world coordinate system and the camera coordinate system. The transformation between the camera coordinate system and the image coordinate system is then determined by the preset camera intrinsic parameters of the image acquisition device. The intrinsic parameters relate only to the internal structure of the camera, not to its position; they mainly include the coordinates of the image principal point (the foot of the perpendicular from the projection center to the image plane), the height and width of a single pixel, the effective focal length of the camera, and the distortion coefficients of the lens. The preset camera intrinsic parameters are usually unchanged.

In summary, the transformation between the world coordinate system and the image coordinate system can be determined from the first pose and the preset camera intrinsic parameters. The spatial coordinates of the map points are then projected into the current frame image according to this transformation, giving the image coordinates of the map points in the current frame image, and the projection error of a map point is determined from the feature information and those image coordinates. Since a map point usually lies on the edge contour of a vector object, the projection error is the distance between the image coordinates of the map point's projection in the current frame image and the image coordinates of the edge contour of the observed object. The observed object should be the object of the category to which the map point belongs; on the premise that the error stays within an acceptable range, it may also be the object closest to the projected image coordinates of the map point. It should be noted that the edge contour of the observed object has several image coordinates; the one used is the closest to the image coordinates of the map point's projection into the current frame image. Image coordinates are coordinates in the image coordinate system, and the image coordinates of the edge contour of the observed object are determined from the feature information.
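A minimal sketch of the projection used throughout this section, assuming the first pose is given as (R, t) with x_cam = R·X + t, and K is the preset 3×3 camera intrinsic matrix:

```python
# Sketch: world coordinates -> pixel coordinates via rigid body transform
# followed by perspective projection.
import numpy as np

def project(X: np.ndarray, R: np.ndarray, t: np.ndarray, K: np.ndarray):
    x_cam = R @ X + t                 # world -> camera (rigid body transform)
    if x_cam[2] <= 0:                 # behind the camera: no valid projection
        return None
    uv = K @ (x_cam / x_cam[2])       # perspective projection onto image plane
    return uv[:2]                     # pixel coordinates (u, v)
```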

Step 106: optimize the first pose according to the projection error to obtain the optimized first pose.

The projection error is determined based on the first pose, so a projection-error optimization function can be constructed from it; the function indicates the sum of the projection errors of all map points. When the first pose changes, the projection errors change, and correspondingly so does the function; minimizing the projection-error optimization function optimizes the first pose. Because the projection error indicates the difference between the first pose and the actual pose, the accuracy of the optimized first pose can be ensured.

It should be noted that, over the time series, the curve through the optimized poses is usually not smooth. The optimized poses therefore need to be fused with the sensor data reflecting the motion and position information of the movable device to determine the first poses, so that the curve through the first poses is relatively smooth and reflects the motion curve of the image acquisition device more accurately. Obviously, the optimized pose can also be determined directly as the first pose without data fusion.

In this embodiment, the projection error, in the current frame image, of the map points of the vector map that fall within the visual range is determined; this error reflects the difference between the pose and the actual pose, so optimizing the pose based on it yields an optimized pose with relatively high accuracy. Meanwhile, the vector map file is small and the spatial information of the objects in the image does not need to be recovered, which avoids introducing further errors, saves storage space on the electronic device, improves its data processing efficiency, and thus improves pose optimization efficiency.

Fig. 3 is a schematic flowchart illustrating a step of determining a projection error of the map point in the current frame image according to the first pose, the spatial coordinates of the map point, and the feature information in the embodiment shown in fig. 1.

As shown in fig. 3, based on the embodiment shown in fig. 1, where the feature information includes contour information of the first target object, in an exemplary embodiment of the present application, the step 105 of determining a projection error of the map point in the current frame image according to the first pose, the spatial coordinates of the map point, and the feature information may specifically include the following steps:

step 1051, projecting the space coordinate of the map point to the current frame image through the first pose to determine the projection pixel coordinate of the map point.

The image coordinate system comprises an image physical coordinate system and an image pixel coordinate system, the camera coordinate system reaches the image physical coordinate system through perspective projection transformation, the image physical coordinate system is transformed to the image pixel coordinate system through discretization, and the discretization refers to correspondingly reducing the data under the condition that the relative size of the data is not changed. The image pixel coordinate system and the image physical coordinate system are both on the imaging plane, except for the respective origin and measurement units. The origin of the physical coordinate system of the image is usually the principal point of the image, in millimeters, belonging to physical units. The image pixel coordinate system takes the vertex of the image as the coordinate origin, and the unit is the pixel. Considering that an image captured by a camera in an image capturing device is a digital image, the digital image includes a plurality of pixel points, and therefore, a position of each pixel point in the image is generally described based on an image pixel coordinate system, and the position is a pixel coordinate.

Specifically, the first pose indicates a conversion relation between a world coordinate system and a camera coordinate system, the preset camera internal reference indicates a conversion relation between the camera coordinate system and an image pixel coordinate system, the space coordinates of the map point are projected into the current frame image based on the first pose and the preset camera internal reference, the pixel point of the map point projected into the current frame image is determined, and the pixel coordinate of the pixel point is the projected pixel coordinate of the map point.

Step 1052: determine the projection error according to the projected pixel coordinates and the contour information.

The first target object is an object of interest, such as a traffic-sign element (e.g., a post, a sign, a lane marking, or a pavement marking). The contour information includes, but is not limited to, the edge pixel points of the object of interest and the pixel coordinates of those edge pixel points. As described above, the projection error is determined from the minimum distance between the projected map point and the edge contour of the observed object; here the observed object is the first target object, so the edge contour of the observed object is determined from the contour information.

It should be noted that semantic segmentation of the current frame image yields the semantic category of each pixel point in the current frame image; the pixel points corresponding to the first target object are extracted by semantic category to form a first-target-object image, and the contour information is obtained by extracting the edge contour of the first target object from that image. The contour information thus includes an edge contour map corresponding to the first target object.

In a first possible implementation, regardless of whether the edge pixel point and the map point belong to the same object category, the pixel coordinate of the edge pixel point closest to the projected pixel coordinate in the edge contour map is determined directly, and the distance between that pixel coordinate and the projected pixel coordinate is determined as the projection error.

In a second possible implementation, the pixel coordinate of the edge pixel point closest to the projected pixel coordinate, among edge pixel points whose object category matches that of the map point, is determined, and the distance between that pixel coordinate and the projected pixel coordinate is determined as the projection error.

In a third possible implementation, a residual map corresponding to the edge contour map is determined. Each pixel point in the residual map carries a chamfer distance value, i.e., the distance from that pixel point to its nearest edge pixel point, regardless of object category; the residual map corresponds to the current frame image in size and position. The pixel point of the residual map at the projected pixel coordinate is determined, and its chamfer distance value is determined as the projection error. This yields the same projection error as the first implementation, but the computation is relatively larger.

In a fourth possible implementation, multiple residual maps corresponding to the edge contour map are determined, one per object category: within each residual map all first target objects belong to the same category, and each pixel point again carries the chamfer distance value to its nearest edge pixel point. The number of residual maps equals the number of object categories of the first target objects, and different residual maps correspond to different categories. Specifically, the residual map whose object category matches that of the map point is selected, the pixel point at the projected pixel coordinate is determined, and its chamfer distance value is determined as the projection error. This yields the same projection error as the second implementation, but the computation and the occupied memory are relatively larger. Different implementations may be selected for different scenarios.
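A minimal sketch of the third implementation (a residual map built with a distance transform), assuming OpenCV and a binary edge contour map in which the edge pixels of the first target object are non-zero:

```python
# Sketch: residual map whose value at each pixel is the chamfer distance
# (distance to the nearest edge pixel), then look up the projection error.
import cv2
import numpy as np

def residual_map(edge_map: np.ndarray) -> np.ndarray:
    # distanceTransform measures distance to the nearest zero pixel, so
    # invert: edge pixels become 0, everything else non-zero.
    inverted = np.where(edge_map > 0, 0, 255).astype(np.uint8)
    return cv2.distanceTransform(inverted, cv2.DIST_L2, 5)

def projection_error(residual: np.ndarray, uv: np.ndarray) -> float:
    u, v = int(round(uv[0])), int(round(uv[1]))
    return float(residual[v, u])      # chamfer distance at the projected pixel
```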

It should be noted that the projection error should lie within a reasonable range; projection errors outside that range are filtered out directly, so that the remaining projection errors reflect the difference between the first pose and the actual pose more accurately, thereby improving the accuracy of the determined optimized pose.

In this embodiment, the projected pixel coordinates of the map points in the current frame image are determined, and then a projection error that more accurately reflects the difference between the first pose and the actual pose is determined from the projected pixel coordinates and the contour information, thereby improving the accuracy of the determined optimized pose.

Fig. 4 is a schematic flowchart illustrating a step of projecting the spatial coordinates of the map point into the current frame image by the first pose to determine the projected pixel coordinates of the map point in the embodiment shown in fig. 3.

As shown in fig. 4, on the basis of the embodiment shown in fig. 3, the feature information further includes a pixel point of a second target object, and in an exemplary embodiment of the present application, step 1051 is to project the spatial coordinates of the map point to the current frame image through the first pose, so as to determine the projected pixel coordinates of the map point, which may specifically include the following steps:

Step 10511: project the spatial coordinates of the map points into the current frame image through the first pose to obtain projection points.

Based on the first pose and the preset camera intrinsic parameters, the spatial coordinates of a map point are projected into the current frame image, and the pixel point onto which the map point projects is determined; that pixel point is the projection point.

Step 10512: determine the mask region according to the pixel points of the second target object.

The vector map itself has no occlusion problem, but when the movable device drives in an actual scene, the first target object within the visible space region is frequently occluded, so the first target object in the image acquired by the image acquisition device often cannot be matched well to the vector object in the vector map. If a map point is projected onto an occluded area, an erroneous projection error is often obtained, which affects the accuracy of the determined optimized pose. The second target object is an object not of interest, i.e., an occluding object that blocks the first target object, including but not limited to vehicles and pedestrians. Therefore, the pixel points of the second target object are masked, i.e., shielded with a selected image, shape, or object, and the mask region can then be determined; the second target object is thereby prevented from participating in the calculation of the projection error, improving the accuracy of the determined optimized pose. Obviously, the region outside the current frame image is also a mask region.

It should be noted that semantic segmentation of the current frame image yields the semantic category of each pixel point; the pixel points whose semantic category corresponds to the second target object are then extracted, so the pixel points of the second target object can be determined.

Step 10513: determine the projected pixel coordinates according to the projection points and the mask region.

Specifically, the mask region is marked as a corresponding area in the current frame image. When a projection point falls outside the marked area of the current frame image, the pixel coordinate of the projection point is determined as the projected pixel coordinate. When a projection point falls inside the marked area, a preset pixel coordinate is determined as the projected pixel coordinate; the preset pixel coordinate is an invalid pixel coordinate, for example an infinite pixel coordinate may be preset, in which case the computed projection error is very large and is filtered out directly.
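A minimal sketch of steps 10512 and 10513, assuming semantic segmentation yields a label image and that `occluder_ids` (a hypothetical name) is the set of semantic ids of second target objects such as vehicles and pedestrians:

```python
# Sketch: replace projections landing in the mask region, or outside the
# image, with an invalid sentinel coordinate so their error is filtered out.
import numpy as np

INVALID = np.array([np.inf, np.inf])  # preset invalid pixel coordinate

def masked_pixel(uv, labels: np.ndarray, occluder_ids: set):
    if uv is None:                            # no valid projection at all
        return INVALID
    h, w = labels.shape
    u, v = int(round(uv[0])), int(round(uv[1]))
    if not (0 <= u < w and 0 <= v < h):       # outside the image: also masked
        return INVALID
    if labels[v, u] in occluder_ids:          # projected onto an occluder
        return INVALID
    return uv
```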

In this embodiment, by masking the second target object that occludes the first target object in the current frame image, the projection errors of map points projected onto the second target object do not participate in the pose optimization process, thereby ensuring the accuracy of the determined optimized pose.

Fig. 5 shows a flow chart of the step of optimizing the first pose based on the projection error to obtain an optimized first pose as in the embodiment shown in fig. 1.

As shown in fig. 5, based on the embodiment shown in fig. 1, in an exemplary embodiment of the present application, the step 106 of optimizing the first pose according to the projection error to obtain an optimized first pose specifically includes the following steps:

Step 1061: determine the weight value of each map point according to the distance between the map point and the image acquisition device.

Map points may project onto the image with an uneven distribution: most map points far from the image acquisition device tend to concentrate near the image principal point, whereas map points closer to the image acquisition device usually reflect the difference between the first pose and the actual pose better. Weight values are therefore attached to the projection errors of different map points to correct them. The weight value is negatively related to the distance between the map point and the image acquisition device, or to the ratio of that distance to the farthest visible distance in the visible space region, the farthest visible distance being the distance between the image acquisition device and the spatial point farthest from it. The weight value typically lies between 0 and 1; of course, if the accuracy of the projection error is high, the weight value of the map point may be set to 1.

Step 1062, optimizing the first pose according to the weight value of the map point and the projection error of the map point to obtain an optimized first pose.

The weight value corrects the projection error so that the corrected projection error reflects the difference between the first pose and the actual pose more accurately. A projection-error optimization function is therefore constructed from the weight values and the projection errors; it indicates the sum of the weight-corrected projection errors. Specifically, the projection-error optimization function C_chf is expressed as follows:

$$C_{chf} = \sum_{X \in M_k} \rho \, C_k\left(\pi(P_k X)\right)$$

where $P_k$ denotes the first pose of the current frame image; $X$ a map point; $\rho$ the weight value; $M_k$ the visible space region; $\pi$ the projection; and $C_k(\pi(P_k X))$ the projection error of map point $X$ projected into the current frame image based on the first pose $P_k$.
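A minimal sketch of this weighted cost, building on the project(), masked_pixel(), and projection_error() helpers sketched earlier, and assuming the illustrative weight ρ = 1 − d/d_max with d_max a placeholder farthest visible distance:

```python
# Sketch: C_chf as the weighted sum of chamfer projection errors.
# project(), masked_pixel() and projection_error() come from earlier sketches.
import numpy as np

def chamfer_cost(points, R, t, K, residual, labels, occluder_ids, d_max=80.0):
    total = 0.0
    cam_centre = -R.T @ t                         # camera centre in world frame
    for X in points:                              # (N, 3) map point coordinates
        uv = masked_pixel(project(X, R, t, K), labels, occluder_ids)
        if not np.all(np.isfinite(uv)):
            continue                              # masked/invalid: filtered out
        d = np.linalg.norm(X - cam_centre)
        rho = max(1.0 - d / d_max, 0.0)           # weight, decreasing with distance
        total += rho * projection_error(residual, uv)
    return total
```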

It should be noted that when the current frame image is the first frame image, the first pose is the fused sensor data; the first pose is then optimized according to the projection-error optimization function determined by the weight values and the projection errors, so as to determine the optimized pose corresponding to the current frame image.

In this embodiment, the projection error is corrected by determining the weight value of each map point, so that the corrected projection error reflects the difference between the first pose and the actual pose more accurately; the pose can thus be optimized more accurately, yielding a more accurate optimized pose.

Fig. 6 shows a flowchart of a pose optimization method according to another exemplary embodiment of the present application.

As shown in fig. 6, in another exemplary embodiment of the present application, on the basis of steps 101 to 105 shown in fig. 1, at least the following steps are further included:

Step 601: perform feature matching between the first feature points of the current frame image and the second feature points of the previous frame image to determine matching feature points, where the matching feature points correspond to the same semantic category information.

For adjacent frame images captured while the image acquisition device moves, the changes in the positions and attitudes of the static objects they contain are the same as the change in the position and attitude of the image acquisition device; that is, the pose of the image acquisition device can be roughly estimated from the matching feature points of adjacent frame images.

Specifically, semantic segmentation is performed on the current frame image to determine its first feature points and on the previous frame image to determine its second feature points; both carry semantic category information. Feature matching is performed between the current frame image and the previous frame image to determine the matching feature points of the adjacent frames, where the semantic category information carried by the matching feature points is the same. Matching feature points with larger errors are then filtered out of the result to ensure that the remaining matching feature points have greater reference value, i.e., relatively higher accuracy.

It should be noted that feature points are pixel points in representative, highly recognizable regions of an image; they have accurate positions and definite object attributes and meanings in the image, and usually comprise keypoints and descriptors. A keypoint is the position of the feature point in the image, sometimes with direction and scale information; a descriptor is typically a vector that describes, in an artificially designed way, the information of the pixels around the keypoint. In the prior art, two feature points are considered the same feature point as long as their descriptors are close in the vector space. To further improve matching accuracy, two feature points are considered the same feature point, i.e., a matching feature point, only when their descriptors are close in the vector space and their semantic categories are the same, which ensures the accuracy of the matching feature points.
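A minimal sketch of such semantic-aware matching, assuming ORB features and per-pixel semantic label images; ORB and the ratio test are illustrative choices, not prescribed by this application:

```python
# Sketch: keep a match only when descriptors are close AND the two
# keypoints carry the same semantic category.
import cv2

def semantic_match(img_cur, img_prev, labels_cur, labels_prev, ratio=0.75):
    orb = cv2.ORB_create()
    kp1, des1 = orb.detectAndCompute(img_cur, None)
    kp2, des2 = orb.detectAndCompute(img_prev, None)
    if des1 is None or des2 is None:
        return []
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING)
    matches = []
    for pair in matcher.knnMatch(des1, des2, k=2):
        if len(pair) < 2:
            continue
        m, n = pair
        if m.distance > ratio * n.distance:   # ratio test filters weak matches
            continue
        u1, v1 = map(int, kp1[m.queryIdx].pt)
        u2, v2 = map(int, kp2[m.trainIdx].pt)
        if labels_cur[v1, u1] == labels_prev[v2, u2]:  # same semantic category
            matches.append((kp1[m.queryIdx].pt, kp2[m.trainIdx].pt))
    return matches  # pairs of (current frame pt, previous frame pt)
```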

Step 602: determine the fundamental matrix according to the first pose, the second pose of the previous frame image, and the preset camera intrinsic parameters.

When the same camera photographs the same object from different positions, the object in the two images has an overlapping part, so the two images theoretically have a definite correspondence. For the two images, the lines connecting their ray centers (the intersection points of the imaging rays) with a map point form an epipolar plane, and the intersections of the epipolar plane with the two images are the epipolar lines; this process involves only the matched pixel points in the images. In summary, for the two images related by the matching feature points, any pixel point in one image corresponds to an epipolar line in the other image, and the fundamental matrix indicates the mapping relationship between a pixel point in one image and the epipolar line in the other image.

Specifically, the relative translation matrix and the relative rotation matrix of the image acquisition device can be determined from the first pose and the second pose, and the fundamental matrix can then be determined from the relative rotation matrix, the relative translation matrix, and the preset camera intrinsic parameters. The fundamental matrix F is expressed as follows:

$$F = K^{-T} \, [t]_{\times} \, R \, K^{-1}$$

where $t$ denotes the relative translation; $[t]_{\times}$ its skew-symmetric (cross-product) matrix; $R$ the relative rotation matrix; $K^{-T}$ the transpose of the inverse of the preset camera intrinsic matrix; and $K^{-1}$ the inverse of the preset camera intrinsic matrix.
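A minimal sketch of this construction:

```python
# Sketch: fundamental matrix F = K^{-T} [t]_x R K^{-1} from the relative
# rotation R and relative translation t between the two poses.
import numpy as np

def skew(t):
    """Skew-symmetric cross-product matrix [t]_x."""
    return np.array([[0.0, -t[2], t[1]],
                     [t[2], 0.0, -t[0]],
                     [-t[1], t[0], 0.0]])

def fundamental_matrix(R, t, K):
    K_inv = np.linalg.inv(K)
    return K_inv.T @ skew(t) @ R @ K_inv
```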

Step 603, determining epipolar constraint errors according to the basic matrix and the matched feature points.

The epipolar constraint error is the sum of the values obtained by substituting all the matching feature points into the epipolar constraint expression: when a matching feature point satisfies the epipolar constraint, the value of the expression is 0, and when it does not, the value is non-zero. Since the matching feature points are fixed and have high reference value, the smaller the epipolar constraint error, the higher the accuracy of the basis matrix, that is, the higher the accuracy of the optimized first pose; different first poses therefore correspond to different epipolar constraint errors. The expression of the epipolar constraint error C_cpi is as follows:

C_cpi(P_{k-1}, P_k) = Σ_i x_{i,k-1}^T F x_{i,k}

where P_k denotes the first pose of the current frame image; P_{k-1} denotes the second pose of the previous frame image; x_{i,k-1}^T denotes the transpose of the pixel coordinates of the i-th matching feature point in the previous frame image; x_{i,k} denotes the pixel coordinates of the i-th matching feature point in the current frame image; and F denotes the basis matrix.
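A short sketch of this accumulation follows; summing squared residuals rather than raw values is an assumption made here so the error is non-negative.

```python
import numpy as np

def epipolar_constraint_error(F, pts_prev, pts_curr):
    """pts_prev, pts_curr: (N, 2) pixel coordinates of the matching feature
    points in the previous and current frame images."""
    ones = np.ones((len(pts_prev), 1))
    x_prev = np.hstack([pts_prev, ones])  # homogeneous coordinates, (N, 3)
    x_curr = np.hstack([pts_curr, ones])
    # Residual x_prev^T F x_curr per match; zero when the constraint holds.
    residuals = np.einsum('ni,ij,nj->n', x_prev, F, x_curr)
    return float(np.sum(residuals ** 2))
```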

Step 604, optimizing the first pose according to the projection error and the epipolar constraint error to obtain an optimized first pose.

When the pose is optimized using the projection error alone, excessive convergence often occurs, so the accuracy of the optimized pose is relatively low. To address this, the epipolar constraint on the matching feature points is added, which avoids excessive convergence and improves the accuracy of the optimized pose.

From the foregoing description, when the current frame image is the first frame image, the initial pose is optimized using the projection error to determine the optimized pose corresponding to the current frame image; when the current frame image is the second frame image, the first pose corresponding to the second frame image is optimized in this embodiment using both the projection error and the epipolar constraint error to determine the optimized first pose corresponding to the second frame image.

Specifically, an error optimization function is constructed from the projection error and the epipolar constraint error. When the first pose changes, the projection error and the epipolar constraint error change, and the error optimization function changes accordingly; minimizing the error optimization function therefore optimizes the first pose and improves the accuracy of pose estimation. The expression of the error optimization function C is as follows:

C = p_1 Σ_{X ∈ M_k} σ C_k(π(P_k X)) + p_2 C_cpi

where σ denotes the projection error correction coefficient; X denotes a map point; M_k denotes the visible space region; π denotes projection; C_k(π(P_k X)) denotes the projection error of map point X projected into the current frame image based on the first pose P_k; C_cpi denotes the epipolar constraint error; p_1 denotes the weight corresponding to the projection error; and p_2 denotes the weight corresponding to the epipolar constraint error.

The projection error correction coefficient is mainly used to correct the projection error so as to ensure the accuracy of the corrected projection error. When the projection error is already accurate, the projection error correction coefficient may be 1; the coefficient may also be the known weight value of the map point, and it may of course be set differently for specific scenes, so that the projection error accurately reflects the difference between the first pose and the actual pose.
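A minimal sketch of evaluating C follows, assuming the per-map-point projection errors have already been computed by the contour-based procedure described earlier in this document; all parameter values shown are placeholders.

```python
import numpy as np

def error_optimization_function(proj_errors, epipolar_err,
                                sigma=1.0, p1=1.0, p2=1.0):
    """C = p1 * sum(sigma * C_k) + p2 * C_cpi over the map points in M_k.
    proj_errors: per-map-point projection errors; epipolar_err: C_cpi;
    sigma, p1, p2: correction coefficient and weights (placeholder values)."""
    return p1 * np.sum(sigma * np.asarray(proj_errors)) + p2 * epipolar_err
```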

In this embodiment, the epipolar constraint error of the matching feature points is determined, and the pose is optimized using both the epipolar constraint error and the projection error, which reduces the possibility of excessive convergence on the projection error and ensures the accuracy of the optimized pose.

Fig. 7 illustrates a pose optimization method provided in another exemplary embodiment of the present application.

As shown in fig. 7, in another exemplary embodiment of the present application, on the basis of steps 101 to 105 shown in fig. 1 and steps 601 to 603 shown in fig. 6, at least the following steps are further included:

Step 701, determining, by an absolute scale sensor, a first translation increment of the motion of the image acquisition device during acquisition of the current frame image and the previous frame image.

An absolute scale sensor is a sensor capable of measuring the actual distance a movable device moves, and includes an inertial measurement unit (IMU) and a chassis control sensor. The inertial measurement unit measures the three-axis attitude angles (or angular rates) and the acceleration of the movable device; it accumulates error over long periods, but its translation increments over short periods are relatively accurate. The chassis control sensor measures the motion information of the movable device. When the movable device carries only the chassis control sensor, the translation increment of the motion between the current frame image and the previous frame image collected by the chassis control sensor is the first translation increment; when the movable device carries both the chassis control sensor and the inertial measurement unit, the translation increments collected by the two during the current frame image and the previous frame image are fused, and the fused translation increment is determined as the first translation increment. The first translation increment indicates the distance the image acquisition device moved between capturing the previous frame image and the current frame image, including the movement components along the three coordinate axes. It should be noted that the absolute scale sensor measures motion data, from which a pose increment can be obtained.

Step 702, determining a second translation increment between the first pose and the second pose.

The second translation increment consists of the differences between the coordinate values of the first pose and the second pose along each of the three coordinate axes.
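The two increments can be sketched as follows; the weighted IMU/chassis fusion and the 4x4 homogeneous pose representation are illustrative assumptions.

```python
import numpy as np

def first_translation_increment(chassis_delta, imu_delta=None, w_imu=0.5):
    """Translation increment from the absolute scale sensor. With only the
    chassis control sensor, its reading is used directly; with an IMU as
    well, the two readings are fused (a simple weighted average here)."""
    if imu_delta is None:
        return np.asarray(chassis_delta, dtype=float)
    return (w_imu * np.asarray(imu_delta, dtype=float)
            + (1.0 - w_imu) * np.asarray(chassis_delta, dtype=float))

def second_translation_increment(pose_curr, pose_prev):
    """Per-axis difference of the translation parts of two 4x4 homogeneous
    pose matrices (first pose minus second pose)."""
    return pose_curr[:3, 3] - pose_prev[:3, 3]
```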

Step 703, optimizing the first pose according to the projection error, the epipolar constraint error, and the difference between the first translation increment and the second translation increment to obtain an optimized first pose.

The epipolar constraint is an equation equal to zero, so it still holds after both sides are multiplied by any non-zero constant; this is called the scale equivalence of the essential matrix, the essential matrix being the product of the skew-symmetric matrix of the relative translation and the relative rotation matrix. Since the translation and the rotation each have three degrees of freedom (a degree of freedom being an independent parameter describing the object), the essential matrix would have six degrees of freedom; but because of its scale equivalence, the essential matrix actually has only five. Moreover, because the epipolar constraint depends only on the camera internal parameters and the relative transformation between the two images, independent of their absolute positions, it cannot constrain the actual distance between the two images. Adding the constraint on the difference between the first translation increment and the second translation increment gives the pose constraint six degrees of freedom, thereby improving the accuracy of the optimized pose.

The difference between the first translation increment and the second translation increment indicates the error between the distance the image acquisition device actually moved during the capture of the previous frame image and the current frame image, and the distance from its position at the second pose to its position at the first pose.

When the current frame image is the second frame image, the first pose of the second frame image is the sum of the optimized pose of the first frame image and the pose increment collected by the absolute scale sensor; this first pose is then optimized using the projection error, the epipolar constraint error, and the difference between the first translation increment and the second translation increment to determine a more accurate optimized pose. For frames after the second frame image, the first pose of the current frame image is the sum of the optimized pose of the previous frame image and the pose increment collected by the absolute scale sensor.

According to this embodiment of the present application, the constraint of the difference between the first translation increment and the second translation increment is added on the basis of the projection error and the epipolar constraint error, so that the pose constraint has six degrees of freedom and a more accurate optimized pose is obtained.

Fig. 8 is a flowchart illustrating the step of optimizing the first pose according to the projection error, the epipolar constraint error, and the difference between the first translation increment and the second translation increment to obtain an optimized first pose in the embodiment shown in fig. 7.

As shown in fig. 8, based on the embodiment shown in fig. 7, in another exemplary embodiment of the present application, the step 703 of optimizing the first pose according to the projection error, the epipolar constraint error, and the difference between the first translation increment and the second translation increment to obtain an optimized first pose may specifically include the following steps:

Step 7031, determining an objective function based on the projection error, the epipolar constraint error, and the difference between the first translation increment and the second translation increment.

In this embodiment, the expression of the objective function E may be:

E = p_3 Σ_{X ∈ M_k} σ C_k(π(P_k X)) + p_4 C_cpi + p_5 ||d_k − t_k(P_{k-1}, P_k)||

where σ denotes the projection error correction coefficient; X denotes a map point; M_k denotes the visible space region; π denotes projection; C_k(π(P_k X)) denotes the projection error of map point X projected into the current frame image based on the first pose P_k; C_cpi denotes the epipolar constraint error; P_k denotes the first pose of the current frame image; P_{k-1} denotes the second pose of the previous frame image; d_k denotes the first translation increment; t_k(P_{k-1}, P_k) denotes the second translation increment between the first pose P_k and the second pose P_{k-1}; ||d_k − t_k(P_{k-1}, P_k)|| denotes the two-norm of the difference between the first translation increment and the second translation increment; p_3 denotes the weight corresponding to the projection error; p_4 denotes the weight corresponding to the epipolar constraint error; and p_5 denotes the weight corresponding to the difference between the first translation increment and the second translation increment. It should be noted that the objective function differs from the error optimization function, so the weights corresponding to the projection error and the epipolar constraint error may also differ.

The solution method of the objective function E may be selected as required, and may include, for example, the Gauss-Newton method or the Levenberg-Marquardt method.
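As an illustration, the objective could be minimized with scipy.optimize.least_squares (method='lm' selects Levenberg-Marquardt); the 6-vector pose parameterization and the residual callbacks proj_res, epi_res, and trans_res are hypothetical placeholders, not part of this application.

```python
import numpy as np
from scipy.optimize import least_squares

def optimize_first_pose(pose0, proj_res, epi_res, trans_res,
                        p3=1.0, p4=1.0, p5=1.0):
    """pose0: initial first pose as a 6-vector [rx, ry, rz, tx, ty, tz];
    proj_res / epi_res / trans_res: callbacks returning residual arrays for
    the projection errors, the epipolar constraint, and the translation-
    increment difference at a given pose."""
    def residuals(x):
        # Weights enter as square roots because least_squares minimizes
        # the sum of squared residuals.
        return np.concatenate([
            np.sqrt(p3) * np.atleast_1d(proj_res(x)),
            np.sqrt(p4) * np.atleast_1d(epi_res(x)),
            np.sqrt(p5) * np.atleast_1d(trans_res(x)),
        ])
    # method='lm' is Levenberg-Marquardt; it requires at least as many
    # residuals as parameters (6 here).
    result = least_squares(residuals, pose0, method='lm')
    return result.x
```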

Step 7032, adjusting the first pose to adjust the objective function.

For the objective function E, the projection error, the epipolar constraint error, and the difference between the first translation increment and the second translation increment are all functions of the first pose, so the value of the objective function E can be changed by adjusting the first pose. Specifically, when the first pose changes, the projected position of each map point, the basis matrix, and the second translation increment all change; accordingly, the projection error, the epipolar constraint error, and the difference between the first translation increment and the second translation increment change, and the objective function E changes correspondingly.

Step 7033, determining the first pose when the objective function meets the preset condition as the optimized first pose.

In this embodiment, the preset condition is that the objective function E takes its minimum value E_min. As described in step 7032 above, the value of the objective function E can be changed by adjusting the first pose. When the objective function E takes its minimum value, the sum of the projection error, the epipolar constraint error, and the difference between the first translation increment and the second translation increment is minimal, and the first pose obtained at this point is the optimal value.

In this embodiment, by establishing the objective function and adjusting the first pose, the value of the objective function changes correspondingly, so that the first pose at which the objective function meets the preset condition can be obtained; the first pose is thereby optimized, yielding a more accurate optimized first pose.

Exemplary devices

Based on the same conception as the method embodiment, the embodiment of the application also provides a pose optimization device.

Fig. 9 is a schematic structural diagram of a pose optimization apparatus according to an exemplary embodiment of the present application.

As shown in fig. 9, an exemplary embodiment of the present application provides a pose optimization apparatus, including:

an obtaining module 901, configured to obtain a first pose corresponding to the image acquisition device when acquiring the current frame image;

a region determining module 902, configured to determine a visible space region corresponding to the first pose;

a map point determining module 903, configured to determine a spatial coordinate of a map point falling into the visible space region in a vector map;

an information determining module 904, configured to determine feature information of the current frame image;

a first error determining module 905, configured to determine a projection error of the map point in the current frame image according to the first pose, the spatial coordinates of the map point, and the feature information;

a first optimization module 906, configured to optimize the first pose according to the projection error to obtain an optimized first pose.

Fig. 10 is a schematic structural diagram of a pose optimization apparatus according to another exemplary embodiment of the present application.

As shown in fig. 10, in another exemplary embodiment, the characteristic information includes contour information of the first target object, and the first error determination module 905 includes:

the projection unit 9051 is configured to project the spatial coordinates of the map point to the current frame image through the first pose so as to determine the projection pixel coordinates of the map point;

and an error determining unit 9052, configured to determine a projection error according to the projection pixel coordinate and the contour information.

Fig. 11 is a schematic structural diagram of a projection unit 9051 in the pose optimization apparatus according to another exemplary embodiment of the present application.

As shown in fig. 11, in another exemplary embodiment, the feature information further includes pixel points of a second target object, and the projection unit 9051 includes:

a projection subunit 90511, configured to project, through the first pose, the spatial coordinates of the map point into the current frame image to obtain a projection point;

a mask subunit 90512, configured to determine a mask area according to the pixel point of the second target object;

a coordinate determination subunit 90513, configured to determine the projection pixel coordinates according to the projection point and the mask area.

As shown in fig. 10, in another exemplary embodiment, the first optimization module 906 includes:

a weight value determining unit 9061, configured to determine a weight value of the map point according to a distance between the map point and the image acquisition device;

a first optimizing unit 9062, configured to optimize the first pose according to the weight value of the map point and the projection error of the map point, so as to obtain an optimized first pose.

Fig. 12 is a schematic structural diagram of a pose optimization apparatus according to still another exemplary embodiment of the present application.

As shown in fig. 12, in a further exemplary embodiment, on the basis of the obtaining module 901, the region determining module 902, the map point determining module 903, the information determining module 904, and the first error determining module 905, the apparatus further includes:

a matching module 1201, configured to perform feature matching on a first feature point of the current frame image and a second feature point of a previous frame image to determine a matching feature point, where the matching feature point corresponds to the same semantic category information;

a matrix determination module 1202, configured to determine a base matrix according to the first pose, a second pose of the previous frame image, and a preset camera parameter;

a second error determining module 1203, configured to determine an epipolar constraint error according to the basic matrix and the matching feature points;

a second optimization module 1204, configured to optimize the first pose according to the projection error and the epipolar constraint error, so as to obtain an optimized first pose.

Fig. 13 is a schematic structural diagram of a pose optimization apparatus according to still another exemplary embodiment of the present application.

As shown in fig. 13, in yet another exemplary embodiment, on the basis of the obtaining module 901, the region determining module 902, the map point determining module 903, the information determining module 904, the first error determining module 905, the matching module 1201, the matrix determining module 1202, and the second error determining module 1203, the apparatus further includes:

a first increment determination module 1301, configured to determine, through an absolute scale sensor, a first translation increment that the image acquisition device moves during the acquisition of the current frame image and the previous frame image;

a second increment determination module 1302 for determining a second increment of translation between the first pose and the second pose;

and a third optimization module 1303, configured to optimize the first pose according to the projection error, the epipolar constraint error, and a difference between the first translation increment and the second translation increment, so as to obtain an optimized first pose.

Fig. 14 is a schematic structural diagram of the third optimization module 1303 in the pose optimization apparatus according to still another exemplary embodiment of the present application.

In yet another exemplary embodiment, as shown in fig. 14, the third optimizing module 1303 includes:

a function determining unit 13031, configured to determine an objective function based on the projection error, the epipolar constraint error, and a difference between the first translation increment and the second translation increment;

an adjusting unit 13032, configured to adjust the first pose to adjust the objective function;

a second optimizing unit 13033, configured to determine that the first pose when the objective function meets a preset condition is the optimized first pose.

Exemplary electronic device

FIG. 15 illustrates a block diagram of an electronic device in accordance with an embodiment of the present application.

As shown in fig. 15, the electronic device 150 includes one or more processors 151 and memory 152.

Processor 151 may be a Central Processing Unit (CPU) or other form of processing unit having data processing capabilities and/or instruction execution capabilities, and may control other components in electronic device 150 to perform desired functions.

Memory 152 may include one or more computer program products that may include various forms of computer-readable storage media, such as volatile memory and/or non-volatile memory. The volatile memory may include, for example, Random Access Memory (RAM), cache memory (cache), and/or the like. The non-volatile memory may include, for example, Read Only Memory (ROM), hard disk, flash memory, etc. One or more computer program instructions may be stored on the computer-readable storage medium and executed by the processor 151 to implement the pose optimization methods of the various embodiments of the present application described above and/or other desired functions.

In one example, the electronic device 150 may further include: an input device 153 and an output device 154, which are interconnected by a bus system and/or other form of connection mechanism (not shown).

Of course, for simplicity, only some of the components of the electronic device 150 relevant to the present application are shown in fig. 15, and components such as buses, input/output interfaces, and the like are omitted. In addition, the electronic device 150 may include any other suitable components, depending on the particular application.

Exemplary computer program product and computer-readable storage Medium

In addition to the above-described methods and apparatus, embodiments of the present application may also be a computer program product comprising computer program instructions that, when executed by a processor, cause the processor to perform the steps in the pose optimization method according to various embodiments of the present application described in the "exemplary methods" section of this specification, above.

The computer program product may be written with program code for performing the operations of embodiments of the present application in any combination of one or more programming languages, including an object-oriented programming language such as Java or C++ and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computing device, partly on the user's device, as a stand-alone software package, partly on the user's computing device and partly on a remote computing device, or entirely on the remote computing device or server.

Furthermore, embodiments of the present application may also be a computer-readable storage medium having stored thereon computer program instructions that, when executed by a processor, cause the processor to perform the steps in the pose optimization method according to various embodiments of the present application described in the "exemplary methods" section above in this specification.

The computer-readable storage medium may take any combination of one or more readable media. The readable medium may be a readable signal medium or a readable storage medium. A readable storage medium may include, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or a combination of any of the foregoing. More specific examples (a non-exhaustive list) of the readable storage medium include: an electrical connection having one or more wires, a portable disk, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.

The foregoing describes the general principles of the present application in conjunction with specific embodiments, however, it is noted that the advantages, effects, etc. mentioned in the present application are merely examples and are not limiting, and they should not be considered essential to the various embodiments of the present application. Furthermore, the foregoing disclosure of specific details is for the purpose of illustration and description and is not intended to be limiting, since the foregoing disclosure is not intended to be exhaustive or to limit the disclosure to the precise details disclosed.

The block diagrams of devices, apparatuses, and systems referred to in this application are only given as illustrative examples and are not intended to require or imply that the connections, arrangements, and configurations must be made in the manner shown in the block diagrams. These devices, apparatuses, and systems may be connected, arranged, and configured in any manner, as will be appreciated by those skilled in the art. Words such as "including," "comprising," "having," and the like are open-ended words that mean "including, but not limited to," and are used interchangeably therewith. The word "or" as used herein means, and is used interchangeably with, "and/or," unless the context clearly dictates otherwise. The phrase "such as" as used herein means, and is used interchangeably with, "such as but not limited to."

It should also be noted that in the devices, apparatuses, and methods of the present application, the components or steps may be decomposed and/or recombined. These decompositions and/or recombinations are to be considered as equivalents of the present application.

The previous description of the disclosed aspects is provided to enable any person skilled in the art to make or use the present application. Various modifications to these aspects will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other aspects without departing from the scope of the application. Thus, the present application is not intended to be limited to the aspects shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

The foregoing description has been presented for purposes of illustration and description. Furthermore, the description is not intended to limit embodiments of the application to the form disclosed herein. While a number of example aspects and embodiments have been discussed above, those of skill in the art will recognize certain variations, modifications, alterations, additions and sub-combinations thereof.
