Camera localization and neural network training method and apparatus

Document No.: 1771884  Publication date: 2019-12-03

Description: This technology, 相机定位及神经网络的训练方法和装置 (Camera localization and neural network training method and apparatus), was created by Ding Mingyu, Wang Zhe, and Shi Jianping on 2019-08-30. Abstract: Embodiments of the present disclosure provide a camera localization method, a training method for a neural network, and corresponding apparatuses. The method includes: retrieving an initial paired image for a query image from an image database, where the absolute camera poses of the images in the database are known; obtaining a predicted relative camera pose between the initial paired image and the query image; determining an estimated camera pose of the query image according to the predicted relative camera pose; retrieving a new paired image for the query image from the image database according to the estimated camera pose; predicting the relative camera pose between the new paired image and the query image; and determining the absolute camera pose of the query image based on that relative camera pose and the absolute camera pose of the new paired image. The disclosure improves the accuracy of camera localization.

1. A camera localization method, characterized in that the method comprises:

retrieving, from an image database, an initial paired image for a query image, where the absolute camera pose corresponding to each image in the image database is known;

obtaining a predicted relative camera pose between the initial paired image and the query image; determining an estimated camera pose of the query image according to the predicted relative camera pose;

retrieving, from the image database according to the estimated camera pose of the query image, a new paired image for the query image;

predicting the relative camera pose between the new paired image and the query image;

determining the absolute camera pose of the query image based on the relative camera pose between the new paired image and the query image and the absolute camera pose of the new paired image.
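For illustration only (not part of the claim language), the retrieve-predict-re-retrieve loop of claim 1 can be sketched in Python. All names here are hypothetical, 4x4 homogeneous matrices stand in for camera poses, and the retrieval and pose-prediction networks are passed in as callables:

```python
import numpy as np

def compose_pose(T_rel, T_abs):
    """Query's absolute pose = relative pose composed with the
    paired image's absolute pose (4x4 homogeneous matrices)."""
    return T_rel @ T_abs

def localize(query_feat, database, predict_relative_pose, retrieve):
    # Step 1: coarse retrieval of an initial paired image
    pair_id = retrieve(query_feat, database)
    # Step 2: predict relative pose, form an estimated query pose
    T_rel = predict_relative_pose(query_feat, database[pair_id]["feat"])
    T_est = compose_pose(T_rel, database[pair_id]["pose"])
    # Step 3: re-retrieve a new paired image near the estimated pose
    new_id = min(database, key=lambda k:
                 np.linalg.norm(database[k]["pose"][:3, 3] - T_est[:3, 3]))
    # Step 4: predict relative pose against the new pair -> absolute pose
    T_rel_new = predict_relative_pose(query_feat, database[new_id]["feat"])
    return compose_pose(T_rel_new, database[new_id]["pose"])
```

The second retrieval here uses estimated-pose proximity, which is one plausible reading of "retrieving according to the estimated camera pose".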

2. The method according to claim 1, characterized in that the camera localization method is executed by a camera localization apparatus, the camera localization apparatus comprising a camera localization neural network;

the camera localization neural network comprises: a shared sub-network, a coarse retrieval sub-network, a fine retrieval sub-network, and a relative pose regression sub-network; the shared sub-network is connected to the coarse retrieval sub-network, the fine retrieval sub-network, and the relative pose regression sub-network, respectively;

the query image, the initial paired image, and the new paired image each first pass through the shared sub-network for image feature extraction, yielding shared-processed images;

before the initial paired image of the query image is retrieved from the image database, the method further comprises: passing the shared-processed query image through the coarse retrieval sub-network to obtain image features of the query image, so that the initial paired image is retrieved according to the image features;

obtaining the predicted relative camera pose between the initial paired image and the query image comprises: passing the shared-processed initial paired image and query image through the fine retrieval sub-network, which outputs the predicted relative camera pose between the two;

predicting the relative camera pose between the new paired image and the query image comprises: passing the shared-processed new paired image and query image through the relative pose regression sub-network, which outputs the relative camera pose between the two.
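As a hedged illustration of the four-sub-network layout in claim 2, toy linear maps can stand in for the learned sub-networks; the point is the data flow (shared features feed the coarse retrieval head, while features of an image pair are concatenated before the fine retrieval and regression heads), not any actual architecture, and all dimensions are invented:

```python
import numpy as np

rng = np.random.default_rng(0)

class Linear:
    """Toy stand-in for a learned sub-network: a single linear map."""
    def __init__(self, d_in, d_out):
        self.W = rng.standard_normal((d_in, d_out)) * 0.1
    def __call__(self, x):
        return x @ self.W

shared = Linear(128, 64)   # shared feature extractor
coarse = Linear(64, 32)    # coarse retrieval head -> descriptor for search
fine = Linear(128, 7)      # fine retrieval head -> relative pose (t + quat)
regress = Linear(128, 7)   # relative pose regression head

def coarse_descriptor(image_feat):
    # single image -> shared features -> retrieval descriptor
    return coarse(shared(image_feat))

def relative_pose(head, query_feat, pair_feat):
    # both images pass through the shared sub-network; their features
    # are concatenated, then regressed to a 7-D relative pose
    joint = np.concatenate([shared(query_feat), shared(pair_feat)])
    return head(joint)
```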

3. A training method for a camera localization neural network, characterized in that the method comprises:

obtaining multiple groups of image pairs, each image pair comprising a query image and a paired image, where the paired image and the query image each have a corresponding absolute camera pose, and each image pair also carries annotation information for the relative pose;

for any image pair, predicting, by the camera localization neural network, the relative camera pose between the query image and the paired image;

determining the estimated camera pose of the query image based on the relative pose and the absolute camera pose of the paired image;

retrieving, from an image database according to the estimated camera pose of the query image, a new paired image for the query image, the new paired image and the query image constituting a new image pair;

predicting, by the camera localization neural network, the relative camera pose between the query image and the new paired image of the new image pair;

adjusting network parameters of the camera localization neural network based on the difference between the predicted relative camera pose and the annotation information.
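A minimal sketch of the parameter update in claim 3, assuming (hypothetically) that the relative pose is a 7-D vector (translation plus quaternion) regressed by one linear layer; real training would backpropagate through the whole network rather than a single weight matrix:

```python
import numpy as np

def pose_loss(pred, label):
    """L2 difference between predicted and annotated relative pose."""
    return float(np.sum((pred - label) ** 2))

def train_step(W, query_feat, pair_feat, label, lr=0.01):
    """One step: predict the relative pose from concatenated features,
    compare with the annotation, and update the weights by gradient descent."""
    x = np.concatenate([query_feat, pair_feat])
    pred = x @ W
    grad = 2 * np.outer(x, pred - label)  # dL/dW for the L2 loss
    return W - lr * grad, pose_loss(pred, label)
```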

4. The method according to claim 3, characterized in that

the camera localization neural network comprises: a shared sub-network, a coarse retrieval sub-network, a fine retrieval sub-network, and a relative pose regression sub-network; the shared sub-network is connected to the coarse retrieval sub-network, the fine retrieval sub-network, and the relative pose regression sub-network, respectively;

the query image, the initial paired image, and the new paired image each first pass through the shared sub-network for image feature extraction, yielding shared-processed images;

after the multiple groups of image pairs are obtained, the method further comprises: passing the shared-processed query image and paired image through the coarse retrieval sub-network to obtain an image relation parameter between the query image and the paired image;

passing the shared-processed paired image and query image through the fine retrieval sub-network, which outputs the relative camera pose between the two;

passing the shared-processed new paired image and query image through the relative pose regression sub-network, which outputs the relative camera pose between the two;

adjusting the network parameters of the camera localization neural network comprises: adjusting the network parameters of the shared sub-network, the coarse retrieval sub-network, the fine retrieval sub-network, and the relative pose regression sub-network according to the difference between the predicted image relation parameter and its annotation, the difference between the predicted relative camera pose output by the fine retrieval sub-network and its annotation, and the difference between the predicted relative camera pose output by the relative pose regression sub-network and its annotation.
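The three prediction-vs-annotation differences in claim 4 are typically combined into one training objective. A minimal sketch, with illustrative L2 terms and weights that are assumptions, not taken from the claim:

```python
def total_loss(coarse_pred, coarse_label,
               fine_pred, fine_label,
               reg_pred, reg_label,
               w_coarse=1.0, w_fine=1.0, w_reg=1.0):
    """Weighted sum of the three prediction-vs-annotation differences
    used to update all four sub-networks jointly."""
    def l2(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return (w_coarse * l2(coarse_pred, coarse_label)
            + w_fine * l2(fine_pred, fine_label)
            + w_reg * l2(reg_pred, reg_label))
```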

5. The method according to claim 4, characterized in that obtaining the image relation parameter between the query image and the paired image comprises:

determining, according to the rotation poses of the query image and the paired image of a group of image pairs, the predicted relative angular offset between the camera poses of the query image and the paired image as the image relation parameter.
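The relative angular offset of claim 5 can be computed from two rotations. A sketch assuming (an assumption, not stated in the claim) that rotations are given as unit quaternions, using the identity cos(θ/2) = |⟨q1, q2⟩|:

```python
import numpy as np

def relative_angle(q1, q2):
    """Relative angular offset (radians) between two rotations given
    as unit quaternions. The angle of the relative rotation satisfies
    cos(theta/2) = |<q1, q2>|; the absolute value handles the
    double cover (q and -q represent the same rotation)."""
    dot = abs(float(np.dot(q1, q2)))
    return 2.0 * np.arccos(np.clip(dot, -1.0, 1.0))
```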

6. The method according to claim 4, characterized in that obtaining the image relation parameter between the query image and the paired image comprises:

grouping the multiple image pairs corresponding to the same query image according to the difficulty of regressing their relative poses;

obtaining the image feature distance of each image pair in the different groups, respectively;

obtaining, according to the image feature distances, a predicted value of a hard sample mining loss, the hard sample mining loss being used to express the relationship between arbitrary image feature distances across the different groups.
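One plausible form of the hard sample mining loss in claim 6 is a hinge over cross-group feature distances, pushing every distance in the easier group below every distance in the harder group by a margin. The hinge form and the margin value are assumptions for illustration, not taken from the claim:

```python
def hard_mining_loss(easy_dists, hard_dists, margin=0.5):
    """Hinge-style hard sample mining loss over image-pair feature
    distances: penalize any 'easy' distance that is not smaller than
    every 'hard' distance by at least `margin`."""
    loss = 0.0
    for de in easy_dists:
        for dh in hard_dists:
            loss += max(0.0, margin + de - dh)
    return loss
```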

7. A camera localization apparatus, characterized in that the apparatus comprises:

an initial retrieval module, configured to retrieve, from an image database, an initial paired image for a query image, where the absolute camera pose corresponding to each image in the image database is known;

an initial prediction module, configured to obtain a predicted relative camera pose between the initial paired image and the query image, and to determine an estimated camera pose of the query image according to the predicted relative camera pose;

a re-retrieval module, configured to retrieve, from the image database according to the estimated camera pose of the query image, a new paired image for the query image;

a re-prediction module, configured to predict the relative camera pose between the new paired image and the query image;

a localization determination module, configured to determine the absolute camera pose of the query image based on the relative camera pose between the new paired image and the query image and the absolute camera pose of the new paired image.

8. A training apparatus for a camera localization neural network, characterized in that the apparatus comprises:

an image acquisition module, configured to obtain multiple groups of image pairs, each image pair comprising a query image and a paired image, where the paired image and the query image each have a corresponding absolute camera pose, and each image pair also carries annotation information for the relative pose;

a relative prediction module, configured to predict, by the camera localization neural network for any image pair, the relative camera pose between the query image and the paired image;

an estimated pose module, configured to determine the estimated camera pose of the query image based on the relative pose and the absolute camera pose of the paired image;

a new image module, configured to retrieve, from an image database according to the estimated camera pose of the query image, a new paired image for the query image, the new paired image and the query image constituting a new image pair;

a pose prediction module, configured to predict, by the camera localization neural network, the relative camera pose between the query image and the new paired image of the new image pair;

a parameter adjustment module, configured to adjust network parameters of the camera localization neural network based on the difference between the predicted relative camera pose and the annotation information.

9. An electronic device, characterized in that the device comprises a memory and a processor, the memory being configured to store computer instructions executable on the processor, and the processor being configured to implement, when executing the computer instructions, the method according to claim 1 or 2, or the method according to any one of claims 3 to 6.

10. A computer-readable storage medium on which a computer program is stored, characterized in that the program, when executed by a processor, implements the method according to claim 1 or 2, or the method according to any one of claims 3 to 6.

Technical field

The present disclosure relates to machine learning technology, and in particular to a camera localization and neural network training method and apparatus.

Background

As people's living standards improve, maps and automobiles have become indispensable to travel. Recent developments in computer vision have brought great convenience to daily life: both map navigation and automobiles need to be localized through images captured by a camera. Robot navigation likewise depends on vision-based camera localization, so camera localization is an important task in the field of computer vision. It applies to tasks such as autonomous driving, robotics, and map navigation, and is therefore of significant practical importance.

Among traditional camera localization methods is relative positioning: the absolute camera pose of one image is obtained from the relative pose between two images and the known absolute camera pose of the other image. However, the positioning accuracy of current relative positioning methods is low.
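The relative positioning described above amounts to pose composition: given one camera's absolute pose and the relative transform between the two cameras, the other camera's absolute pose follows by matrix multiplication. A sketch with 4x4 homogeneous transforms (the world-to-camera convention and the direction of the relative transform are assumptions for illustration):

```python
import numpy as np

def make_pose(R, t):
    """Assemble a 4x4 homogeneous transform from rotation R and translation t."""
    T = np.eye(4)
    T[:3, :3] = R
    T[:3, 3] = t
    return T

def absolute_from_relative(T_known, T_rel):
    """Given camera A's absolute pose T_known (world -> camera A) and the
    relative pose T_rel (camera A's frame -> camera B's frame), camera B's
    absolute pose is the composition T_rel @ T_known."""
    return T_rel @ T_known
```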

Summary of the invention

In view of this, the present disclosure provides at least a camera localization method, a training method for a neural network, and corresponding apparatuses, so as to improve the accuracy of camera localization.

In a first aspect, a camera localization method is provided, the method comprising:

retrieving, from an image database, an initial paired image for a query image, where the absolute camera pose corresponding to each image in the image database is known;

obtaining a predicted relative camera pose between the initial paired image and the query image; determining an estimated camera pose of the query image according to the predicted relative camera pose;

retrieving, from the image database according to the estimated camera pose of the query image, a new paired image for the query image;

predicting the relative camera pose between the new paired image and the query image;

determining the absolute camera pose of the query image based on the relative camera pose between the new paired image and the query image and the absolute camera pose of the new paired image.

In some embodiments, before the initial paired image of the query image is retrieved from the image database, the method further comprises: obtaining a known-geography database comprising multiple images whose absolute camera poses are known; and constructing the image database by selecting, from the known-geography database, the images corresponding to a predetermined geographic region.

In some embodiments, before the initial paired image of the query image is retrieved from the image database, the method further comprises: acquiring multiple images in a predetermined geographic region through an acquisition terminal provided with a camera; determining the absolute camera pose corresponding to each acquired image; and constructing the image database corresponding to the predetermined geographic region from the acquired images and their absolute camera poses.

In some embodiments, the predetermined geographic region corresponding to the images of the image database is a region of any of the following types: a map navigation region, an intelligent driving localization region, or a robot navigation region.

In some embodiments, retrieving the initial paired image of the query image from the image database comprises: receiving the query image for which camera localization is to be performed; extracting image features of the query image; and retrieving, from the image database according to the image features of the query image, the initial paired image of the query image.

In some embodiments, the camera localization method is executed by a camera localization apparatus comprising a camera localization neural network. The camera localization neural network comprises: a shared sub-network, a coarse retrieval sub-network, a fine retrieval sub-network, and a relative pose regression sub-network; the shared sub-network is connected to the coarse retrieval sub-network, the fine retrieval sub-network, and the relative pose regression sub-network, respectively. The query image, the initial paired image, and the new paired image each first pass through the shared sub-network for image feature extraction, yielding shared-processed images. Before the initial paired image of the query image is retrieved from the image database, the method further comprises: passing the shared-processed query image through the coarse retrieval sub-network to obtain image features of the query image, so that the initial paired image is retrieved according to the image features. Obtaining the predicted relative camera pose between the initial paired image and the query image comprises: passing the shared-processed initial paired image and query image through the fine retrieval sub-network, which outputs the predicted relative camera pose between the two. Predicting the relative camera pose between the new paired image and the query image comprises: passing the shared-processed new paired image and query image through the relative pose regression sub-network, which outputs the relative camera pose between the two.

In some embodiments, the relative pose regression sub-network comprises a decoding network part and a regression network part. Passing the shared-processed new paired image and query image through the relative pose regression sub-network, which outputs the relative camera pose between the two, comprises: inputting the image pair formed by the shared-processed new paired image and query image into the relative pose regression sub-network; obtaining, after processing by the decoding network part of the relative pose regression sub-network, the image features of each image of the image pair; concatenating the image features of the new paired image and the query image to obtain a concatenated feature; and outputting, after the concatenated feature is processed by the regression network part of the relative pose regression sub-network, the predicted relative camera pose between the query image and the new paired image.

In some embodiments, the fine retrieval sub-network comprises a decoding network part and a regression network part. Passing the shared-processed initial paired image and query image through the fine retrieval sub-network, which outputs the predicted relative camera pose between the two, comprises: inputting the image pair formed by the shared-processed initial paired image and query image into the fine retrieval sub-network; obtaining, after processing by the decoding network part of the fine retrieval sub-network, the image features of each image of the image pair; concatenating the image features of the initial paired image and the query image to obtain a concatenated feature; and outputting, after the concatenated feature is processed by the regression network part of the fine retrieval sub-network, the predicted relative camera pose between the query image and the initial paired image.

In some embodiments, before the shared-processed query image is processed by the coarse retrieval sub-network to obtain the image features of the query image, the method further comprises: extracting image features for each image in the image database using the shared sub-network and the coarse retrieval sub-network of a pre-trained camera localization neural network; and annotating each image with its image features, so that image retrieval can be performed according to the image features.

In some embodiments, retrieving the initial paired image of the query image from the image database comprises: retrieving multiple initial paired images of the query image, so as to obtain multiple new paired images from the multiple initial paired images. Determining the absolute camera pose of the query image comprises: obtaining an absolute camera pose of the query image from each of the multiple new paired images, respectively, and obtaining the absolute camera pose of the query image from the multiple absolute camera poses.
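How the absolute poses obtained from multiple new paired images are combined is not spelled out above; one simple option is averaging, sketched here for translation vectors plus sign-aligned unit quaternions (an approximation that is reasonable only when the estimates are close to each other):

```python
import numpy as np

def fuse_pose_estimates(translations, quaternions):
    """Fuse multiple absolute camera pose estimates obtained from
    different paired images: average the translations, and average the
    unit quaternions after aligning signs (q and -q are the same rotation)."""
    t = np.mean(translations, axis=0)
    ref = quaternions[0]
    aligned = [q if np.dot(q, ref) >= 0 else -q for q in quaternions]
    q = np.mean(aligned, axis=0)
    return t, q / np.linalg.norm(q)
```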

Second aspect provides a kind of training method of camera positioning neural network, which comprises

Obtain multiple series of images pair, each group of image is to including a query image and a pairing image, and the pairing Image and query image are respectively provided with the corresponding absolute pose of camera, and described image is to also with the markup information of relative pose;

Neural network is positioned for any image pair by camera, is predicted the query image and is matched the phase between image Machine relative pose;The absolute pose of camera based on the relative pose and pairing image, determines that the camera of the query image is estimated Count pose;

Pose is estimated according to the camera of the query image, by the new pairing for retrieving the query image in image data base Image, the new pairing image and the query image constitute new images pair;

The query image and the camera phase of new pairing image of new images centering described in neural network prediction are positioned by camera To pose;Difference between predictive information and markup information based on the camera relative pose adjusts the camera positioning mind Network parameter through network.

In some embodiments, the camera localization neural network comprises: a shared sub-network, a coarse retrieval sub-network, a fine retrieval sub-network, and a relative pose regression sub-network; the shared sub-network is connected to the coarse retrieval sub-network, the fine retrieval sub-network, and the relative pose regression sub-network, respectively.

The query image, the initial paired image, and the new paired image each first pass through the shared sub-network for image feature extraction, yielding shared-processed images. After the multiple groups of image pairs are obtained, the method further comprises: passing the shared-processed query image and paired image through the coarse retrieval sub-network to obtain an image relation parameter between the query image and the paired image; passing the shared-processed paired image and query image through the fine retrieval sub-network, which outputs the predicted relative camera pose between the two; and passing the shared-processed new paired image and query image through the relative pose regression sub-network, which outputs the relative camera pose between the two. Adjusting the network parameters of the camera localization neural network comprises: adjusting the network parameters of the shared sub-network, the coarse retrieval sub-network, the fine retrieval sub-network, and the relative pose regression sub-network according to the difference between the predicted image relation parameter and its annotation, the difference between the predicted relative camera pose output by the fine retrieval sub-network and its annotation, and the difference between the predicted relative camera pose output by the relative pose regression sub-network and its annotation.

In some embodiments, obtaining the image relation parameter between the query image and the paired image comprises: determining, according to the rotation poses of the query image and the paired image of a group of image pairs, the predicted relative angular offset between the camera poses of the query image and the paired image as the image relation parameter.

In some embodiments, obtaining the image relation parameter between the query image and the paired image comprises: grouping the multiple image pairs corresponding to the same query image according to the difficulty of regressing their relative poses; obtaining the image feature distance of each image pair in the different groups, respectively; and obtaining, according to the image feature distances, a predicted value of a hard sample mining loss, the hard sample mining loss being used to express the relationship between arbitrary image feature distances across the different groups.

In a third aspect, a camera localization apparatus is provided, the apparatus comprising:

an initial retrieval module, configured to retrieve, from an image database, an initial paired image for a query image, where the absolute camera pose corresponding to each image in the image database is known;

an initial prediction module, configured to obtain a predicted relative camera pose between the initial paired image and the query image, and to determine an estimated camera pose of the query image according to the predicted relative camera pose;

a re-retrieval module, configured to retrieve, from the image database according to the estimated camera pose of the query image, a new paired image for the query image;

a re-prediction module, configured to predict the relative camera pose between the new paired image and the query image;

a localization determination module, configured to determine the absolute camera pose of the query image based on the relative camera pose between the new paired image and the query image and the absolute camera pose of the new paired image.

In some embodiments, the apparatus further comprises: a first image acquisition module, configured to: before the initial paired image of the query image is retrieved from the image database, obtain a known-geography database comprising multiple images whose absolute camera poses are known; and construct the image database by selecting, from the known-geography database, the images corresponding to a predetermined geographic region.

In some embodiments, the apparatus further comprises: a second image acquisition module, configured to: before the initial paired image of the query image is retrieved from the image database, acquire multiple images in a predetermined geographic region through an acquisition terminal provided with a camera; determine the absolute camera pose corresponding to each acquired image; and construct the image database corresponding to the predetermined geographic region from the acquired images and their absolute camera poses.

In some embodiments, the predetermined geographic region corresponding to the images of the image database is a region of any of the following types: a map navigation region, an intelligent driving localization region, or a robot navigation region.

In some embodiments, the initial retrieval module is specifically configured to: receive the query image for which camera localization is to be performed; extract image features of the query image; and retrieve, from the image database according to the image features of the query image, the initial paired image of the query image.

In some embodiments, the camera localization apparatus comprises a camera localization neural network. The camera localization neural network comprises: a shared sub-network, a coarse retrieval sub-network, a fine retrieval sub-network, and a relative pose regression sub-network; the shared sub-network is connected to the coarse retrieval sub-network, the fine retrieval sub-network, and the relative pose regression sub-network, respectively. The shared sub-network is configured to perform image feature extraction on the query image, the initial paired image, and the new paired image, respectively, yielding shared-processed images. The initial retrieval module, before retrieving the initial paired image of the query image, is further configured to: pass the shared-processed query image through the coarse retrieval sub-network to obtain the image features of the query image, so that the initial paired image is retrieved according to the image features. The initial prediction module, when obtaining the predicted relative camera pose between the initial paired image and the query image, is configured to: pass the shared-processed initial paired image and query image through the fine retrieval sub-network, which outputs the predicted relative camera pose between the two. The re-prediction module, when predicting the relative camera pose between the new paired image and the query image, is configured to: pass the shared-processed new paired image and query image through the relative pose regression sub-network, which outputs the relative camera pose between the two.

In some embodiments, the relative pose regression sub-network comprises a decoding network part and a regression network part. The re-prediction module is specifically configured to: input the image pair formed by the shared-processed new paired image and query image into the relative pose regression sub-network; obtain, after processing by the decoding network part of the relative pose regression sub-network, the image features of each image of the image pair; concatenate the image features of the new paired image and the query image to obtain a concatenated feature; and output, after the concatenated feature is processed by the regression network part of the relative pose regression sub-network, the predicted relative camera pose between the query image and the new paired image.

In some embodiments, the fine retrieval sub-network comprises a decoding network part and a regression network part. The initial prediction module is specifically configured to: input the image pair formed by the shared-processed initial paired image and query image into the fine retrieval sub-network; obtain, after processing by the decoding network part of the fine retrieval sub-network, the image features of each image of the image pair; concatenate the image features of the initial paired image and the query image to obtain a concatenated feature; and output, after the concatenated feature is processed by the regression network part of the fine retrieval sub-network, the predicted relative camera pose between the query image and the initial paired image.

In some embodiments, the initial retrieval module is further configured to: before the shared-processed query image is processed by the coarse retrieval sub-network to obtain the image features of the query image, extract image features for each image in the image database using the shared sub-network and the coarse retrieval sub-network of a pre-trained camera localization neural network; and annotate each image with its image features, so that image retrieval can be performed according to the image features.

In some embodiments, retrieving the initial paired image of the query image from the image database comprises: retrieving multiple initial paired images of the query image, so as to obtain multiple new paired images from the multiple initial paired images. The localization determination module is specifically configured to: obtain an absolute camera pose of the query image from each of the multiple new paired images, respectively, and obtain the absolute camera pose of the query image from the multiple absolute camera poses.

In a fourth aspect, a training apparatus for a camera localization neural network is provided, the apparatus comprising:

an image acquisition module, configured to obtain multiple groups of image pairs, each image pair comprising a query image and a paired image, where the paired image and the query image each have a corresponding absolute camera pose, and each image pair also carries annotation information for the relative pose;

a relative prediction module, configured to predict, by the camera localization neural network for any image pair, the relative camera pose between the query image and the paired image;

an estimated pose module, configured to determine the estimated camera pose of the query image based on the relative pose and the absolute camera pose of the paired image;

a new image module, configured to retrieve, from an image database according to the estimated camera pose of the query image, a new paired image for the query image, the new paired image and the query image constituting a new image pair;

a pose prediction module, configured to predict, by the camera localization neural network, the relative camera pose between the query image and the new paired image of the new image pair;

a parameter adjustment module, configured to adjust network parameters of the camera localization neural network based on the difference between the predicted relative camera pose and the annotation information.

In some embodiments, the camera positioning neural network includes: a shared sub-network, a coarse retrieval sub-network, a fine retrieval sub-network, and a relative pose regression sub-network; the shared sub-network is connected to the coarse retrieval sub-network, the fine retrieval sub-network, and the relative pose regression sub-network respectively; the shared sub-network is configured to first perform the image-feature-extraction processing of the shared sub-network on the query image, the initial pairing image, and the new pairing image, to obtain the respective shared-processed images. The device further includes: an initial retrieval module, configured to process the shared-processed query image and pairing image through the coarse retrieval sub-network, to obtain an image relation parameter between the query image and the pairing image. The relative prediction module is specifically configured to: process the shared-processed pairing image and query image through the fine retrieval sub-network, and output the prediction information of the relative pose of the two through the fine retrieval sub-network. The pose prediction module is specifically configured to: process the shared-processed new pairing image and query image through the relative pose regression sub-network, and output the prediction information of the relative pose of the two through the relative pose regression sub-network. The parameter adjustment module, when adjusting the network parameters of the camera positioning neural network, is configured to: adjust the network parameters of the shared sub-network, the coarse retrieval sub-network, the fine retrieval sub-network, and the relative pose regression sub-network according to the difference between the prediction information and the annotation information of the image relation parameter, the difference between the relative pose prediction information output by the fine retrieval sub-network and the annotation information, and the difference between the relative pose prediction information output by the relative pose regression sub-network and the annotation information.

In some embodiments, the initial retrieval module, when obtaining the image relation parameter between the query image and the pairing image, is configured to: determine, according to the rotation poses of the query image and the pairing image of a group of image pairs, a predicted relative angle offset of the camera poses of the query image and the pairing image as the image relation parameter.

In some embodiments, the initial retrieval module, when obtaining the image relation parameter between the query image and the pairing image, is configured to: group the multiple image pairs corresponding to the same query image according to the difficulty of regressing the relative pose; obtain the image feature distance of each image pair in the different groups respectively; and obtain, according to the image feature distances, a predicted value of a hard sample mining loss, the hard sample mining loss being used to represent the relationship between the image feature distances of the different groups.
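The exact form of the hard sample mining loss is not given here; the sketch below assumes a common margin-based formulation in which the feature distance of an "easier" pair of the same query should stay below that of a "harder" pair by a margin. The hinge form and the margin value are assumptions.

```python
def hard_sample_mining_loss(easy_dists, hard_dists, margin=0.1):
    """Hedged sketch of a hard-sample-mining loss: for image pairs of the same
    query grouped by regression difficulty, encourage the feature distance of
    any easier pair to stay below that of any harder pair by a margin."""
    loss = 0.0
    for d_easy in easy_dists:
        for d_hard in hard_dists:
            loss += max(0.0, d_easy - d_hard + margin)
    return loss / (len(easy_dists) * len(hard_dists))

# easy pairs already closer in feature space -> zero loss
print(hard_sample_mining_loss([0.2, 0.3], [0.8, 0.9]))  # 0.0
# ordering violated -> positive loss
print(hard_sample_mining_loss([0.9], [0.2]) > 0)        # True
```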

In a fifth aspect, an electronic device is provided, the device including a memory and a processor, the memory being configured to store computer instructions executable on the processor, and the processor being configured to, when executing the computer instructions, implement the camera positioning method of any embodiment of the present disclosure, or implement the training method of the camera positioning neural network of any embodiment of the present disclosure.

In a sixth aspect, a computer-readable storage medium is provided, on which a computer program is stored, the program, when executed by a processor, implementing the camera positioning method of any embodiment of the present disclosure, or implementing the training method of the camera positioning neural network of any embodiment of the present disclosure.

In the camera positioning and neural network training method and device provided by the embodiments of the present disclosure, after the initial pairing image is obtained, the estimated pose of the query image is obtained according to the initial pairing image, and retrieval is performed based on the estimated pose, so that the retrieved new pairing image is closer in pose to the query image; and since an image pair with a smaller pose deviation yields a more accurate relative pose prediction, the camera positioning result is also more accurate.

Brief Description of the Drawings

In order to more clearly illustrate the technical solutions in one or more embodiments of the present disclosure or in the related art, the drawings required in the description of the embodiments or the related art are briefly introduced below. Obviously, the drawings in the following description are only some of the embodiments recorded in one or more embodiments of the present disclosure, and those of ordinary skill in the art can obtain other drawings from these drawings without any creative effort.

Fig. 1 shows a flowchart of a camera positioning method provided by at least one embodiment of the present disclosure;

Fig. 2 shows a training process of a camera positioning neural network provided by at least one embodiment of the present disclosure;

Fig. 3 shows a training process of another camera positioning neural network provided by at least one embodiment of the present disclosure;

Fig. 4 shows a schematic diagram of the view overlap of two images provided by at least one embodiment of the present disclosure;

Fig. 5 shows a flowchart of a camera positioning method provided by at least one embodiment of the present disclosure;

Fig. 6 shows a comparison of positioning results obtained using the camera positioning method provided by the present disclosure, provided by at least one embodiment of the present disclosure;

Fig. 7 shows a schematic diagram of a camera positioning device provided by at least one embodiment of the present disclosure;

Fig. 8 shows a schematic diagram of a camera positioning device provided by at least one embodiment of the present disclosure;

Fig. 9 shows a schematic diagram of a camera positioning device provided by at least one embodiment of the present disclosure;

Fig. 10 shows a schematic diagram of a camera positioning device provided by at least one embodiment of the present disclosure.

Detailed Description of Embodiments

In order to enable those skilled in the art to better understand the technical solutions in one or more embodiments of the present disclosure, the technical solutions in one or more embodiments of the present disclosure are described clearly and completely below with reference to the drawings in one or more embodiments of the present disclosure. Obviously, the described embodiments are only some, rather than all, of the embodiments of the present disclosure. All other embodiments obtained by those of ordinary skill in the art based on one or more embodiments of the present disclosure without creative effort shall fall within the protection scope of the present disclosure.

Suppose there is currently an image captured by a camera; for example, it may be captured by a camera arranged on a robot, or by a camera arranged on a vehicle. The "camera positioning" of the embodiments of the present disclosure aims to determine the absolute pose of the camera, that is, at which position and in which posture, viewed in the world coordinate system or the camera coordinate system itself, the camera captured the image. After the camera is positioned, the device carrying the camera, such as the robot or the vehicle, can also be positioned at the same time. This is a manner of positioning through an image captured by the camera.

The camera positioning method provided by the present disclosure is a "relative positioning" method. For example, a certain camera captures image F1 and image F2, where the two images correspond to different camera poses. The camera pose corresponding to each image may be called the "camera absolute pose" of that image, and the difference or correspondence between the camera poses of the two images may be called the "camera relative pose" of the two images.

For the above example images F1 and F2, if what is to be positioned is the camera absolute pose of image F1, and the camera absolute pose of image F2 is known and the camera relative pose between the two images is also known, then the camera absolute pose of image F1 can be determined according to the camera relative pose between the two images and the camera absolute pose of image F2. This may be called a "relative positioning" method. The camera positioning method of the present disclosure determines the camera absolute pose of a certain image by positioning in the above relative positioning manner. It can also be seen from the above that accurately obtaining the "camera relative pose" between two images is particularly important for camera positioning.

Fig. 1 shows a flowchart of a camera positioning method provided by at least one embodiment of the present disclosure. As shown in Fig. 1, the method may include the following processing:

In step 100, an initial pairing image of a query image is retrieved from an image database; the camera absolute poses corresponding to the images in the image database are known.

In this step, the image database includes multiple images of a predetermined geographic region. For example, the predetermined geographic region may be a region determined according to actual business needs. For example, in a robot navigation scenario, the predetermined geographic region may be a predetermined robot navigation region; for another example, in an autonomous driving scenario, the predetermined geographic region may be a predetermined intelligent-driving localization region; in other scenarios, it may also be another region such as a map navigation region. The multiple images in the image database are collected in the above predetermined geographic region.

The image database may be constructed in various ways. Illustratively, multiple images may be collected in the predetermined geographic region by a collecting intelligent terminal provided with a collection camera, and the camera absolute pose corresponding to each collected image may be determined; the image database corresponding to the predetermined geographic region is then constructed from the multiple collected images and their camera absolute poses. Alternatively, the image database may be constructed by selecting, from a known geographic database, the images corresponding to the predetermined geographic region, where the camera absolute pose of each image is known.

In this step, the query image is the image on which camera positioning is to be performed. The image database obtained above may be used to retrieve the initial pairing image of the query image, so that subsequent steps can continue the positioning processing according to the initial pairing image and the query image.

In actual use, for a certain geographic region, an image database corresponding to that region may be established in advance, with the camera absolute pose of every image therein known. In this way, when a camera arranged on a vehicle, a robot, or another type of device collects an image in the region, that image can serve as the query image, the image database is retrieved based on the query image to obtain a pairing image, and the camera absolute pose of the query image is then determined. The subsequent steps describe how camera positioning continues according to the retrieved initial pairing image.

In step 102, the predicted camera relative pose of the initial pairing image and the query image is obtained; the camera estimated pose of the query image is determined according to the predicted camera relative pose.

In this step, the camera relative pose of the image pair formed by the initial pairing image and the query image can be predicted, which may be called the predicted camera relative pose. After the predicted camera relative pose is determined, since the camera absolute pose of the initial pairing image is known, the camera estimated pose of the query image can be obtained according to the camera absolute pose of the initial pairing image and the predicted camera relative pose. The camera estimated pose is equivalent to an estimated absolute pose of the query image.
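Obtaining the camera estimated pose from the known absolute pose of the pairing image and the predicted relative pose is a rigid-transform composition. The sketch below uses 4x4 homogeneous matrices and assumes the convention T_query = T_pair · T_rel; the disclosure does not fix a particular convention.

```python
import numpy as np

def make_T(R, t):
    """Pack rotation R (3x3) and translation t (3,) into a 4x4 transform."""
    T = np.eye(4)
    T[:3, :3] = R
    T[:3, 3] = t
    return T

def estimate_query_pose(T_pair, T_rel):
    """Compose the known absolute pose of the pairing image with the
    predicted relative pose to get the camera estimated pose of the query."""
    return T_pair @ T_rel

# pairing image at the origin; predicted relative pose: 1 m along x
T_pair = make_T(np.eye(3), np.zeros(3))
T_rel = make_T(np.eye(3), np.array([1.0, 0.0, 0.0]))
T_query = estimate_query_pose(T_pair, T_rel)
print(T_query[:3, 3])  # [1. 0. 0.]
```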

In step 104, a new pairing image of the query image is retrieved from the image database according to the camera estimated pose of the query image.

In this step, an image matching the query image is retrieved from the image database again. Since this retrieval is performed according to the camera estimated pose of the query image obtained in step 102, the retrieved new pairing image is closer to the query image in camera absolute pose.

In step 106, the camera relative pose of the new pairing image and the query image is predicted.

In this step, the camera relative pose between the new pairing image and the query image can be predicted. Since the new pairing image, compared with the initial pairing image, is closer to the query image in camera absolute pose, the camera relative pose predicted between the new pairing image and the query image is also more accurate.

In step 108, the camera absolute pose of the query image is determined based on the camera relative pose of the new pairing image and the query image and the camera absolute pose of the new pairing image.

Compared with a conventional camera positioning procedure, the camera positioning procedure provided by the embodiments of the present disclosure adds the following: a predicted relative pose is obtained according to the initial pairing image and the query image, a new pairing image is retrieved again after the estimated pose of the query image is obtained, and the relative pose is predicted based on the new pairing image. In the conventional approach, after the initial pairing image is retrieved, the camera relative pose of the initial pairing image and the query image is directly predicted, and the camera absolute pose of the query image is calculated according to that camera relative pose.

That is, if the process of "retrieving the pairing image paired with the query image" is called the "retrieval phase", the camera positioning method of the present disclosure improves the retrieval phase by adopting a "two-stage retrieval": in the first stage, the initial pairing image is retrieved according to the image features; in the second stage, the new pairing image is retrieved according to the estimated pose. With this new retrieval manner, retrieval is further performed based on the estimated pose, so the pose of the retrieved new pairing image is closer to that of the query image; and since an image pair with a smaller pose deviation yields a more accurate camera relative pose prediction, the camera positioning result is more accurate.

In the camera positioning method of this embodiment, after the initial pairing image is obtained, the estimated pose of the query image is obtained according to the initial pairing image, and retrieval is performed based on the estimated pose, so that the retrieved new pairing image is closer in pose to the query image; and since an image pair with a smaller pose deviation yields a more accurate relative pose prediction, the camera positioning result is also more accurate.
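The two-stage retrieval can be sketched end to end as follows. The database contents, the feature space, and the (x, y, yaw) pose parameterization are all invented for illustration.

```python
import numpy as np

# toy database: 4 images, each with an image feature and a known pose (x, y, yaw)
db_feats = np.array([[1.0, 0.0], [0.0, 1.0], [0.9, 0.1], [0.5, 0.5]])
db_poses = np.array([[0.0, 0.0, 0.0], [5.0, 0.0, 0.0], [9.0, 9.0, 0.0], [5.1, 0.2, 0.0]])

def coarse_retrieve(query_feat):
    """Stage 1: nearest neighbour in image-feature space -> initial pairing image."""
    return int(np.argmin(np.linalg.norm(db_feats - query_feat, axis=1)))

def fine_retrieve(estimated_pose):
    """Stage 2: nearest neighbour in pose space -> new pairing image."""
    return int(np.argmin(np.linalg.norm(db_poses - estimated_pose, axis=1)))

query_feat = np.array([0.92, 0.08])       # looks most like database image 2
init_idx = coarse_retrieve(query_feat)    # -> 2 (closest in feature space)

# suppose a relative-pose network then estimates the query camera near (5.1, 0.15)
estimated_pose = np.array([5.1, 0.15, 0.0])
new_idx = fine_retrieve(estimated_pose)   # -> 3 (closest in pose space)
print(init_idx, new_idx)  # 2 3
```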

The camera positioning method shown in Fig. 1 may be implemented based on a neural network, and the type of the neural network may include, but is not limited to, a convolutional neural network, a recurrent neural network, a deep neural network, or the like. The present disclosure may provide a camera positioning neural network, and the above camera positioning method is performed by that neural network.

The network structure of the camera positioning neural network may include: a structure part for extracting image features, a structure part for predicting the camera relative pose of an image pair by regression, and the like. For example, the camera positioning neural network may include: a shared sub-network, a coarse retrieval sub-network, a fine retrieval sub-network, and a relative pose regression sub-network. Each sub-network may be obtained by stacking and connecting, in a certain manner, one or more network sub-units (such as convolutional layers, non-linear layers, and pooling layers) for building a neural network. The shared sub-network may be connected to the coarse retrieval sub-network, the fine retrieval sub-network, and the relative pose regression sub-network respectively; that is, the shared sub-network is a sub-network shared by the coarse retrieval sub-network, the fine retrieval sub-network, and the relative pose regression sub-network.

The shared sub-network may perform image-feature-extraction processing on an image, and the processed image may then be processed by the above coarse retrieval sub-network, fine retrieval sub-network, and relative pose regression sub-network respectively. For example, the image feature of a shared-processed image may be obtained through the coarse retrieval sub-network, so that the initial pairing image is retrieved according to that image feature. For another example, for a shared-processed image pair (e.g., the initial pairing image and the query image), the predicted relative pose between the initial pairing image and the query image may be predicted through the fine retrieval sub-network. For still another example, for a shared-processed image pair (e.g., the new pairing image and the query image), the camera relative pose of the image pair may be predicted through the relative pose regression sub-network. Before being input into the fine retrieval sub-network or the relative pose regression sub-network, the initial pairing image or the new pairing image first undergoes the image-feature-extraction processing of the shared sub-network to obtain the corresponding shared-processed image.
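A minimal functional sketch of one shared sub-network feeding the three branches might look as follows; the single linear layer per part and all dimensions are assumptions made for brevity, not the disclosed architecture.

```python
import numpy as np

rng = np.random.default_rng(2)

W_shared = rng.normal(size=(16, 64))   # shared sub-network (one layer here)
W_coarse = rng.normal(size=(8, 16))    # coarse retrieval head: retrieval descriptor
W_fine = rng.normal(size=(7, 32))      # fine retrieval head: relative pose of a pair
W_reg = rng.normal(size=(7, 32))       # relative pose regression head

def shared(image):
    """Shared sub-network: every branch consumes its output."""
    return np.tanh(W_shared @ image.ravel())

def coarse_descriptor(image):
    """Descriptor used to retrieve the initial pairing image by feature distance."""
    return W_coarse @ shared(image)

def pair_pose(image_a, image_b, W_head):
    """Concatenate the shared features of a pair and regress a 7-dim pose
    (3 translation + 4 rotation-quaternion components)."""
    feats = np.concatenate([shared(image_a), shared(image_b)])
    return W_head @ feats

query, pair = rng.normal(size=(8, 8)), rng.normal(size=(8, 8))
print(coarse_descriptor(query).shape,        # (8,) -> stage-1 retrieval descriptor
      pair_pose(query, pair, W_fine).shape,  # (7,) -> fine retrieval branch output
      pair_pose(query, pair, W_reg).shape)   # (7,) -> regression branch output
```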

The parameters of the different sub-network parts of the above camera positioning neural network may be trained and adjusted in the network training stage, so as to obtain better output results.

Fig. 2 illustrates the training process of an example camera positioning neural network:

It should be noted that, in actual implementation, the training of the neural network may also include the training of other network parts in addition to the following processing.

In step 200, multiple groups of image pairs for training the neural network are obtained, each group of image pairs including a query image and a pairing image.

In this step, some image pairs for training the neural network may be prepared, each image pair including two images. One of the two images may be called the "query image" (i.e., the image on which camera positioning is to be performed), and the other the "pairing image". The query image is fixed during the training process (it may also be called the anchor image), while the pairing image may change during training; for example, it may subsequently be changed to the new pairing image.

In addition, the camera absolute poses of the pairing image paired with the query image and of the query image itself are both known. The camera absolute pose may include the pose of the three degrees of freedom of rotation R and the pose of the three degrees of freedom of translation t, i.e., a pose of 6 degrees of freedom in total. For example, the camera absolute pose of the query image may be expressed as R1, t1, and the camera absolute pose of the pairing image as R2, t2.

Based on the camera absolute poses of the above query image and pairing image, the camera relative pose of the two images can also be obtained, with its rotation and translation expressed as Δq and Δt. The camera relative pose of the two images obtained in this way may be called the annotation information of the relative pose, which, in combination with the prediction information of the relative pose obtained in the subsequent steps, serves as the basis for adjusting the network parameters.

In step 202, for any image pair, the camera relative pose between the query image and the pairing image is predicted.

For example, a regression model may be used, with the image features of the query image and the pairing image as input, to output the camera relative pose between the two images by regression. The predicted camera relative pose obtained in this step is the prediction information of the camera relative pose of the query image and the pairing image.

In step 204, the camera estimated pose of the query image is determined based on the camera relative pose and the camera absolute pose of the pairing image.

In step 206, a new pairing image of the query image is retrieved according to the camera estimated pose of the query image, the new pairing image and the query image constituting a new image pair.

In this step, a new pairing image paired with the query image is retrieved again according to the camera estimated pose. The retrieval here is pose-based retrieval: according to the camera estimated pose, the database is searched for the image whose pose is closest to the estimated pose, and that image is called the new pairing image.

In step 208, the camera relative pose of the query image and the new pairing image of the new image pair is predicted. The camera relative pose obtained in this step is the prediction information of the camera relative pose of the query image and the new pairing image.

In step 210, the network parameters are adjusted based on the difference between the prediction information and the annotation information of the camera relative pose.

In this step, loss values of the respective camera relative poses may be obtained based on the prediction information and annotation information of the camera relative pose of the query image and the pairing image, and on the prediction information and annotation information of the camera relative pose of the query image and the new pairing image, and the network parameters may be adjusted according to the loss values.

For example, the predicted camera relative pose of the query image and the pairing image in step 202 may be predicted by the fine retrieval sub-network, and a loss value of the relative pose may be obtained based on the prediction information and the annotation information of the predicted camera relative pose, to adjust the network parameters of the fine retrieval sub-network. The loss function used is as follows:

L = ||Δt̂ − Δt||_1 + ||Δq̂ − Δq||_1    (1)

where Δt̂ denotes the relative pose prediction information of translation, and Δq̂ denotes the relative pose prediction information of rotation; Δt denotes the relative pose annotation information of translation, and Δq denotes the relative pose annotation information of rotation. As described above, the annotation information may be calculated in step 200, and the prediction information may be obtained by prediction in step 202. The above formula (1) uses the L1 norm.

In another example the query image and the phase of new pairing image in sub-network prediction steps 208 can be returned by pose The absolute pose of machine, and predictive information and markup information based on the absolute pose of the camera, obtain the penalty values of relative pose, adjust The whole pose returns the network parameter of sub-network.The loss function of foundation is as follows:

Wherein,Indicate the relative pose predictive information (Relative translation) of translation,Indicate rotation Relative pose predictive information (Relative rotation).Δ′tThe relative pose markup information of expression translation, Δ 'qIt indicates The relative pose markup information of rotation.As described above, markup information can be and in step 200 be calculated, and predict letter Breath can be what prediction in a step 208 obtained.Above-mentioned formula (2) and formula (1) are essentially identical, and difference is only that pose returns Return the image of sub-network and examining large rope network inputs to difference, examining large rope network inputs are query image and initial pairing Image, what pose returned sub-network input is query image and new pairing image, thus relative pose symbol indicate it is upper with Formula (1) is distinguished.
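Formulas (1) and (2) share the same L1 form and can be computed as in the sketch below; any weighting between the translation and rotation terms is not specified in the text, so the two terms are simply summed here.

```python
import numpy as np

def relative_pose_l1_loss(t_pred, q_pred, t_gt, q_gt):
    """L1-norm relative pose loss: translation term plus rotation term."""
    return float(np.abs(t_pred - t_gt).sum() + np.abs(q_pred - q_gt).sum())

# predicted vs annotated relative translation and rotation (quaternion)
t_pred, t_gt = np.array([1.0, 0.0, 0.0]), np.array([1.1, 0.0, -0.2])
q_pred, q_gt = np.array([1.0, 0.0, 0.0, 0.0]), np.array([1.0, 0.0, 0.0, 0.1])
print(round(relative_pose_l1_loss(t_pred, q_pred, t_gt, q_gt), 6))  # 0.4
```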

In the training method of this embodiment, retrieval is performed based on the estimated pose, so the retrieved new pairing image is closer in pose to the query image; and since an image pair with a smaller pose deviation yields a more accurate camera relative pose prediction, the camera positioning result is also more accurate.

Further, the process of Fig. 2 above trains the fine retrieval sub-network and the pose regression sub-network; these two sub-networks may be combined with the coarse retrieval sub-network to perform camera positioning according to the process of Fig. 1, where the coarse retrieval sub-network may obtain the image feature of the query image using a network used in an existing "relative positioning" manner. Optionally, the coarse retrieval sub-network may also be improved, so that the image features it outputs for retrieval are more accurate, and the retrieval accuracy of the initial pairing image is also improved.

Based on this, Fig. 3 illustrates the training method of another camera positioning neural network. Referring to Fig. 3, the neural network includes three training branches, as follows:

the ICR (Image-based Coarse Retrieval) module, also called the coarse retrieval sub-network, used for retrieving the initial pairing image based on image features;

the PFR (Pose-based Fine Retrieval) module, also called the fine retrieval sub-network, used for retrieving the new pairing image based on pose;

the PRP (Precise Relative Pose Regression) module, also called the pose regression sub-network, used for performing accurate relative pose regression based on the new pairing image and the query image, to obtain the camera relative pose of the new pairing image and the query image.

In the method for the present embodiment, ICR module, PFR module and PFP module are all trained, also, also passes through A variety of loss functions are used in combination, so that the accuracy of the image retrieval of ICR module is also improved.In addition, in the training mind It is to directly adopt image to being trained, so the ICR module in training process does not need to retrieve pairing figure again when through network Picture;Also, in the training stage, ICR and PFR module is parallel training, and the two modules are successive during network application It executes, such as first according to the initial pairing image of ICR module retrieval and inquisition image, then the initial pairing is predicted by PFR module Camera relative pose between image and query image.

The above ICR module and PFR module constitute the retrieval part for retrieving pairing images; through the retrieval of the retrieval part, the pairing image paired with the query image is obtained. In the training method provided by the present disclosure, by training the ICR module and the PFR module, these modules can obtain more accurate pairing images in the application stage of the neural network, so that the pose of the pairing image is as close as possible to that of the query image, thereby improving the relative pose regression effect. The PRP module obtains, by regression based on the retrieval result of the retrieval part, the camera relative pose between the pairing image and the query image.

In the training process, the "image pairs" used for training are prepared in advance, and no retrieval is needed after the ICR processing. Each image pair includes two images, and each image is annotated with the camera absolute pose of both rotation and translation. In addition, in this embodiment the training of the ICR module uses a view overlap loss, an angle offset loss, and a hard sample mining loss. Therefore, the view-overlap annotation information and the angle-offset annotation information of each image pair are also calculated. The prediction information of the view overlap and of the angle offset can subsequently be predicted directly by the ICR module.

The annotation information of the view overlap may be calculated as follows:

The camera intrinsic parameter is K, the pixel coordinates of the two images of an image pair are X1 and X2 respectively, and the depth information of the two images is D1 and D2:

K = [ f  0  px ;
      0  f  py ;
      0  0  1 ]

where f is the focal length of the camera, and (px, py) is the coordinate of the image center point.

For the two images of an image pair, the pixels of the first image are first projected into three-dimensional space and then projected into the second image; the corresponding pixel position, in the second image, of a pixel of the first image after the above projection can then be expressed as follows:

X̂2 = K (R · D1 · K^(-1) · X1 + t)

where R and t are the relative rotation and translation between the two camera poses. The proportion of the pixels of the first image that, after projection, fall within the pixel range of the second image is called the view overlap d. When the ICR module is trained with the view overlap loss, the two images of any of the above image pairs have view overlap.
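As a concrete sketch of the view overlap d described above, the following back-projects every pixel of the first image with its depth and reprojects it into the second image, then counts the in-bounds fraction. A pure pinhole model and a constant depth map are simplifying assumptions; the image size and intrinsics are illustrative.

```python
import numpy as np

def view_overlap(K, R, t, depth, h, w):
    """Fraction of image-1 pixels that land inside image 2 after being
    back-projected with their depth and reprojected through (R, t)."""
    us, vs = np.meshgrid(np.arange(w), np.arange(h))
    pix = np.stack([us.ravel(), vs.ravel(), np.ones(h * w)])  # homogeneous pixels
    rays = np.linalg.inv(K) @ pix                             # back-project to rays
    pts = depth * rays                                        # 3-D points in camera 1
    proj = K @ (R @ pts + t[:, None])                         # reproject into camera 2
    uv = proj[:2] / proj[2]
    inside = (uv[0] >= 0) & (uv[0] < w) & (uv[1] >= 0) & (uv[1] < h)
    return inside.mean()

K = np.array([[100.0, 0.0, 32.0], [0.0, 100.0, 32.0], [0.0, 0.0, 1.0]])
# identical poses: every pixel reprojects onto itself, so overlap is 1.0
print(view_overlap(K, np.eye(3), np.zeros(3), depth=5.0, h=64, w=64))  # 1.0
```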

In this embodiment, for any image pair, one of its two images is called the "first image" and the other the "second image". The calculated annotation information includes: the view overlap of the first image projected to the second image, expressed as d1; and the view overlap of the second image projected to the first image, expressed as d2.

Fig. 4 illustrates the view overlap of two images. The top row in Fig. 4 shows four groups of image pairs, each group including two images; the view overlap values of the two images are 0.76, 0.22, 0.12, and 0.42 respectively, that is, the proportion of pixels falling within the range of the other image when one image is projected onto it. This proportion is also the overlap between the two view frustums illustrated in Fig. 4.

The annotation information of the angle offset may be calculated as follows:

For any image pair, the two images correspond to two camera poses, and the relative angle offset of the two camera poses can be calculated according to the following formula:

Δθ = 2 · arccos(|⟨q1, q2⟩|)

where q1 and q2 are the unit quaternions of the rotations of the two camera poses.
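Assuming the rotations of the two camera poses are represented as unit quaternions (consistent with the Δq notation used for rotation elsewhere in this document), the relative angle offset can be sketched as:

```python
import numpy as np

def relative_angle_offset(q1, q2):
    """Relative rotation angle (radians) between two unit quaternions.
    The quaternion representation is an assumption; the disclosure only
    states that a relative angle offset of the two poses is computed."""
    d = abs(float(np.dot(q1, q2)))
    return 2.0 * np.arccos(min(1.0, d))  # clamp for numerical safety

q_identity = np.array([1.0, 0.0, 0.0, 0.0])
# 90-degree rotation about z: q = (cos 45deg, 0, 0, sin 45deg)
q_z90 = np.array([np.cos(np.pi / 4), 0.0, 0.0, np.sin(np.pi / 4)])
print(round(float(np.degrees(relative_angle_offset(q_identity, q_z90))), 3))  # 90.0
```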

After the image pairs for training and the above annotation information are prepared, the training process of the neural network is described with reference to Fig. 3:

The Anchor image in Fig. 3 may be called the query image during training; it is the image that remains fixed in the training process, while the pairing image changes, for example when the pairing image is updated to the new pairing image. The Target images (Coarse) are the initially determined images paired with the query image, which may also be called the initial pairing images. Each query image may have multiple initial pairing images. In addition, the Shared encoder is the encoding module shared by the three training branches: the images input into the three training branches are all processed using the same encoder module, so the three Shared encoders shown in Fig. 3 may be considered to have the same network parameters.

As shown in Fig. 3, the query image and its multiple initial paired images are all processed by the Shared encoder module and then fed into the ICR module and the PFR module. Note that each image pair formed by the query image and an initial paired image is fed into both the ICR module and the PFR module; that is, the two modules receive the same input, namely the image pairs formed by the query image and its initial paired images, and the number of image pairs can be however many groups the training requires.

The ICR module and the PFR module are two branches trained in parallel; the training of each is described below.

For the ICR module, the present embodiment trains with three loss functions: a view-overlap loss, an angle-offset loss and a hard-sample-mining loss. During training the ICR module has two parts: a decoding-network part, ICR decoder, and a regression-network part, Regressor. The decoding-network part still performs feature extraction on images, producing the image features used for retrieval (the ICR module performs no retrieval during the training stage itself). Note that the ICR module includes both parts, ICR decoder (the decoding-network part) and Regressor (the regression-network part), in the training stage, but in the deployment stage after training is complete the ICR module may contain only the ICR decoder. During training, the regression-network part Regressor directly predicts, from the image features, the image-relation parameters required by the loss computation: the predicted view overlap and the predicted relative angle offset. The loss value of the hard-sample-mining loss, by contrast, is computed from the image features output by the ICR decoder, without passing through the regression-network part Regressor; this loss value can also be regarded as an image-relation parameter.

For example, after each image pair formed by the query image and an initial paired image passes through the decoding-network part, the image features of the query image and of the initial paired image are obtained. The image feature of the query image and that of the initial paired image are concatenated (concat) and fed into the regression-network part Regressor. The regression-network part of the ICR module outputs the predicted view overlap and the predicted relative angle offset, and the loss values are computed from these predictions and the annotation information obtained earlier:

Formula (7) computes the view-overlap loss with the L2 norm, where d1 is the annotated view overlap of the first image projected onto the second image, d2 is the annotated view overlap of the second image projected onto the first image, and the corresponding predictions are the predicted view overlap of the first image projected onto the second image and of the second image projected onto the first image.

Formula (8) computes the angle-offset loss with the L2 norm, where α is the annotated relative angle offset between the two images of the pair and the corresponding prediction is the predicted relative angle offset between the two images of the pair.
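One straightforward reading of the two L2-style losses above can be sketched as follows; the patent's exact weighting and normalization in formulas (7) and (8) are not reproduced here.

```python
def overlap_and_angle_losses(d1, d2, d1_hat, d2_hat, alpha, alpha_hat):
    """View-overlap loss (7) and angle-offset loss (8) as squared-error
    terms between annotations (d1, d2, alpha) and predictions (hatted)."""
    l_frustum = (d1 - d1_hat) ** 2 + (d2 - d2_hat) ** 2  # view-overlap loss
    l_angle = (alpha - alpha_hat) ** 2                   # angle-offset loss
    return l_frustum, l_angle
```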

As above, for the multiple groups of image pairs fed into the ICR module, each pair yields an angle-offset loss and a view-overlap loss for its two images. The predicted view overlap and the predicted (relative) angle offset obtained from the regression-network part may be called the image-relation parameters of the two images.

To compute the hard-sample-mining loss, the image pairs used for training can first be grouped. The grouping can follow the difficulty of regressing the relative pose: if the view overlap of the two images of a pair is higher, regressing the camera pose is easier, where "easier" means the relative-pose regression is more accurate; likewise, if the difference between the absolute camera poses of the two images is smaller, the relative-pose regression is easier. By this criterion, the training image pairs can be divided into three groups, easy, moderate and hard: pairs in the easy group, whose view overlap is higher and whose pose difference is smaller, allow easier relative-pose regression; the other two groups are progressively harder.

Continuing with Fig. 3, after the two images of each pair are processed by the ICR decoder, the image feature of each image is obtained. On top of the easy/moderate/hard split, the training image pairs can further be divided into multiple batches, each batch containing N query images. For each query image in a batch, initial paired images of that query image are randomly drawn from the easy, moderate and hard groups respectively. Taking a query image P as an example: P-p1 is one image pair and falls in the easy group; P-p2 is another image pair and falls in the moderate group; P-p3 is a third image pair and falls in the hard group; p1, p2 and p3 are three initial paired images of the same query image P. The pairs P-p1, P-p2 and P-p3 belong to the same batch, and the other query images in the batch likewise draw initial paired images from the different groups in the same way. This batch-construction method may be called "Batch hard sampling". Optionally, the easy/moderate/hard split can be performed when the training set is first constructed, while the "Batch hard sampling" that forms the batches can also be performed after the image pairs pass through the ICR decoder.
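The batch construction above can be sketched as follows. The data layout (a mapping from difficulty group to per-query lists of pairs) is a hypothetical choice for illustration; the patent does not fix one.

```python
import random

def batch_hard_sampling(groups, queries, n_queries):
    """Build one batch: for each sampled query image, draw one of its
    image pairs from each of the easy/moderate/hard groups.
    `groups` maps 'easy'/'moderate'/'hard' to {query_id: [pair, ...]}."""
    batch = []
    for q in random.sample(queries, n_queries):
        for level in ("easy", "moderate", "hard"):
            batch.append(random.choice(groups[level][q]))
    return batch
```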

Based on the batches constructed above, the hard-sample-mining loss can be computed; this loss expresses the relationship between image-feature distances across the different groups:

In formula (9) above, L_triplet is the hard-sample-mining loss, and the remaining symbols denote the image features of the i-th anchor image and of its corresponding easy, moderate and hard images, i.e. the image features produced by the ICR decoder in Fig. 3. β is a margin on the image-feature distance, and (z)+ = max(z, 0) is a maximum function.
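One reading of this triplet-style loss for a single anchor can be sketched as follows: the anchor should be closer in feature space to its easy pair than to its moderate pair, and closer to its moderate pair than to its hard pair, each by margin β. The patent's exact formula (9) is not reproduced in this text, so this is an assumption consistent with the surrounding description.

```python
def hard_sample_mining_loss(f_a, f_e, f_m, f_h, beta):
    """Hard-sample-mining loss for one anchor feature f_a and the features
    of its easy (f_e), moderate (f_m) and hard (f_h) paired images."""
    def dist(x, y):
        # Euclidean distance between two feature vectors
        return sum((a - b) ** 2 for a, b in zip(x, y)) ** 0.5
    hinge = lambda z: max(z, 0.0)  # (z)+ = max(z, 0)
    return (hinge(dist(f_a, f_e) - dist(f_a, f_m) + beta)
            + hinge(dist(f_a, f_m) - dist(f_a, f_h) + beta))
```

When the ordering easy < moderate < hard holds with margin β, the loss is 0, matching the statement below that L_triplet tends to 0 as training completes.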

When the network parameters of the ICR module are adjusted according to the hard-sample-mining loss, the goal of the adjustment is that any easy pair be closer in feature distance than any moderate pair, and any moderate pair be closer than any hard pair. The image-feature distance of each pair in the different groups (easy, moderate, hard) is obtained, and from these distances the predicted value of the hard-sample-mining loss is obtained. This loss can also be regarded as an image-relation parameter between the query image and its paired images, since it captures a relationship between the image features of the pairs, e.g. that an easy pair is closer in feature space than a moderate pair. By the meaning of formula (9), when training of the model is complete the hard-sample-mining loss L_triplet is 0; during training, the network parameters are adjusted so that this loss tends to 0.

As above, in the present embodiment the coarse-retrieval sub-network (ICR module) uses three loss functions: the view-overlap loss, the angle-offset loss and the hard-sample-mining loss. In practice, the ICR module may use more kinds of loss function, or fewer; it is not limited to these three. For example, only the view-overlap loss may be computed to adjust the ICR network parameters, or the view-overlap loss and the angle-offset loss together. Further variants are not enumerated.

Next, the loss computation of the PFR module: referring to Fig. 3, the training image pairs need not be divided into groups such as easy and hard for the PFR module; they pass directly through the PFR decoder, which yields the image features of the two images of each pair. For any pair, the features of its two images are concatenated (concat) and fed into the regression-network part Regressor of the PFR network, which directly outputs the predicted relative camera pose of the two images of the pair. This prediction includes the predicted relative translation and the predicted relative rotation. The annotation information was computed in the aforementioned step 200, and the relative-pose loss of the PFR module is computed according to formula (1).

Continuing with Fig. 3, the relative-pose predictions output by the PFR module can be used to update Target images (Coarse), i.e. to update the paired images of the query image. For example, from the relative pose predicted by the PFR module and the absolute camera pose of the initial paired image of a pair, the estimated pose of the query image can be obtained. Based on this estimated pose, the database is searched for the images closest to the estimated pose, which become the new paired images of the query image. Note that the "database" may contain a very large number of images; the image pairs obtained before training can be formed by randomly selecting some database images to pair with the query image, and these pairs are used to train the ICR and PFR modules. The retrieval performed again after the PFR module predicts the pose (Pose-based retrieval) selects new images from the same database to pair with the query image, and these new paired images may differ from the earlier initial paired images.
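The pose-based retrieval step can be sketched as follows, with poses as 4x4 homogeneous matrices. The composition convention (T_rel maps paired-image camera coordinates to query coordinates) and the use of camera-position distance as the retrieval metric are assumptions for illustration; the patent does not fix either.

```python
import numpy as np

def estimate_query_pose(T_pair, T_rel):
    """Compose the paired image's known absolute pose with the predicted
    relative pose to estimate the query image's pose."""
    return T_pair @ T_rel

def pose_based_retrieval(T_est, db_poses):
    """Index of the database image whose camera position is closest to
    the estimated query position (translation part of the pose)."""
    p = T_est[:3, 3]
    dists = [np.linalg.norm(T[:3, 3] - p) for T in db_poses]
    return int(np.argmin(dists))
```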

The new paired images (Target images (Fine)) are retrieved based on pose; compared with the initial paired images (Target images (Coarse)), relative-pose regression against the query image will be more accurate. Continuing with Fig. 3, each image pair formed by a new paired image and the query image is fed into the PRP module; after the PRP decoder, the image feature of each image of the pair is obtained. The features of the pair are concatenated (concat), and the Regressor of the PRP module directly outputs the predicted relative camera pose between the query image and the new paired image, including the relative translation and the relative rotation. The relative-pose loss is then computed according to formula (2).

The loss computations and the network structures of the three branches ICR, PFR and PRP are detailed above. As can be seen, the three branch networks share one encoder module, Shared encoder, while the network parameters of each branch are adjusted individually. When adjusting parameters, the view-overlap loss, angle-offset loss, hard-sample-mining loss and relative-pose losses above can all be used; the overall loss function is defined as:

L = L_frustum + L_angle + L_triplet + L_PFR + L_PRP ......(10)

Training proceeds through multiple iterations of parameter adjustment and terminates when the above loss values reach a predetermined numerical range or when a preset number of iterations is reached.

Camera positioning using the camera-positioning neural network

In the trained neural network, the ICR module contains the ICR decoder and no longer contains the regression-network part shown in Fig. 3. When the trained neural network is applied to camera positioning, the processing principle can be understood with reference to the illustration of Fig. 1.

In addition, before the neural network is applied, the Shared encoder and ICR decoder of the trained network can be used to extract an image feature for every image in the database. In the database, every image can be annotated with its absolute camera pose and with the extracted image feature; the image feature is used for retrieval.
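This offline step and the subsequent feature-based lookup can be sketched as follows. Here `encoder` and `decoder` are stand-ins for the trained Shared encoder and ICR decoder, and nearest-neighbor search under the L2 distance is an assumed retrieval rule.

```python
import numpy as np

def build_feature_index(images, encoder, decoder):
    """Offline: run each database image through the shared encoder and
    ICR decoder and stack the resulting feature vectors into an index."""
    return np.stack([decoder(encoder(img)) for img in images])

def retrieve_initial_pair(query_feat, index):
    """Online: index of the database image with the nearest feature."""
    return int(np.argmin(np.linalg.norm(index - query_feat, axis=1)))
```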

Fig. 5 is a flowchart of a camera localization method provided by the disclosure. As shown in Fig. 5, the method may include:

In step 500, a query image on which camera positioning is to be performed is received.

In this step, the query image is the image whose camera is to be positioned.

In step 502, according to the image feature of the query image, an initial paired image of the query image is obtained by retrieval.

In this step, the query image is processed by the encoder-decoder to obtain its image feature. According to this image feature, the initial paired image is retrieved from the database; the initial paired image and the query image form an image pair.

In step 504, based on the predicted relative pose of the initial paired image and the query image, the estimated camera pose of the query image is obtained.

In this step, the image pair formed by the initial paired image and the query image is fed into the PFR module, which predicts the relative camera pose of the two images. According to this predicted relative pose and the absolute camera pose of the initial paired image, the estimated camera pose of the query image is obtained.

In step 506, according to the estimated camera pose of the query image, a new paired image of the query image is obtained by retrieval.

In this step, the new paired image is retrieved from the database according to the estimated camera pose; the new paired image and the query image form a new image pair.

In step 508, the relative camera pose of the new paired image and the query image is predicted.

In this step, the PRP module predicts, from the above new image pair, the relative camera pose of the new paired image and the query image.

In step 510, based on the relative camera pose of the new paired image and the query image, and the absolute camera pose of the new paired image, the absolute camera pose of the query image is obtained.

In the camera localization method of this embodiment, after a paired image is retrieved based on image features, the pose of the query image is estimated from the predicted relative pose and one further pose-based retrieval is performed. The two images of the updated image pair have closer poses, so the relative-pose regression works better and the camera positioning result of the query image is more accurate.
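Steps 500 to 510 can be sketched end to end as follows. For illustration only, poses are reduced to 2-D positions, `feat` stands in for the shared encoder plus ICR decoder, and `rel_pose` for the PFR/PRP regressors; all of these names and the vector-addition pose composition are assumptions, not the patent's implementation.

```python
def localize(query_img, db_imgs, db_feats, db_poses, feat, rel_pose):
    """Toy pipeline: feature retrieval -> relative-pose prediction ->
    pose estimate -> pose-based retrieval -> final absolute pose."""
    dist = lambda a, b: sum((x - y) ** 2 for x, y in zip(a, b))
    q = feat(query_img)                                              # step 502
    i = min(range(len(db_feats)), key=lambda k: dist(q, db_feats[k]))
    rel = rel_pose(query_img, db_imgs[i])                            # step 504
    est = [p + r for p, r in zip(db_poses[i], rel)]                  # estimated pose
    j = min(range(len(db_poses)), key=lambda k: dist(est, db_poses[k]))  # step 506
    rel2 = rel_pose(query_img, db_imgs[j])                           # step 508
    return [p + r for p, r in zip(db_poses[j], rel2)]                # step 510
```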

Fig. 6 illustrates a comparison of positioning results obtained with the camera localization method provided by the disclosure. As shown in Fig. 6, the green line indicates the true camera trajectory and the red line the camera trajectory predicted by each model. Comparing camera positioning by different models such as PoseNet and MapNet++, the trajectory obtained by the disclosed method (Ours) is closest to the true trajectory; that is, the disclosed method localizes the camera better and its positioning result is more accurate.

The neural network for camera positioning obtained by the training of the disclosure can be applied in many scenarios, for example map navigation, localization in automated driving systems, or robot navigation. In any of these scenarios, camera positioning can be performed from images captured by the camera, thereby localizing the device on which the camera is mounted.

Fig. 7 is a schematic diagram of a camera positioning apparatus provided by the disclosure. The apparatus may include: an initial retrieval module 71, an initial prediction module 72, a re-retrieval module 73, a re-prediction module 74 and a positioning determination module 75.

Initial retrieval module 71 is configured to retrieve, from an image database, an initial paired image of a query image, the absolute camera poses of the images in the image database being known;

Initial prediction module 72 is configured to obtain the predicted relative camera pose of the initial paired image and the query image, and to determine the estimated camera pose of the query image according to the predicted relative camera pose;

Re-retrieval module 73 is configured to retrieve, from the image database according to the estimated camera pose of the query image, a new paired image of the query image;

Re-prediction module 74 is configured to predict the relative camera pose of the new paired image and the query image;

Positioning determination module 75 is configured to determine the absolute camera pose of the query image based on the relative camera pose of the new paired image and the query image, and the absolute camera pose of the new paired image.

In one example, as shown in Fig. 8, the apparatus may further include a first image acquisition module 76, configured to: before the initial paired image of the query image is retrieved from the image database, obtain a known-geography database containing multiple images whose absolute camera poses are known, and construct the image database by selecting, from the known-geography database, the images corresponding to a predetermined geographic region.

In one example, the apparatus may further include a second image acquisition module 77, configured to: before the initial paired image of the query image is retrieved from the image database, acquire multiple images in a predetermined geographic region with an acquisition intelligent terminal equipped with an acquisition camera; determine the absolute camera pose corresponding to each acquired image; and construct, from the acquired images and their absolute camera poses, the image database corresponding to the predetermined geographic region.

In one example, the predetermined geographic region corresponding to the images of the image database is a region of any of the following kinds: a map-navigation region, an intelligent-driving localization region or a robot-navigation region.

In one example, the initial retrieval module 71 is specifically configured to: receive the query image on which camera positioning is to be performed; extract the image feature of the query image; and retrieve the initial paired image of the query image from the image database according to the image feature of the query image.

In one example, the camera positioning apparatus includes a camera-positioning neural network. The camera-positioning neural network includes: a shared sub-network, a coarse-retrieval sub-network, a fine-retrieval sub-network and a relative-pose regression sub-network; the shared sub-network is connected to the coarse-retrieval sub-network, the fine-retrieval sub-network and the relative-pose regression sub-network respectively;

The shared sub-network is configured to perform image-feature extraction on the query image, the initial paired image and the new paired image respectively, obtaining shared-processed images;

The initial retrieval module 71 is further configured to: before retrieving the initial paired image of the query image, process the shared-processed query image through the coarse-retrieval sub-network to obtain the image feature of the query image, so as to retrieve the initial paired image according to the image feature;

The initial prediction module 72, when obtaining the predicted relative camera pose of the initial paired image and the query image, is configured to: process the shared-processed initial paired image and query image through the fine-retrieval sub-network, which outputs the predicted relative camera pose of the two;

The re-prediction module 74, when predicting the relative camera pose of the new paired image and the query image, is configured to: process the shared-processed new paired image and query image through the relative-pose regression sub-network, which outputs the relative camera pose of the two.

In one example, the relative-pose regression sub-network includes a decoding-network part and a regression-network part. The re-prediction module 74 is specifically configured to: feed the image pair of the shared-processed new paired image and query image into the relative-pose regression sub-network; obtain, after the decoding-network part of the relative-pose regression sub-network, the image feature of each image of the pair; concatenate the image features of the new paired image and the query image to obtain a concatenated feature; and, after the concatenated feature is processed by the regression-network part of the relative-pose regression sub-network, output the predicted relative camera pose of the query image and the new paired image.

In one example, the fine-retrieval sub-network includes a decoding-network part and a regression-network part. The initial prediction module 72 is specifically configured to: feed the image pair of the shared-processed initial paired image and query image into the fine-retrieval sub-network; obtain, after the decoding-network part of the fine-retrieval sub-network, the image feature of each image of the pair; concatenate the image features of the initial paired image and the query image to obtain a concatenated feature; and, after the concatenated feature is processed by the regression-network part of the fine-retrieval sub-network, output the predicted relative camera pose of the query image and the initial paired image.

In one example, the initial retrieval module 71 is further configured to: before the shared-processed query image is processed by the coarse-retrieval sub-network to obtain its image feature, use the shared sub-network and coarse-retrieval sub-network of the pre-trained camera-positioning neural network to extract an image feature for every image in the image database, and annotate every image with its image feature, so that image retrieval can be performed according to the image features.

In one example, retrieving the initial paired image of the query image from the image database includes: retrieving multiple initial paired images of the query image, so as to obtain multiple new paired images according to the multiple initial paired images;

The positioning determination module 75 is specifically configured to: obtain an absolute camera pose of the query image from each of the multiple new paired images respectively, and obtain the absolute camera pose of the query image according to the multiple absolute camera poses.
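The patent does not fix how the per-pair estimates are merged into one absolute pose; one simple illustrative rule is to average the estimated camera positions. (Rotations would need a dedicated averaging scheme, e.g. quaternion averaging, which is omitted here.)

```python
import numpy as np

def fuse_position_estimates(positions):
    """Merge per-pair camera-position estimates by averaging them.
    `positions` is a list of 3-D position vectors; this fusion rule is
    an assumption for illustration, not the patent's method."""
    return np.mean(np.asarray(positions, dtype=float), axis=0)
```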

Fig. 9 shows a training apparatus of a camera-positioning neural network. As shown in Fig. 9, the apparatus may include: an image acquisition module 91, a relative prediction module 92, an estimated-pose module 93, a new-image module 94, a pose prediction module 95 and a parameter adjustment module 96.

Image acquisition module 91 is configured to obtain multiple groups of image pairs, each image pair including a query image and a paired image; the paired image and the query image each have a corresponding known absolute camera pose, and the image pair also has annotation information for the relative pose;

Relative prediction module 92 is configured to, for any image pair, predict the relative camera pose between the query image and the paired image through the camera-positioning neural network;

Estimated-pose module 93 is configured to determine the estimated camera pose of the query image based on the relative pose and the absolute camera pose of the paired image;

New-image module 94 is configured to retrieve, from the image database according to the estimated camera pose of the query image, a new paired image of the query image, the new paired image and the query image forming a new image pair;

Pose prediction module 95 is configured to predict, through the camera-positioning neural network, the relative camera pose of the query image and the new paired image of the new image pair;

Parameter adjustment module 96 is configured to adjust the network parameters of the camera-positioning neural network based on the difference between the prediction information of the relative camera pose and the annotation information.

In one example, as shown in Fig. 10, the camera-positioning neural network includes: a shared sub-network, a coarse-retrieval sub-network, a fine-retrieval sub-network and a relative-pose regression sub-network; the shared sub-network is connected to the coarse-retrieval sub-network, the fine-retrieval sub-network and the relative-pose regression sub-network respectively. The shared sub-network is configured to first process the query image, the initial paired image and the new paired image to extract image features, obtaining shared-processed images. The apparatus further includes an initial retrieval module 97, configured to process the shared-processed query image and paired image through the coarse-retrieval sub-network to obtain the image-relation parameters between the query image and the paired image. The relative prediction module 92 is specifically configured to: process the shared-processed paired image and query image through the fine-retrieval sub-network, which outputs the relative-pose prediction of the two. The pose prediction module 95 is specifically configured to: process the shared-processed new paired image and query image through the relative-pose regression sub-network, which outputs the relative-pose prediction of the two. The parameter adjustment module 96, when adjusting the network parameters of the camera-positioning neural network, is configured to: adjust the network parameters of the shared sub-network, the coarse-retrieval sub-network, the fine-retrieval sub-network and the relative-pose regression sub-network according to the difference between the prediction and the annotation of the image-relation parameters, the difference between the relative-pose prediction output by the fine-retrieval sub-network and its annotation, and the difference between the relative-pose prediction output by the relative-pose regression sub-network and its annotation.

In one example, the initial retrieval module 97, when obtaining the image-relation parameters between the query image and the paired image, is configured to: determine, from the rotation poses of the query image and the paired image of an image pair, the predicted relative angle offset between the camera poses of the query image and the paired image as the image-relation parameter.

In one example, the initial retrieval module 97, when obtaining the image-relation parameters between the query image and the paired image, is configured to: group the multiple image pairs corresponding to the same query image according to the difficulty of regressing the relative pose; obtain the image-feature distance of each image pair in the different groups respectively; and obtain, according to the image-feature distances, the predicted value of the hard-sample-mining loss, which expresses the relationship between arbitrary image-feature distances across the different groups.

An embodiment of the disclosure further provides an electronic device including a memory and a processor, the memory storing computer instructions runnable on the processor, and the processor, when executing the computer instructions, implementing the camera localization method of any embodiment of the disclosure, or implementing the training method of the camera-positioning neural network of any embodiment of the disclosure.

Those skilled in the art will appreciate that one or more embodiments of the disclosure may be provided as a method, a system or a computer program product. Accordingly, one or more embodiments of the disclosure may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Moreover, one or more embodiments of the disclosure may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM and optical storage) containing computer-usable program code.

An embodiment of the disclosure also provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the camera localization method of any embodiment of the disclosure, or implements the training method of the camera-positioning neural network of any embodiment of the disclosure.

The embodiments in the disclosure are described in a progressive manner; identical or similar parts of the embodiments may be referred to one another, and each embodiment focuses on its differences from the others. In particular, the data-processing apparatus embodiments are described relatively simply because they are substantially similar to the method embodiments; for relevant points, refer to the description of the method embodiments.

Specific embodiments of the disclosure are described above. Other embodiments fall within the scope of the appended claims. In some cases, the actions or steps recited in the claims may be performed in an order different from that of the embodiments and still achieve the desired results. In addition, the processes depicted in the drawings do not necessarily require the particular order shown, or sequential order, to achieve the desired results. In some embodiments, multitasking and parallel processing are also possible or may be advantageous.

Embodiments of the subject matter and the functional operations described in the present disclosure may be implemented in digital electronic circuitry, in tangibly embodied computer software or firmware, in computer hardware (including the structures disclosed in the present disclosure and their structural equivalents), or in a combination of one or more of them. Embodiments of the subject matter described in the present disclosure may be implemented as one or more computer programs, i.e., one or more modules of computer program instructions encoded on a tangible non-transitory program carrier for execution by, or to control the operation of, a data processing apparatus. Alternatively or additionally, the program instructions may be encoded on an artificially generated propagated signal, such as a machine-generated electrical, optical, or electromagnetic signal, which is generated to encode information for transmission to a suitable receiver apparatus for execution by the data processing apparatus. The computer storage medium may be a machine-readable storage device, a machine-readable storage substrate, a random- or serial-access memory device, or a combination of one or more of them.

The processes and logic flows described in the present disclosure may be performed by one or more programmable computers executing one or more computer programs, so as to perform the corresponding functions by operating on input data and generating output. The processes and logic flows may also be performed by special-purpose logic circuitry, such as an FPGA (field-programmable gate array) or an ASIC (application-specific integrated circuit), and an apparatus may also be implemented as such special-purpose logic circuitry.

Computers suitable for executing a computer program include, by way of example, general-purpose and/or special-purpose microprocessors, or any other type of central processing unit. Generally, a central processing unit receives instructions and data from a read-only memory and/or a random-access memory. The essential elements of a computer include a central processing unit for performing or executing instructions, and one or more memory devices for storing instructions and data. Generally, a computer will also include, or be operatively coupled to, one or more mass storage devices for storing data, such as magnetic disks, magneto-optical disks, or optical disks, so as to receive data from them, transfer data to them, or both. However, a computer need not have such devices. Moreover, a computer may be embedded in another device, such as a mobile phone, a personal digital assistant (PDA), a mobile audio or video player, a game console, a Global Positioning System (GPS) receiver, or a portable storage device such as a Universal Serial Bus (USB) flash drive, to name just a few.

Computer-readable media suitable for storing computer program instructions and data include all forms of non-volatile memory, media, and memory devices, including, for example, semiconductor memory devices (such as EPROM, EEPROM, and flash memory devices), magnetic disks (such as internal hard disks or removable disks), magneto-optical disks, and CD-ROM and DVD-ROM disks. The processor and the memory may be supplemented by, or incorporated in, special-purpose logic circuitry.

Although the present disclosure contains many specific implementation details, these should not be construed as limiting the scope of the disclosure or of what may be claimed, but rather as descriptions of features of particular embodiments of the disclosure. Certain features that are described in multiple embodiments of the present disclosure may also be implemented in combination in a single embodiment. Conversely, various features that are described in a single embodiment may also be implemented in multiple embodiments separately or in any suitable sub-combination. Moreover, although features may be described above as acting in certain combinations and even initially claimed as such, one or more features from a claimed combination may in some cases be excised from the combination, and the claimed combination may be directed to a sub-combination or a variation of a sub-combination.

Similarly, although operations are depicted in the drawings in a particular order, this should not be understood as requiring that these operations be performed in the particular order shown or in sequential order, or that all illustrated operations be performed, to achieve desired results. In some cases, multitasking and parallel processing may be advantageous. Moreover, the separation of the various system modules and components in the embodiments described above should not be understood as requiring such separation in all embodiments, and it should be understood that the described program components and systems may generally be integrated together in a single software product or packaged into multiple software products.

Thus, specific embodiments of the subject matter have been described. Other embodiments fall within the scope of the appended claims. In some cases, the actions recited in the claims may be performed in a different order and still achieve desired results. In addition, the processes depicted in the drawings do not necessarily require the particular order shown, or a sequential order, to achieve desired results. In certain implementations, multitasking and parallel processing may be advantageous.

The foregoing descriptions are merely preferred embodiments of one or more embodiments of the present disclosure, and are not intended to limit one or more embodiments of the present disclosure. Any modification, equivalent replacement, improvement, and the like made within the spirit and principles of one or more embodiments of the present disclosure shall fall within the protection scope of one or more embodiments of the present disclosure.
