Method, device, computer program product and computer program for searching images by images

Document No.: 1921646. Publication date: 2021-12-03.

Reading note: This technology, "Method, device, computer program product and computer program for searching images by images", was designed by Xu Jianjiong (徐剑炯), Ge Jun (葛俊), Mao Yunqing (毛云青) and Li Kaimin (李开民) on 2021-09-13. Abstract: The application provides a method, a device, a computer program product and a computer program for searching images by images, comprising the following steps: acquiring an image to be searched; acquiring at least one first similar image with high similarity to the image to be searched in a historical image set to form a first similar image set; acquiring a local image of the image to be searched, and inputting the local image and the first similar image set into a local feature extraction network to extract the matching feature key points of each first similar image; and determining at least one second similar image according to the number of matching feature key points of each first similar image. Similar images are thereby confirmed by combining global search and local search.

1. A method for searching images by images, characterized by comprising the following steps:

acquiring an image to be searched;

acquiring at least one first similar image with high similarity to the image to be searched in the historical image set to form a first similar image set;

acquiring a local image of the image to be searched, and inputting the local image and the first similar image set into a local feature extraction network to extract the matching feature key points of each first similar image;

determining at least one second similar image according to the number of matching feature key points of each first similar image.

2. The method according to claim 1, wherein inputting the local image and the first similar image set into a local feature extraction network to extract the matching feature key points of each first similar image comprises: acquiring a first matching point of the local image and the first similar image; extracting a local image region feature map of the local image according to the position of the first matching point; extracting a similar image region feature map of the first similar image according to the position of the first matching point; and acquiring the matching feature key points of the local image region feature map and the similar image region feature map.

3. The method according to claim 2, wherein acquiring the first matching point of the local image and the first similar image comprises: extracting a first local image feature map vector of the local image; extracting a first similar image feature map vector of the first similar image; inputting the first local image feature map vector and the first similar image feature map vector into a self-attention and cross-attention module for dual-attention weighting, which outputs a second local image feature map vector and a second similar image feature map vector, respectively; obtaining a score matrix for matching the second similar image feature map vector against the second local image feature map vector; and obtaining the first matching point based on the score matrix.

4. The method according to claim 3, wherein the self-attention and cross-attention module uses the relationship between the first local image feature map vector and the first similar image feature map vector to update, by means of weight updating, the first similar image feature map vector into the second similar image feature map vector and the first local image feature map vector into the second local image feature map vector.

5. The method according to claim 3, wherein acquiring the matching feature key points of the local image region feature map and the similar image region feature map comprises: inputting a local image region feature map vector corresponding to the local image region feature map and a similar image region feature map vector corresponding to the similar image region feature map into the self-attention and cross-attention module for dual-attention weighting, which outputs a third local image feature map vector and a third similar image feature map vector, respectively; calculating the correlation between the third local image feature map vector and the third similar image feature map vector; and selecting the points with high correlation as the matching feature key points.

6. The method according to claim 5, wherein calculating the correlation between the third local image feature map vector and the third similar image feature map vector and selecting the points with high correlation as the matching feature key points comprises: calculating a response matrix map of the third local image feature map vector and the third similar image feature map vector, obtaining the response value of each feature point from the response matrix map, and selecting the feature points with high response values as the matching feature key points.

7. The method according to claim 1, wherein acquiring at least one first similar image with high similarity to the image to be searched in the historical image set to form a first similar image set comprises: acquiring a first feature vector of the image to be searched; acquiring a second feature vector of each historical image of the historical image set; and comparing the similarity between the first feature vector and the second feature vector of each historical image, and taking the historical image corresponding to at least one second feature vector with high similarity as the first similar image.

8. The method according to claim 7, wherein "acquiring a first feature vector of the image to be searched; acquiring a second feature vector of each historical image of the historical image set" comprises: inputting the image to be searched into a global feature extraction network to obtain the first feature vector; and inputting each historical image of the historical image set into the global feature extraction network to obtain the second feature vector, wherein the loss function of the global feature extraction network during training is replaced by a hyperbolic tangent function.

9. The method according to claim 1, wherein acquiring the local image of the image to be searched comprises: cropping a local region image from the image to be searched, and padding the local region image to the size of the image to be searched to obtain the local image.

10. An apparatus for searching images by images, comprising:

an image acquisition unit, configured to acquire an image to be searched;

a global feature searching unit, configured to acquire at least one first similar image with high similarity to the image to be searched in the historical image set to form a first similar image set;

a local feature searching unit, configured to acquire a local image of the image to be searched, and to input the local image and the first similar image set into a local feature extraction network to extract the matching feature key points of each first similar image;

and a result acquisition unit, configured to determine at least one second similar image according to the number of matching feature key points of each first similar image.

11. An electronic device comprising a memory and a processor, wherein the memory stores a computer program, and the processor is configured to execute the computer program to perform the method for searching images by images according to any one of claims 1 to 9.

12. A computer program product, characterized by comprising software code portions for performing the method for searching images by images according to any one of claims 1 to 9 when the computer program product is run on a computer.

13. A readable storage medium, in which a computer program is stored, the computer program comprising program code for controlling a process to execute a process, the process comprising the method for searching images by images according to any one of claims 1 to 9.

Technical Field

The present application relates to the field of big data mining, and in particular to a method, an apparatus, a computer program product, and a computer program for searching images by images.

Background

Searching images by images is a common image retrieval strategy: the user only needs to input an image to be searched, and the corresponding algorithm quickly retrieves, from a historical image database, image data similar to that image. For example, during the construction of a smart city, a large number of city management images are collected and stored in a historical image database, so that images similar to a specific scene or event can be found among a large volume of historical image data by searching images by images.

Most current image search algorithms extract global image features and use a distance measure between the global features of the images as the basis for ranking, so that similar images are returned in descending order of similarity. However, such algorithms consider only global image features and ignore the information carried by representative local images, which directly makes the search results insufficiently accurate. In addition, such algorithms cannot adjust wrongly ranked historical images according to ranking priority, which further reduces the accuracy of the results.

Disclosure of Invention

The embodiments of the present application provide a method, a device, a computer program product and a computer program for searching images by images. Preliminarily screened images with high similarity are first selected by global search, and the local features of these images are then matched by local search to confirm the images that are more similar, so that the combination of global search and local search improves search accuracy.

In a first aspect, an embodiment of the present application provides a method for searching images by images, including the following steps: acquiring an image to be searched; acquiring at least one first similar image with high similarity to the image to be searched in the historical image set to form a first similar image set; acquiring a local image of the image to be searched, and inputting the local image and the first similar image set into a local feature extraction network to extract the matching feature key points of each first similar image; and determining at least one second similar image according to the number of matching feature key points of each first similar image.

In a second aspect, an embodiment of the present application provides an apparatus for searching images by images, including: an image acquisition unit, configured to acquire an image to be searched; a global feature searching unit, configured to acquire at least one first similar image with high similarity to the image to be searched in the historical image set to form a first similar image set; a local feature searching unit, configured to acquire a local image of the image to be searched, and to input the local image and the first similar image set into a local feature extraction network to extract the matching feature key points of each first similar image; and a result acquisition unit, configured to determine a second similar image according to the number of matching feature key points of each first similar image.

In a third aspect, an embodiment of the present application provides an electronic device, which includes a memory and a processor, where the memory stores a computer program, and the processor is configured to execute the computer program to perform the method for searching images by images according to the present scheme.

In a fourth aspect, an embodiment of the present application provides a computer program product, which includes software code portions for performing the method for searching images by images according to the present scheme when the computer program product is run on a computer.

In a fifth aspect, an embodiment of the present application provides a readable storage medium in which a computer program is stored, where the computer program includes program code for controlling a process to execute a process, and the process includes the method for searching images by images according to the present scheme.

The main contributions and innovations of the invention are as follows. The scheme combines the respective advantages of global search and local search: a subset of images is first obtained by global search, which limits the search range of the local search; the local search then extracts local image features and matches key points, and the similar images with high similarity are finally obtained in this way. In the deep learning feature network used for global search, a hyperbolic tangent function replaces the step function, so that the similarity ranking can be continuously optimized during training, wrongly ranked samples are adjusted, and the accuracy of the global search results is improved. The local search of the scheme matches the local features against the preliminarily screened images to obtain highly correlated matching points, and the similar images are obtained based on these matching points.

The details of one or more embodiments of the application are set forth in the accompanying drawings and the description below to provide a more thorough understanding of the application.

Drawings

The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the application and together with the description serve to explain the application and not to limit the application. In the drawings:

FIG. 1 is a flow chart of a method for searching images by images according to an embodiment of the present application;

FIG. 2 is a schematic diagram illustrating logical operations of a method for searching images by images according to an embodiment of the present application;

FIG. 3 is a schematic diagram of image processing according to an embodiment of the present application;

FIG. 4 is a schematic diagram of a local search network according to an embodiment of the present application;

FIG. 5 is a schematic diagram of weight update logic for a self-attention and cross-attention module according to an embodiment of the present application;

FIG. 6 is a block diagram of an apparatus for searching images according to an embodiment of the present application;

FIG. 7 is a schematic diagram of the hardware structure of an electronic device according to an embodiment of the present application.

Detailed Description

Reference will now be made in detail to the exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, like numbers in different drawings represent the same or similar elements unless otherwise indicated. The implementations described in the following exemplary embodiments do not represent all implementations consistent with one or more embodiments of the present specification. Rather, they are merely examples of apparatus and methods consistent with certain aspects of one or more embodiments of the specification, as detailed in the claims which follow.

It should be noted that: in other embodiments, the steps of the corresponding methods are not necessarily performed in the order shown and described herein. In some other embodiments, the method may include more or fewer steps than those described herein. Moreover, a single step described in this specification may be broken down into multiple steps for description in other embodiments; multiple steps described in this specification may be combined into a single step in other embodiments.

Example one

This scheme aims to provide a technical solution in which a first similar image set is pre-screened by global search; matching feature key points with high correlation between the image to be searched and each first similar image are then obtained by local search, based on the first similar images and a local image of the image to be searched; and a second similar image with higher similarity is finally determined from the first similar image set based on the number of matching feature key points. In implementation, an improved global search network first performs preliminary screening for the image to be searched to obtain a first similar image set consisting of a plurality of first similar images. A local image of the image to be searched is then obtained, and an improved local feature extraction network performs local search on the local image and the first similar images to determine, on each first similar image, the matching feature key points that represent matching correlation. The second similar image with higher similarity is finally determined by ranking the first similar images according to their number of matching feature key points.

This embodiment of the present application provides a method for searching images by images that combines global search and local search to obtain similar images with high similarity to the image to be searched. Specifically, referring to FIG. 1, the method includes:

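acquiring an image to be searched;

acquiring at least one first similar image with high similarity to the image to be searched in the historical image set to form a first similar image set;

acquiring a local image of the image to be searched, and inputting the local image and the first similar image set into a local feature extraction network to extract the matching feature key points of each first similar image;

determining at least one second similar image according to the number of matching feature key points of each first similar image.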

In this scheme, a global search is first performed between the image to be searched and the historical images in the historical image set, and at least one historical image with high similarity to the image to be searched is pre-screened from the set as a first similar image. A local search is then performed using the local image of the image to be searched and the first similar images to obtain the matching feature key points of each first similar image, and the second similar image to be output is finally determined according to the number of matching feature key points.
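
The two-stage flow described above can be sketched as follows. This is only a minimal illustration under stated assumptions, not the implementation of this scheme: the names `search_by_image`, `similarity`, `count_keypoints`, `first_k` and `second_k` are hypothetical, and the two callables stand in for the global feature comparison and for the local feature extraction network, respectively.

```python
from typing import Callable, Dict, List, Sequence


def search_by_image(
    query_vec: Sequence[float],
    history_vecs: Dict[str, Sequence[float]],
    similarity: Callable[[Sequence[float], Sequence[float]], float],
    count_keypoints: Callable[[str], int],
    first_k: int = 10,
    second_k: int = 3,
) -> List[str]:
    """Global pre-screening followed by local re-ranking (hypothetical helper)."""
    # Stage 1 (global search): keep the first_k historical images whose
    # global feature vectors are most similar to the query vector.
    ranked = sorted(
        history_vecs,
        key=lambda name: similarity(query_vec, history_vecs[name]),
        reverse=True,
    )
    first_similar = ranked[:first_k]

    # Stage 2 (local search): re-rank the pre-screened images by their
    # number of matching feature key points and keep the best second_k.
    reranked = sorted(first_similar, key=count_keypoints, reverse=True)
    return reranked[:second_k]
```

Here `first_k` bounds the size of the first similar image set, so the more expensive local search only runs on a small number of candidates.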

The step of acquiring at least one first similar image with high similarity to the image to be searched in the historical image set to form a first similar image set comprises: acquiring a first feature vector of the image to be searched; acquiring a second feature vector of each historical image of the historical image set; and comparing the similarity between the first feature vector and the second feature vector of each historical image, and taking the historical image corresponding to at least one second feature vector with high similarity as the first similar image.

It is worth mentioning that, in this step, the historical images and the image to be searched are compared by their global features; that is, the first similar image set is pre-screened by an image search method based on global features. Either a conventional deep learning feature extraction network or the improved deep learning feature extraction network of this scheme can be used to extract the image features.

The key improvement of the improved deep learning feature extraction network of this scheme is that the step function used by a traditional deep learning feature extraction network during training is replaced with a hyperbolic tangent function. The advantage is that, with the hyperbolic tangent function, the network can continuously perform gradient optimization during training and adjust training samples whose similarity ranking is wrong; the vanishing-gradient problem is also effectively alleviated during training, so the trained model is more accurate. The traditional step function, by contrast, is inexact, provides no gradient for optimization, and makes it difficult to learn from and adjust wrongly ranked samples, whereas the hyperbolic tangent function can optimize the ranking result of the algorithm.
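
The difference between the two functions can be illustrated with a small sketch. The rescaling of the hyperbolic tangent to the range (0, 1) and the temperature parameter `sigma` are assumptions made for illustration; the scheme itself only states that a hyperbolic tangent function replaces the step function.

```python
import math


def step(x: float) -> float:
    """Heaviside step: 1 if x > 0, else 0. Its derivative is 0 wherever it exists."""
    return 1.0 if x > 0 else 0.0


def soft_step(x: float, sigma: float = 0.1) -> float:
    """Smooth surrogate: tanh rescaled to (0, 1); sigma is an assumed temperature."""
    return 0.5 * (1.0 + math.tanh(x / sigma))


def soft_step_grad(x: float, sigma: float = 0.1) -> float:
    """Analytic derivative of soft_step: (1 - tanh^2(x / sigma)) / (2 * sigma)."""
    t = math.tanh(x / sigma)
    return (1.0 - t * t) / (2.0 * sigma)
```

Because `soft_step_grad` is non-zero near the decision boundary, a pair of samples that is ranked in the wrong order still produces a gradient signal, which the step function cannot provide.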

In order to distinguish the improved deep learning feature extraction network from the traditional deep learning feature extraction network, the improved deep learning feature extraction network is defined as a global feature extraction network.

Correspondingly, "acquiring a first feature vector of the image to be searched; acquiring a second feature vector of each historical image of the historical image set" comprises: inputting the image to be searched into the global feature extraction network to obtain the first feature vector; and inputting each historical image of the historical image set into the global feature extraction network to obtain the second feature vector, wherein the loss function of the global feature extraction network during training is replaced by a hyperbolic tangent function.

Used in the loss function, the hyperbolic tangent function allows the parameters of the global feature extraction network to be adjusted: the loss measures the deviation between the current output of the global feature extraction network and the label data corresponding to the training data, where the current output is the output of the network for the current training data under its current parameters. The global feature extraction network is optimized by continuously adjusting its parameters.

The global feature extraction network is obtained by training on a large number of training images. The specific training steps do not differ from those of a conventional deep learning feature extraction network, so they are not described further here.

In this embodiment, the global feature extraction network uses AP (average precision) as the metric of the global feature extraction algorithm. AP is the area under the precision-recall curve, and the optimized AP of this scheme is given by the following formula:

[formula image missing]

The loss function of the global feature extraction network is as follows:

SoftAP = 1 - AP_query

where AP_query denotes the average precision for the query image; C, C_P and C_N denote the set of similarity values of all samples, of the positive samples and of the negative samples, respectively; |C_P| denotes the number of positive samples in the set C_P; and T(x; σ) is the hyperbolic tangent function. The formula of the hyperbolic tangent function is as follows:

[formula image missing]

The loss function of the global feature extraction network performs an accurate weighted calculation according to the ranking order of the samples: the closer to the top a sample is ranked, the greater its influence on the accuracy of the algorithm. In each training iteration, the samples are ranked by their calculated similarity, and the loss of the ranked samples is calculated from their true labels; the closer to the top a wrong sample is ranked, the larger the loss. The algorithm adjusts the corresponding parameters until the model parameters are good enough and the loss is sufficiently small, at which point training terminates.

In this embodiment, ResNet50 may be selected as the network structure of the global feature extraction network.
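
As a hedged illustration of how such a ranking-weighted loss can be computed, the sketch below follows the commonly used smoothed-AP form, in which the step function inside the rank computation is replaced by a hyperbolic-tangent-based surrogate. The exact formula of this scheme may differ: the function names, the rescaling of tanh to (0, 1) and the default `sigma` are assumptions introduced for illustration.

```python
import math
from typing import Sequence


def soft_indicator(x: float, sigma: float) -> float:
    """Assumed smooth stand-in for the step function: tanh rescaled to (0, 1)."""
    return 0.5 * (1.0 + math.tanh(x / sigma))


def soft_ap_loss(scores: Sequence[float], labels: Sequence[int], sigma: float = 0.01) -> float:
    """Return 1 - AP_query, with AP computed from smoothed ranks.

    scores: similarity of each sample to the query (higher means more similar).
    labels: 1 for a positive sample, 0 for a negative sample.
    """
    positives = [i for i, y in enumerate(labels) if y == 1]
    if not positives:
        raise ValueError("at least one positive sample is required")
    ap = 0.0
    for i in positives:
        # Smoothed rank of sample i among the positive samples only.
        rank_in_pos = 1.0 + sum(
            soft_indicator(scores[j] - scores[i], sigma) for j in positives if j != i
        )
        # Smoothed rank of sample i among all samples.
        rank_in_all = 1.0 + sum(
            soft_indicator(scores[j] - scores[i], sigma)
            for j in range(len(scores))
            if j != i
        )
        ap += rank_in_pos / rank_in_all
    return 1.0 - ap / len(positives)
```

With a small `sigma`, the smoothed ranks approach the true ranks, so a ranking that places all positives first gives a loss close to 0, while negatives ranked above positives increase the loss.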

FIG. 2 is a schematic diagram of the logical framework of the image search method of this embodiment. To further improve the efficiency of searching for the first similar images, this scheme may first construct a database that stores the second feature vectors of the historical image set. The advantage is that the second feature vectors of the historical image set do not need to be extracted again each time an image is searched; only the first feature vector needs to be compared with the second feature vectors stored in the database.

Correspondingly, the step of acquiring a second feature vector of each historical image of the historical image set comprises: inputting each historical image of the historical image set into the global feature extraction network to obtain the second feature vector, recording the second feature vector in a database, and recording the image information of each historical image in the database. The image information includes the local directory path, name, index number and similar information of the historical image. The first feature vector of the image to be searched can then be compared directly with the second feature vectors in the database.

"Comparing the similarity between the first feature vector and the second feature vector of each historical image" includes: calculating the cosine similarity between the first feature vector and the second feature vector. The higher the cosine similarity, the more similar the two feature vectors are. It should be noted that "high similarity" in this embodiment refers to the case in which the image to be searched and the historical image share more identical feature points. That is, acquiring at least one first similar image with high similarity to the image to be searched in the historical image set means selecting images whose image features are close to those of the image to be searched.

The step of "taking the historical image corresponding to at least one second feature vector with high similarity as the first similar image" includes: sorting the historical images by similarity from high to low and selecting the top-ranked historical images as the first similar images. For example, the historical images corresponding to the 10 second feature vectors with the highest similarity may be selected as the first similar images to form the first similar image set.
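
The database lookup and ranking described above can be sketched as follows, assuming an in-memory list of records whose feature vectors were extracted once in advance. `HistoryRecord`, `cosine_similarity` and `top_k_similar` are hypothetical names introduced only for illustration; a real system would likely use a persistent store or a vector index.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass(frozen=True)
class HistoryRecord:
    index: int            # index number of the historical image
    name: str             # file name of the historical image
    path: str             # local directory path of the historical image
    feature: List[float]  # precomputed second feature vector


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between a and b; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k_similar(
    query_vec: Sequence[float], database: Sequence[HistoryRecord], k: int = 10
) -> List[Tuple[HistoryRecord, float]]:
    """Rank stored records by cosine similarity to the query and keep the top k."""
    scored = [(rec, cosine_similarity(query_vec, rec.feature)) for rec in database]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]
```

Only the first feature vector of the query is computed at search time; the records already carry the path, name and index needed to return the matching historical images.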

In this scheme, a local image of the image to be searched is cropped, and a secondary screening of similar images is carried out with the local image as a reference. The local image is preferably one that covers a key part of the image to be searched. For example, as shown in FIG. 3, the left image, which shows a van, is the image to be searched, and the region containing the text marking on the van is selected as the local image.

In particular, after the local image is cropped, this scheme does not extract local features from it directly. Instead, the local image is preprocessed with a surround-padding strategy that pads it to the same scale as the original image, which avoids the data distortion that would be caused by rescaling it to the size required for the algorithm input. Correspondingly, the step of acquiring the local image of the image to be searched comprises: cropping a local region image from the image to be searched, and padding the local region image to the size of the image to be searched. That is, the local image has the same size as the image to be searched and contains the cropped local region image.

When padding the local image, it is preferable to fill with a background color that is distinguishable from the background of the local image.

In this scheme, the local feature extraction network is configured to extract the matching feature key points between each first similar image and the local image.
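
The crop-and-pad preprocessing can be sketched as follows. This is a minimal sketch that assumes a grayscale image stored as a list of pixel rows and places the crop at the center of the canvas; the scheme only states that the surroundings are padded, so the centering, the function name `crop_and_pad` and its parameters are assumptions.

```python
from typing import List

Image = List[List[int]]  # rows of grayscale pixel values (assumed representation)


def crop_and_pad(
    image: Image, top: int, left: int, height: int, width: int, fill: int = 0
) -> Image:
    """Crop a box from image, then pad the crop back to the original image size."""
    full_h, full_w = len(image), len(image[0])
    inside = 0 <= top and 0 <= left and top + height <= full_h and left + width <= full_w
    if not inside or height <= 0 or width <= 0:
        raise ValueError("crop box must be non-empty and lie inside the image")

    crop = [row[left:left + width] for row in image[top:top + height]]

    # Canvas of the original size, filled with the chosen background value.
    canvas = [[fill] * full_w for _ in range(full_h)]
    off_y = (full_h - height) // 2
    off_x = (full_w - width) // 2
    # Paste the crop unscaled, so its pixels are not distorted by resizing.
    for dy, row in enumerate(crop):
        canvas[off_y + dy][off_x:off_x + width] = row
    return canvas
```

The `fill` value plays the role of the background color; choosing a value that does not occur in the crop's own background keeps the padded border distinguishable from the content.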

The structure of the local feature extraction network is shown in FIG. 4. It comprises a local feature extractor, a feature position encoder, a self-attention and cross-attention module, a feature matcher and a feature matching point screening module. The local feature extractor is configured to extract a first local image feature map of the local image and a first similar image feature map of the first similar image. The feature position encoder is configured to obtain a first local image feature map vector from the first local image feature map and a first similar image feature map vector from the first similar image feature map. The self-attention and cross-attention module weights the input first similar image feature map vector and first local image feature map vector by dual attention (self-attention and cross-attention) to obtain a second similar image feature map vector and a second local image feature map vector. The feature matcher is configured to obtain the first matching point between the second similar image feature map vector and the second local image feature map vector. The feature matching point screening module is configured to obtain, based on the position of the first matching point, a local image region feature map from the first local image feature map and a similar image region feature map from the first similar image feature map; the two region feature maps are then input into the self-attention and cross-attention module again to obtain a third local image feature map vector and a third similar image feature map vector, the correlation between these two vectors is calculated, and the key points with high correlation are taken as the matching feature key points.

Specifically, inputting the local image and the first similar image set into the local feature extraction network to extract the matching feature key points of each first similar image includes: acquiring a first matching point of the local image and the first similar image; extracting a local image region feature map of the local image according to the position of the first matching point; extracting a similar image region feature map of the first similar image according to the position of the first matching point; and acquiring the matching feature key points of the local image region feature map and the similar image region feature map.

The first matching point is a key point matched between the local image and the first similar image, and each first matching point contains different local region information. Specifically, "acquiring a first matching point of the local image and the first similar image" further includes: extracting a first local image feature map vector of the local image; extracting a first similar image feature map vector of the first similar image; inputting the first local image feature map vector and the first similar image feature map vector into the self-attention and cross-attention module for dual-attention weighting, which outputs a second local image feature map vector and a second similar image feature map vector, respectively; obtaining a score matrix for matching the second similar image feature map vector against the second local image feature map vector; and obtaining the first matching point based on the score matrix.
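
One possible way to derive matching points from a score matrix is sketched below. The scheme does not specify how the scores are computed or how matches are selected from them, so the dot-product scoring, the mutual-nearest-neighbour rule, the `threshold` parameter and the function names are all assumptions made for illustration.

```python
from typing import List, Sequence, Tuple


def score_matrix(
    local_vecs: Sequence[Sequence[float]], similar_vecs: Sequence[Sequence[float]]
) -> List[List[float]]:
    """scores[r][c] is the dot product of local vector r and similar-image vector c."""
    return [[sum(x * y for x, y in zip(u, v)) for v in similar_vecs] for u in local_vecs]


def mutual_matches(scores: Sequence[Sequence[float]], threshold: float = 0.0) -> List[Tuple[int, int]]:
    """Keep (row, col) pairs that are each other's best match and score above threshold."""
    if not scores or not scores[0]:
        return []
    n_rows, n_cols = len(scores), len(scores[0])
    # Best column for every row, and best row for every column.
    best_col = [max(range(n_cols), key=lambda c, r=r: scores[r][c]) for r in range(n_rows)]
    best_row = [max(range(n_rows), key=lambda r, c=c: scores[r][c]) for c in range(n_cols)]
    return [
        (r, c)
        for r, c in enumerate(best_col)
        if best_row[c] == r and scores[r][c] > threshold
    ]
```

Each returned pair gives the index of a feature point in the local image and the index of its counterpart in the first similar image; these positions are what the later region extraction would be based on.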

Further, "extracting a first local image feature map vector of the local image" includes: extracting a first local image feature map of the local image by using a local feature extractor, and performing position coding on each feature point of the first local image feature map to obtain the first local image feature map vector.

Similarly, "extracting a first similar image feature map vector of the first similar image" includes: extracting a first similar image feature map of the first similar image by using the local feature extractor, and performing position coding on each feature point of the first similar image feature map to obtain the first similar image feature map vector.

In this scheme, the local feature extractor is constructed by combining ResNet18 with FPN. During training, the input images are randomly transformed so that the extracted image features are enhanced and adapt better to scale changes.

This scheme uses a feature position encoder to encode the positions of the feature points. Specifically, a sine function is used to perform position coding on each feature point of the first similar image feature map and of the first local image feature map respectively, so as to obtain a D-dimensional first local image feature map vector and a D-dimensional first similar image feature map vector. The encoding formula of the feature position encoder is as follows:

where f is the image feature point and i is one dimension of the vector.
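
The encoding formula itself is not reproduced in this text. Purely as an illustration, the following Python sketch shows one common form of sinusoidal position coding for a two-dimensional feature map. It is an assumption and is not necessarily the formula of the present scheme: the function names, the base constant 10000 and the requirement that D be divisible by 4 are hypothetical choices made for the example.

```python
import numpy as np

def sinusoidal_position_encoding(height, width, dim):
    """Return an (height, width, dim) array of 2D sinusoidal position codes.

    Assumed layout: channels cycle through sin(w*x), cos(w*x), sin(w*y), cos(w*y).
    """
    if dim % 4 != 0:
        raise ValueError("dim must be divisible by 4 in this sketch")
    ys, xs = np.meshgrid(np.arange(height), np.arange(width), indexing="ij")
    k = np.arange(dim // 4)
    freqs = 1.0 / (10000.0 ** (4.0 * k / dim))  # one frequency per channel group
    enc = np.zeros((height, width, dim))
    enc[..., 0::4] = np.sin(xs[..., None] * freqs)
    enc[..., 1::4] = np.cos(xs[..., None] * freqs)
    enc[..., 2::4] = np.sin(ys[..., None] * freqs)
    enc[..., 3::4] = np.cos(ys[..., None] * freqs)
    return enc

def encode_feature_map(feature_map):
    """Add position codes to an (H, W, D) feature map and flatten it to (H*W, D)."""
    h, w, d = feature_map.shape
    coded = feature_map + sinusoidal_position_encoding(h, w, d)
    return coded.reshape(h * w, d)
```

Each row of the returned array then plays the role of one D-dimensional feature map vector entry for one feature point.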

The main role of the self-attention and cross-attention module in "inputting the first local image feature map vector and the first similar image feature map vector into a self-attention and cross-attention module for dual-attention weighting, and respectively outputting a second local image feature map vector and a second similar image feature map vector" is to repeatedly establish connections between features within each image and between the two images, and to update the weight of each node by a message-passing mechanism, in which the weight of each node is obtained from its weight in the previous layer and the weights of its adjacent nodes. That is, using the relationship between the first local image feature map and the first similar image feature map, the self-attention and cross-attention module updates the first similar image feature map vector by the weight updating method to obtain the second similar image feature map vector, and updates the first local image feature map vector to obtain the second local image feature map vector.

Specifically, as shown in fig. 5, the logic principle of the weight update is as follows:

The input is a node to be updated and two adjacent nodes of that node. The three nodes are linearly mapped by their corresponding MLP layers into a D-dimensional query vector, key vector and value vector. The query vector and the key vector pass through a softmax function to give the weight distribution between the node to be updated and its adjacent nodes. The weight distribution is multiplied by the value vector to give the feature aggregated from the adjacent nodes for the query node. The aggregated feature and the query node are concatenated into a 2D-dimensional vector, which is reduced to D dimensions by an MLP layer and then summed with the original weight of the node to be updated to give the final updated weight. Every feature node is processed in this way in a loop, finally yielding the weighted second local image feature map vector and the weighted second similar image feature map vector.
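
The weight update described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the implementation of the present scheme: each "MLP layer" is reduced to a single linear map, the scores are scaled by the square root of D before the softmax, and the names `init_params` and `attention_update` are hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    shifted = x - x.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)

def init_params(dim, rng):
    """Random weights for one attention layer (hypothetical layout)."""
    scale = 1.0 / np.sqrt(dim)
    return {
        "w_q": rng.normal(0.0, scale, (dim, dim)),
        "w_k": rng.normal(0.0, scale, (dim, dim)),
        "w_v": rng.normal(0.0, scale, (dim, dim)),
        "w_merge": rng.normal(0.0, scale, (2 * dim, dim)),
    }

def attention_update(nodes, neighbors, params):
    """Update (N, D) nodes with messages aggregated from (M, D) neighbors."""
    dim = nodes.shape[1]
    query = nodes @ params["w_q"]
    key = neighbors @ params["w_k"]
    value = neighbors @ params["w_v"]
    # Weight distribution between each node and every neighbor (rows sum to 1).
    weights = softmax(query @ key.T / np.sqrt(dim), axis=1)
    message = weights @ value                           # aggregated features, (N, D)
    merged = np.concatenate([nodes, message], axis=1)   # (N, 2D)
    # Reduce back to D dimensions and add the original node values.
    return nodes + merged @ params["w_merge"]

rng = np.random.default_rng(0)
params = init_params(8, rng)
local_vecs = rng.normal(size=(6, 8))
similar_vecs = rng.normal(size=(9, 8))
local_self = attention_update(local_vecs, local_vecs, params)     # self-attention
local_cross = attention_update(local_self, similar_vecs, params)  # cross-attention
```

Passing the same set as `nodes` and `neighbors` corresponds to self-attention within one image; passing the vectors of the other image as `neighbors` corresponds to cross-attention between the two images.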

The "acquiring a score matrix that matches the second similar image feature map vector against the second local image feature map vector, and acquiring the first matching point based on the score matrix" includes: calculating the matrix inner product of the second similar image feature map vector and the second local image feature map vector to obtain the score matrix, and screening the score matrix according to a confidence threshold to obtain the first matching point.

In this scheme, a softmax is computed over the row dimension and over the column dimension of the matrix formed from the second similar image feature map vector and the second local image feature map vector, and the two results are finally multiplied to obtain the score matrix. The calculation formula is as follows:

where conf_matrix is the score matrix.
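
The formula is not reproduced in this text. As a hedged sketch of the row-and-column softmax described above, the Python code below computes a score matrix and screens it by a confidence threshold. The temperature value, the threshold value and the function names are assumptions introduced for the example only.

```python
import numpy as np

def softmax(x, axis):
    """Numerically stable softmax along the given axis."""
    shifted = x - x.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)

def score_matrix(local_vecs, similar_vecs, temperature=0.1):
    """Score matrix from (N, D) local vectors and (M, D) similar-image vectors."""
    similarity = local_vecs @ similar_vecs.T / temperature  # inner products, (N, M)
    # Softmax over the column dimension times softmax over the row dimension.
    return softmax(similarity, axis=0) * softmax(similarity, axis=1)

def first_matching_points(conf_matrix, threshold=0.2):
    """Return (i, j) index pairs whose score exceeds the confidence threshold."""
    return np.argwhere(conf_matrix > threshold)

local_vecs = np.eye(3)
similar_vecs = np.eye(3)[[2, 0, 1]]  # the same vectors in a different order
conf_matrix = score_matrix(local_vecs, similar_vecs)
matches = first_matching_points(conf_matrix)
```

Each row of `matches` pairs the index of a feature point in the local image with the index of a feature point in the first similar image.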

It is also worth noting that the score matrix in this scheme covers the feature matching values of the two images; the score matrix is expressed differently along its different dimensions and contains different local neighborhood information.

The "extracting a similar image area feature map of the first similar image according to the position of the first matching point" includes: acquiring the similar image region feature map from the first similar image feature map of the first similar image based on the first matching point.

Similarly, the "extracting a local image area feature map of the local image according to the position of the first matching point" includes: acquiring the local image region feature map of a first local image feature map of the local image based on the first matching point. The definition and acquisition methods of the first similar image feature map and the first local image feature map are as described above, and will not be described herein.

The "acquiring matching feature key points of the local image region feature map and the similar image region feature map" includes: inputting the local image region feature map and the similar image region feature map into the self-attention and cross-attention module for dual-attention weighting, which outputs a third local image feature map vector and a third similar image feature map vector respectively; calculating the correlation between the third local image feature map vector and the third similar image feature map vector; and screening the points with high correlation as the matching feature key points.

It should be noted that position coding is performed on each feature point of the local image region feature map to obtain a local image region feature map vector, and on each feature point of the similar image region feature map to obtain a similar image region feature map vector. The local image region feature map vector and the similar image region feature map vector are input into the self-attention and cross-attention module and processed to obtain the third local image feature map vector and the third similar image feature map vector, whose weights have thereby been updated. The specific structure of the self-attention and cross-attention module is as described above and is not repeated here.

"Calculating the correlation of the third local image feature map vector and the third similar image feature map vector" includes: calculating a response matrix map of the third local image feature map vector and the third similar image feature map vector, acquiring the response value of each feature point from the response matrix map, and selecting the feature points with high response values as the matching feature key points. In this scheme, a Fourier transform may be introduced to accelerate the calculation by means of the convolution theorem. The response value is positively correlated with the correlation, so the most relevant matching feature key points can be obtained.

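The use of the Fourier transform can be illustrated with the following Python sketch, which computes a circular cross-correlation response map in the frequency domain and then picks the highest responses. It is only an example under assumptions: the text does not specify how the response matrix map is formed, and the names `response_map_fft` and `top_responses`, as well as the summation over channels, are hypothetical.

```python
import numpy as np

def response_map_fft(search, template):
    """Circular cross-correlation of an (H, W, D) map with an (h, w, D) template.

    The template is zero-padded to (H, W); responses are summed over channels.
    """
    height, width, _ = search.shape
    f_search = np.fft.fft2(search, axes=(0, 1))
    f_template = np.fft.fft2(template, s=(height, width), axes=(0, 1))
    # Correlation in the spatial domain is a product with the conjugate spectrum.
    corr = np.fft.ifft2(f_search * np.conj(f_template), axes=(0, 1))
    return np.real(corr).sum(axis=2)

def top_responses(response, k):
    """Return the (row, col) positions of the k highest response values."""
    order = np.argsort(response, axis=None)[::-1][:k]
    return [tuple(int(i) for i in np.unravel_index(idx, response.shape))
            for idx in order]

template = np.arange(1, 7, dtype=float).reshape(2, 3, 1)
search = np.zeros((8, 8, 1))
search[3:5, 2:5, :] = template        # embed the template at row 3, column 2
response = response_map_fft(search, template)
best = top_responses(response, 1)
```

In this toy input the highest response lies at the position where the template was embedded, which is the behaviour the positive relation between response value and correlation relies on.
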
Each first similar image may have a plurality of matching feature key points, and the larger the number of matching feature key points, the more similar the first similar image is to the image to be searched. That is, the present scheme may sort the first similar images in descending order of their number of matching feature key points and select at least one of the top-ranked first similar images as the second similar image. It should be noted that the number of second similar images selected may be determined according to a preset number of feature key points.
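
The selection step can be sketched in a few lines of Python. The function name `select_second_similar` and the parameters `top_k` and `min_keypoints` are hypothetical; they stand for "the top-ranked images" and "a preset number of feature key points" respectively.

```python
def select_second_similar(keypoint_counts, top_k=None, min_keypoints=0):
    """Rank first similar images by their number of matching feature key points.

    keypoint_counts maps an image identifier to its key point count.
    Returns (image_id, count) pairs in descending order of count.
    """
    ranked = sorted(keypoint_counts.items(), key=lambda item: item[1], reverse=True)
    kept = [(image_id, count) for image_id, count in ranked if count >= min_keypoints]
    return kept if top_k is None else kept[:top_k]

counts = {"img_a": 12, "img_b": 40, "img_c": 3}
second_similar = select_second_similar(counts, top_k=2)
```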

The image search method may be applied to the server of an image search engine. After the image search engine acquires the image to be searched sent by a client, the server first searches with a global feature search algorithm to obtain the top M similar image results, thereby narrowing the search range of the local search algorithm. The server then performs a local feature search on the M images obtained from the global features, selects the top K most similar images from the M global search results through the key point matching analysis of the local feature search, and pushes these images to the query page as the accurate result of the image search analysis.
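
The two-stage flow may be sketched as follows. The sketch assumes that the global features are fixed-length vectors compared by cosine similarity, which is not stated in this passage, and it takes the local key point counter as a caller-supplied function; `global_top_m`, `search_by_image` and `count_keypoints` are hypothetical names.

```python
import numpy as np

def global_top_m(query_vec, gallery_vecs, m):
    """Indices of the m gallery vectors most similar to the query (cosine)."""
    query = query_vec / np.linalg.norm(query_vec)
    gallery = gallery_vecs / np.linalg.norm(gallery_vecs, axis=1, keepdims=True)
    similarity = gallery @ query
    return np.argsort(-similarity)[:m]

def search_by_image(query_vec, gallery_vecs, count_keypoints, m, k):
    """Global search for M candidates, then re-rank them by key point count."""
    candidates = global_top_m(query_vec, gallery_vecs, m)
    counts = {int(i): count_keypoints(int(i)) for i in candidates}
    return sorted(counts, key=counts.get, reverse=True)[:k]

gallery = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.7, 0.7]])
fake_counts = {0: 5, 1: 20, 2: 100, 3: 9}   # stand-in for the local feature search
result = search_by_image(np.array([1.0, 0.0]), gallery, fake_counts.get, m=3, k=2)
```

Only the M candidates returned by the global stage are passed to the more expensive local stage, so an image excluded by the global search is never re-ranked, whatever its key point count would have been.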

Example two

Based on the same concept, referring to fig. 6, the present application further provides an image-by-image search apparatus, including:

an image acquisition unit 301 configured to acquire an image to be searched;

the global feature searching unit 302 is configured to obtain at least one first similar image with a high similarity to the image to be searched in the historical image set, and form a first similar image set;

a local feature searching unit 303, configured to obtain a local image of the image to be searched, input the local image and the first similar image set into a local feature extraction network, and extract a matching feature key point of each first similar image;

a result obtaining unit 304, configured to determine a second similar image according to the number of the matched feature key points of each of the first similar images.

The image-by-image search apparatus operates according to the image-by-image search method of the first embodiment, and the repeated content is not described again here.

Example three

The embodiment also provides an electronic device, referring to fig. 7, including a memory 404 and a processor 402, where the memory 404 stores a computer program, and the processor 402 is configured to execute the computer program to perform the steps in any of the above-mentioned embodiments of the method for searching images.

Specifically, the processor 402 may include a Central Processing Unit (CPU) or an Application Specific Integrated Circuit (ASIC), or may be configured as one or more integrated circuits implementing the embodiments of the present application.

Memory 404 may include, among other things, mass storage for data or instructions. By way of example, and not limitation, memory 404 may include a Hard Disk Drive (HDD), a floppy disk drive, a Solid State Drive (SSD), flash memory, an optical disk, a magneto-optical disk, tape, or a Universal Serial Bus (USB) drive, or a combination of two or more of these. Memory 404 may include removable or non-removable (or fixed) media, where appropriate. The memory 404 may be internal or external to the data processing apparatus, where appropriate. In a particular embodiment, the memory 404 is a non-volatile memory. In particular embodiments, memory 404 includes Read-Only Memory (ROM) and Random Access Memory (RAM). The ROM may be mask-programmed ROM, Programmable ROM (PROM), Erasable PROM (EPROM), Electrically Erasable PROM (EEPROM), Electrically Alterable ROM (EAROM), or flash memory, or a combination of two or more of these, where appropriate. The RAM may be a Static Random-Access Memory (SRAM) or a Dynamic Random-Access Memory (DRAM), where the DRAM may be a Fast Page Mode DRAM (FPM DRAM), an Extended Data Output DRAM (EDO DRAM), a Synchronous DRAM (SDRAM), or the like.

Memory 404 may be used to store or cache various data files for processing and/or communication use, as well as possibly computer program instructions for execution by processor 402.

The processor 402 reads and executes the computer program instructions stored in the memory 404 to implement any of the image-by-image search methods in the above embodiments.

Optionally, the electronic apparatus may further include a transmission device 406 and an input/output device 408, where the transmission device 406 is connected to the processor 402, and the input/output device 408 is connected to the processor 402.

The transmission device 406 may be used to receive or transmit data via a network. Specific examples of the network may include a wired or wireless network provided by a communication provider of the electronic device. In one example, the transmission device 406 includes a Network Interface Controller (NIC) that can be connected to other network devices through a base station so as to communicate with the internet. In another example, the transmission device 406 may be a Radio Frequency (RF) module, which is used to communicate with the internet wirelessly.

The input and output devices 408 are used to input or output information. In this embodiment, the input information may be various types of images to be searched, and the output information may be a first similar image, a second similar image, and the like.

Optionally, in this embodiment, the processor 402 may be configured to execute the following steps by a computer program:

S101, acquiring an image to be searched;

S102, acquiring at least one first similar image with high similarity to the image to be searched in the historical image set to form a first similar image set;

S103, obtaining a local image of the image to be searched, inputting the local image and the first similar image set into a local feature extraction network to extract a matching feature key point of each first similar image;

S104, determining at least one second similar image according to the number of the matched feature key points of each first similar image.

It should be noted that, for specific examples in this embodiment, reference may be made to examples described in the foregoing embodiments and optional implementations, and details of this embodiment are not described herein again.

In general, the various embodiments may be implemented in hardware or special purpose circuits, software, logic or any combination thereof. Some aspects of the invention may be implemented in hardware, while other aspects may be implemented in firmware or software which may be executed by a controller, microprocessor or other computing device, although the invention is not limited thereto. While various aspects of the invention may be illustrated and described as block diagrams, flow charts, or using some other pictorial representation, it is well understood that these blocks, apparatus, systems, techniques or methods described herein may be implemented in, as non-limiting examples, hardware, software, firmware, special purpose circuits or logic, general purpose hardware or controller or other computing devices, or some combination thereof.

Embodiments of the invention may be implemented by computer software executable by a data processor of the mobile device, such as in a processor entity, or by hardware, or by a combination of software and hardware. Computer software or programs (also referred to as program products) including software routines, applets and/or macros can be stored in any device-readable data storage medium and they include program instructions for performing particular tasks. The computer program product may comprise one or more computer-executable components configured to perform embodiments when the program is run. The one or more computer-executable components may be at least one software code or a portion thereof. Further in this regard it should be noted that any block of the logic flow as in the figures may represent a program step, or an interconnected logic circuit, block and function, or a combination of a program step and a logic circuit, block and function. The software may be stored on physical media such as memory chips or memory blocks implemented within the processor, magnetic media such as hard or floppy disks, and optical media such as, for example, DVDs and data variants thereof, CDs. The physical medium is a non-transitory medium.

It should be understood by those skilled in the art that various features of the above embodiments can be combined arbitrarily, and for the sake of brevity, all possible combinations of the features in the above embodiments are not described, but should be considered as within the scope of the present disclosure as long as there is no contradiction between the combinations of the features.

The above examples are merely illustrative of several embodiments of the present application, and the description is more specific and detailed, but not to be construed as limiting the scope of the present application. It should be noted that, for a person skilled in the art, several variations and modifications can be made without departing from the concept of the present application, which falls within the scope of protection of the present application. Therefore, the protection scope of the present application shall be subject to the appended claims.
