Multi-modal searching method based on neighbor graph

Document No.: 1846152 | Published: 2021-11-16

Abstract: The invention "Multi-modal searching method based on neighbor graph" (一种基于近邻图的多模态搜索方法) was created by 徐小良, 吕凌威, and 王梦召 on 2021-10-19. The invention relates to a multi-modal search method based on a neighbor graph. A feature vector is first generated for each modality of every reference object in a reference data set. Distances are then computed independently for each feature vector and fused by an aggregation function into a fusion distance between objects, from which a neighbor graph of the reference objects is built. Next, a query vector containing multiple feature vectors is generated from the query content, and a multi-modal search is run on the neighbor graph with that query vector to obtain the most similar targets. Because the search operates on the fusion distance, it queries several modalities of an object at once, and adjusting the aggregation function changes how much each modality contributes to the fusion distance. This gives flexible control over modality importance during search and improves search efficiency and precision.

1. A multi-modal searching method based on a neighbor graph is characterized by comprising the following steps:

S1, acquiring a reference data set, wherein the reference data set comprises a plurality of reference objects, each reference object comprises a plurality of types of modality data, and all reference objects have the same types of modality data;

S2, generating a corresponding feature vector for each piece of modality data, wherein all feature vectors of a reference object form its reference object vector;

S3, calculating, between reference objects, the feature vector distance for the data of each same modality;

S4, fusing all the feature vector distances between reference objects by using an aggregation function to obtain the fusion distance between reference object vectors, wherein the aggregation function comprises weights for the different feature vectors;

S5, generating a neighbor graph of the reference objects according to the fusion distances between the reference object vectors;

S6, generating a query object according to the query content, wherein the query object comprises modality data of the same types as the reference objects, and generating a corresponding feature vector for each piece of modality data of the query object;

S7, querying the neighbor graph by using the feature vectors of the query object to obtain the closest reference object.

2. The multi-modal searching method based on neighbor graph as claimed in claim 1, wherein step S3 specifically comprises: calculating the feature vector distance of same-modality data between reference objects by using a similarity measure.

3. The multi-modal searching method based on neighbor graph as claimed in claim 1, wherein in step S4 the fusion distance is calculated by the formula:

$d(e_1, e_2) = g(d_1, \dots, d_k, \dots, d_m)$;

wherein $d_k$ is the distance between the $k$-th feature vectors of reference object vectors $e_1$ and $e_2$, and $g(\cdot)$ is an aggregation function.

4. The multi-modal searching method based on neighbor graph as claimed in claim 3, wherein the weights that the aggregation function $g(\cdot)$ assigns to the feature vector distances of different modality data are adjusted according to changes in scenario requirements.

5. The multi-modal searching method based on neighbor graph as claimed in claim 1, wherein said step S5 specifically comprises:

S5.1, selecting an object vector $e_i$, selecting the $C$ object vectors with the smallest fusion distance to $e_i$ as a candidate set, and setting an angle threshold $\alpha$ and an upper limit $K$ on the number of edges;

S5.2, selecting from the candidate set the object vector with the smallest fusion distance to $e_i$ and connecting it to $e_i$ with an edge;

S5.3, selecting object vectors $e_j$ from the candidate set in order from near to far and attempting to connect each to $e_i$, wherein the edge is added only when the angle between the edge from $e_i$ to $e_j$ and every existing edge of $e_i$ is larger than $\alpha$, and continuing until the number of edges of $e_i$ reaches $K$ or all object vectors in the candidate set have been tried;

S5.4, determining whether any object vector has not yet been selected in step S5.1; if so, returning to step S5.1; otherwise the neighbor graph $G(V, E)$ is complete, wherein $V$ is the vertex set, each vertex representing one of said object vectors, and $E$ is the edge set.

6. The multi-modal searching method based on neighbor graph as claimed in claim 1, wherein said step S7 specifically comprises:

S7.1, letting the vertex set of the neighbor graph be $V$ and the query vector be $q$, and selecting a plurality of vertices to form an initial query vertex set $B \subseteq V$;

S7.2, selecting from $B$ the unvisited vertex $x \in V$ with the smallest fusion distance to $q$, and obtaining the neighbor vertex set of $x$, $N(x) = (y_1, y_2, \dots, y_l)$, wherein $y_l$ is the $l$-th neighbor vertex of $x$ and $l$ is the number of neighbors of $x$; marking $x$ as a visited vertex;

S7.3, calculating the fusion distance between each neighbor vertex in $N(x)$ and $q$, and replacing the vertex in $B$ farthest from $q$ with any neighbor vertex that is closer to $q$;

S7.4, determining whether $B$ contains a vertex that has not been visited; if so, returning to step S7.2; otherwise ending the query and returning $B$ as the query result, the vertices in $B$ being the closest reference objects;

wherein, in the above query process, the fusion distance $d(q, e_i)$ between the query vector $q$ and an object vector $e_i$ is

$d(q, e_i) = f(d_1, \dots, d_k, \dots, d_m)$,

wherein $d_k$ is the distance between the $k$-th feature vectors of the query vector $q$ and the object vector $e_i$, and $f(\cdot)$ is an aggregation function.

7. The multi-modal searching method based on neighbor graph as claimed in claim 6, wherein the weights that the aggregation function $f(\cdot)$ assigns to the feature vector distances of different modality data are adjusted according to changes in scenario requirements.

Technical Field

The invention belongs to the technical field of search, and particularly relates to a multi-modal search method based on a neighbor graph.

Background

With the continuous development of the internet, applications continuously generate and accumulate massive amounts of text, picture, audio, and video data. This multi-modal, large-scale data poses great challenges for information retrieval. As artificial intelligence technology advances, data of each modality can be converted by AI models into feature vectors for similarity calculation and its extended applications, so research on multi-modal search methods is very important.

Currently, multi-modal search mainly uses two approaches. The first performs a single-modality search on each modality of the multi-modal data and then merges the results; its efficiency drops markedly as the number of modalities and the data volume grow. The second maps the data of every modality into a unified multi-modal vector space with a learned model and then searches in that space; its drawback is that once the mapping model is trained, the importance of a given modality cannot be freely controlled at search time, so the method is inflexible and has a low recall rate.

Disclosure of Invention

Based on the above-mentioned shortcomings and drawbacks of the prior art, it is an object of the present invention to at least solve one or more of the above-mentioned problems of the prior art, in other words, to provide a neighbor graph-based multi-modal search method that satisfies one or more of the above-mentioned needs.

In order to achieve the purpose, the invention adopts the following technical scheme:

A multi-modal searching method based on a neighbor graph comprises the following steps:

S1, acquiring a reference data set, wherein the reference data set comprises a plurality of reference objects, each reference object comprises a plurality of types of modality data, and all reference objects have the same types of modality data;

S2, generating a corresponding feature vector for each piece of modality data, wherein all feature vectors of a reference object form its reference object vector;

S3, calculating, between reference objects, the feature vector distance for the data of each same modality;

S4, fusing all the feature vector distances between reference objects by using an aggregation function to obtain the fusion distance between reference object vectors, wherein the aggregation function comprises weights for the different feature vectors;

S5, generating a neighbor graph of the reference objects according to the fusion distances between the reference object vectors;

S6, generating a query object according to the query content, wherein the query object comprises modality data of the same types as the reference objects, and generating a corresponding feature vector for each piece of modality data of the query object;

S7, querying the neighbor graph by using the feature vectors of the query object to obtain the closest reference object.

Preferably, step S3 specifically comprises: calculating the feature vector distance of same-modality data between reference objects by using a similarity measure.

Preferably, in step S4, the fusion distance is calculated by the formula:

$d(e_1, e_2) = g(d_1, \dots, d_k, \dots, d_m)$;

wherein $d_k$ is the distance between the $k$-th feature vectors of reference object vectors $e_1$ and $e_2$, and $g(\cdot)$ is an aggregation function.

As a further preference, the weights that the aggregation function $g(\cdot)$ assigns to the feature vector distances of different modality data are adjusted according to changes in scenario requirements.

Preferably, step S5 specifically includes:

S5.1, selecting an object vector $e_i$, selecting the $C$ object vectors with the smallest fusion distance to $e_i$ as a candidate set, and setting an angle threshold $\alpha$ and an upper limit $K$ on the number of edges;

S5.2, selecting from the candidate set the object vector with the smallest fusion distance to $e_i$ and connecting it to $e_i$ with an edge;

S5.3, selecting object vectors $e_j$ from the candidate set in order from near to far and attempting to connect each to $e_i$, wherein the edge is added only when the angle between the edge from $e_i$ to $e_j$ and every existing edge of $e_i$ is larger than $\alpha$, and continuing until the number of edges of $e_i$ reaches $K$ or all object vectors in the candidate set have been tried;

S5.4, determining whether any object vector has not yet been selected in step S5.1; if so, returning to step S5.1; otherwise the neighbor graph $G(V, E)$ is complete, wherein $V$ is the vertex set, each vertex representing one object vector, and $E$ is the edge set.

Preferably, step S7 specifically includes:

S7.1, letting the vertex set of the neighbor graph be $V$ and the query vector be $q$, and selecting a plurality of vertices to form an initial query vertex set $B \subseteq V$;

S7.2, selecting from $B$ the unvisited vertex $x \in V$ with the smallest fusion distance to $q$, and obtaining the neighbor vertex set of $x$, $N(x) = (y_1, y_2, \dots, y_l)$, wherein $y_l$ is the $l$-th neighbor vertex of $x$ and $l$ is the number of neighbors of $x$; marking $x$ as a visited vertex;

S7.3, calculating the fusion distance between each neighbor vertex in $N(x)$ and $q$, and replacing the vertex in $B$ farthest from $q$ with any neighbor vertex that is closer to $q$;

S7.4, determining whether $B$ contains a vertex that has not been visited; if so, returning to step S7.2; otherwise ending the query and returning $B$ as the query result, the vertices in $B$ being the closest reference objects;

wherein, in the above query process, the fusion distance $d(q, e_i)$ between the query vector $q$ and an object vector $e_i$ is

$d(q, e_i) = f(d_1, \dots, d_k, \dots, d_m)$,

wherein $d_k$ is the distance between the $k$-th feature vectors of the query vector $q$ and the object vector $e_i$, and $f(\cdot)$ is an aggregation function.

As a further preference, the weights that the aggregation function $f(\cdot)$ assigns to the feature vector distances of different modality data are adjusted according to changes in scenario requirements.

Compared with the prior art, the invention has the beneficial effects that:

The method searches multiple modalities simultaneously, changes the weights of different modalities at search time by adjusting the aggregation function, and achieves higher search efficiency.

Drawings

FIG. 1 is a flow chart of a multi-modal search method based on a neighbor graph according to an embodiment of the present invention;

FIG. 2 is an exemplary flow chart of a multi-modal search method based on a neighbor graph according to an embodiment of the present invention.

Detailed Description

In order to more clearly illustrate the embodiments of the present invention, the following description will explain the embodiments of the present invention with reference to the accompanying drawings. It is obvious that the drawings in the following description are only some examples of the invention, and that for a person skilled in the art, other drawings and embodiments can be derived from them without inventive effort.

Embodiment:

The flow chart of the multi-modal search method based on a neighbor graph provided by this embodiment is shown in FIG. 1. This embodiment uses recipe search as the example of multi-modal search on a neighbor graph; the corresponding example flow chart is shown in FIG. 2.

Firstly, step S1 is performed to obtain a reference data set, where the reference data set includes a plurality of reference objects, and each reference object includes multiple types of modal data; the modality data of each reference object is of the same kind.

Then, step S2 is performed: a corresponding feature vector is generated for each piece of modality data, and all feature vectors of a reference object form its reference object vector. Feature vectors are generated with pre-trained models. For example, text data such as raw material data and cooking process data can be passed through an existing BERT model, trained for feature extraction, to obtain raw material feature vectors and cooking process feature vectors; finished product picture data can be passed through ResNet-50, likewise trained for feature extraction, to obtain finished product picture feature vectors.

In this embodiment the reference data set is a recipe data set. For each object in it, that is, for each recipe data item, features are extracted from the raw material data, the cooking process data, and the finished product picture data and converted into a raw material feature vector, a cooking process feature vector, and a finished product picture feature vector. The combination of these three feature vectors for one recipe data item is called a recipe vector, and all recipe vectors together constitute the recipe vector set.

The recipe data set is represented as: $O = \{o_i \mid i = 1, 2, \dots, N\}$;

where $o_i$ is the $i$-th recipe data item in the recipe data set $O$, and $N$ is the number of recipe data items in the set.

Recipe data $o_i$ is a triple consisting of its raw material data, its cooking process data, and its finished product picture data.

The recipe vector set S is represented as:

$S = \{e_i \mid i = 1, 2, \dots, N\}$;

where $e_i$ is the $i$-th recipe vector in the recipe vector set, and $N$ is the number of recipe vectors in the set.

Recipe vector $e_i$ is a triple consisting of its raw material vector, its cooking process vector, and its finished product picture vector.
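The recipe vector described above can be pictured as a small record holding one feature vector per modality. The following is only a sketch: the class name `RecipeVector`, its field names, and the toy two-dimensional features are illustrative assumptions, not part of the original disclosure.

```python
from dataclasses import dataclass
from typing import Tuple

Vector = Tuple[float, ...]

@dataclass(frozen=True)
class RecipeVector:
    """One object vector e_i: a feature vector per modality (names are illustrative)."""
    raw_material: Vector   # raw material feature vector
    process: Vector        # cooking process feature vector
    picture: Vector        # finished product picture feature vector

    def modalities(self) -> Tuple[Vector, Vector, Vector]:
        """Return the feature vectors in a fixed modality order."""
        return (self.raw_material, self.process, self.picture)

# Toy recipe vector set S; real features would come from the feature extractors.
S = [
    RecipeVector((0.1, 0.9), (0.3, 0.3), (0.5, 0.2)),
    RecipeVector((0.2, 0.8), (0.9, 0.1), (0.4, 0.4)),
]
```

Keeping the modalities in a fixed order lets the per-modality distances be computed position by position.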

Then, in step S3, the feature vector distance of each type of modality data between the reference objects is calculated.

In this embodiment, the distances between the raw material vectors, the cooking process vectors, and the finished product picture vectors of the recipe data are each calculated independently with a similarity measure. The $l_2$ distance can be used: for vectors $X = (x_1, x_2, \dots, x_n)$ and $Y = (y_1, y_2, \dots, y_n)$, the $l_2$ distance is $\left(\sum_{i=1}^{n} (x_i - y_i)^2\right)^{1/2}$.
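As a minimal sketch of step S3, the per-modality $l_2$ distances between two object vectors could be computed as below. The function names `l2_distance` and `modality_distances` and the toy vectors are assumptions made for illustration; each object is assumed to be a sequence of feature vectors in the same modality order.

```python
import math

def l2_distance(x, y):
    """Euclidean (l2) distance between two equal-length vectors."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same dimension")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def modality_distances(e1, e2):
    """Distances (d_1, ..., d_m), one per modality, between two object vectors."""
    return [l2_distance(u, v) for u, v in zip(e1, e2)]

# Toy object vectors with three modalities of two dimensions each.
e1 = [(0.0, 0.0), (1.0, 1.0), (0.0, 3.0)]
e2 = [(3.0, 4.0), (1.0, 1.0), (0.0, 0.0)]
distances = modality_distances(e1, e2)
```

Each modality is compared only with the same modality of the other object, so the feature dimensions may differ from one modality to another.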

After the feature vector distances are calculated, step S4 is performed: all feature vector distances between reference objects are fused with the aggregation function to obtain the fusion distance between reference object vectors.

In this embodiment, the raw material vector distance, the cooking process vector distance, and the finished product picture vector distance are fused by an aggregation function to obtain the fusion distance between recipe vectors. The fusion distance is calculated as:

$d(e_1, e_2) = g(d_1, d_2, d_3)$

where $d_1$ is the raw material vector distance between recipe vectors $e_1$ and $e_2$, $d_2$ is their cooking process vector distance, $d_3$ is their finished product picture vector distance, and $g(\cdot)$ is the aggregation function that fuses the three distances into the fusion distance.

The aggregation function may be $g(\cdot) = w_1 d_1 + \dots + w_k d_k + \dots + w_m d_m$, where $w_k$ is the weight of the $k$-th feature vector and $d_k$ is the distance between the $k$-th feature vectors. In this embodiment, $g(\cdot) = w_1 d_1 + w_2 d_2 + w_3 d_3$, where $w_1$ is the weight of the raw material vector and $d_1$ is the raw material vector distance; the other parameters are defined in the same way.

As an improvement of the above scheme, the aggregation function $g(\cdot)$ can be adjusted as scenario requirements change, giving different weights to the feature vector distances of different modality data so that the search favors a particular modality.
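The weighted-sum form of the aggregation function can be sketched in a few lines. The function name `fusion_distance` and the example weights are illustrative assumptions; the source only fixes the form $w_1 d_1 + \dots + w_m d_m$.

```python
def fusion_distance(distances, weights):
    """Weighted-sum aggregation g(.) = w_1*d_1 + ... + w_m*d_m."""
    if len(distances) != len(weights):
        raise ValueError("need exactly one weight per modality distance")
    return sum(w * d for w, d in zip(weights, distances))

# Per-modality distances (raw material, cooking process, picture) for one pair.
d = [5.0, 0.0, 3.0]

equal = fusion_distance(d, [1.0, 1.0, 1.0])             # all modalities count equally
raw_heavy = fusion_distance(d, [0.5, 0.25, 0.25])       # raw material favored
```

Changing only the weight list changes how strongly each modality influences the fused distance, without touching the per-modality distances themselves.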

After the fusion distance is calculated, step S5 is performed to generate a neighbor graph of the reference objects based on the fusion distances between the reference object vectors.

Further, the neighbor graph may be generated as follows:

S5.1, selecting a recipe vector, that is, an object vector $e_i$, selecting the $C$ object vectors with the smallest fusion distance to $e_i$ as a candidate set, and setting an angle threshold $\alpha$ and an upper limit $K$ on the number of edges;

S5.2, selecting from the candidate set the object vector with the smallest fusion distance to $e_i$ and connecting it to $e_i$ with an edge;

S5.3, selecting object vectors $e_j$ from the candidate set in order from near to far and attempting to connect each to $e_i$, the edge being added only when the angle between the edge from $e_i$ to $e_j$ and every existing edge of $e_i$ is larger than $\alpha$, and continuing until all object vectors in the candidate set have been tried; this step lets $e_i$ connect to $K$ neighbors whose fusion distance $d(e_i, e_j)$ is relatively small while keeping its edges diverse;

S5.4, determining whether any object vector has not yet been selected in step S5.1; if so, returning to step S5.1; otherwise the neighbor graph $G(V, E)$ is complete, where $V$ is the vertex set, each vertex representing one object vector, and $E$ is the edge set, that is, the set of edges connecting object vectors.
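Steps S5.1 to S5.4 can be sketched as follows. Several points here are assumptions rather than statements of the source: the source does not say how the angle between two edges is measured, so this sketch derives it from the three pairwise fusion distances with the law of cosines; the graph is stored as directed adjacency lists; the candidate search is brute force; and all function and parameter names are hypothetical.

```python
import math

def l2(x, y):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def fused(e1, e2, weights):
    """Fusion distance: weighted sum of per-modality l2 distances."""
    return sum(w * l2(u, v) for w, u, v in zip(weights, e1, e2))

def edge_angle(d_ij, d_ik, d_jk):
    """Angle in degrees at e_i between edges (e_i, e_j) and (e_i, e_k).
    ASSUMPTION: law of cosines on fusion distances; the source does not define it."""
    if d_ij == 0 or d_ik == 0:
        return 0.0
    cos = (d_ij ** 2 + d_ik ** 2 - d_jk ** 2) / (2 * d_ij * d_ik)
    return math.degrees(math.acos(max(-1.0, min(1.0, cos))))  # clamp rounding error

def build_graph(objects, weights, C, K, alpha):
    """Return {i: [neighbor indices]} following steps S5.1-S5.4."""
    n = len(objects)
    graph = {}
    for i in range(n):                                   # S5.1: take each e_i in turn
        def dist(j, i=i):
            return fused(objects[i], objects[j], weights)
        candidates = sorted((j for j in range(n) if j != i), key=dist)[:C]
        neighbors = []
        for j in candidates:                             # S5.2 / S5.3: near to far
            if len(neighbors) >= K:
                break
            diverse = all(
                edge_angle(dist(j), dist(k), fused(objects[j], objects[k], weights)) > alpha
                for k in neighbors
            )
            if diverse:                                  # nearest candidate always passes
                neighbors.append(j)
        graph[i] = neighbors                             # S5.4: loop until all are done
    return graph

# Toy data: one 2-D modality per object, so the geometry is easy to check by hand.
points = [((0.0, 0.0),), ((1.0, 0.0),), ((2.0, 0.0),), ((0.0, 1.0),)]
graph = build_graph(points, (1.0,), C=3, K=3, alpha=60.0)
```

In this toy run, vertex 0 connects to vertices 1 and 3 but not to vertex 2, because vertex 2 lies in the same direction as vertex 1 and fails the angle test.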

After the neighbor graph of the data set (the recipe data set in this embodiment) has been generated, queries can be run on it. Step S6 is performed first: a query object is generated according to the query content, the query object comprising modality data of the same types as the reference objects, and a corresponding feature vector is generated for each piece of modality data of the query object.

Specifically, for the recipe query in this embodiment, a recipe query object is input, and features are extracted from its raw material data, cooking process data, and finished product picture data and converted into a raw material vector, a cooking process vector, and a finished product picture vector. These three vectors of one recipe query object are collectively called a recipe query vector.

After the query vector and its feature vectors are generated, step S7 is performed: the neighbor graph is queried with the feature vectors of the query object to obtain the closest reference objects.

Specifically, step S7 includes:

S7.1, letting the vertex set of the neighbor graph be $V$ and the query vector be $q$, and selecting a plurality of vertices to form an initial query vertex set $B \subseteq V$;

S7.2, selecting from $B$ the unvisited vertex $x \in V$ with the smallest fusion distance to $q$ and marking $x$ as a visited vertex; obtaining the neighbor vertex set of $x$, $N(x) = (y_1, y_2, \dots, y_l)$, where $y_l$ is the $l$-th neighbor vertex of $x$ and $l$ is the number of neighbors of $x$;

S7.3, calculating the fusion distance between each neighbor vertex in $N(x)$ and $q$, and replacing the vertex in $B$ farthest from $q$ with any neighbor vertex closer to $q$; specifically, taking each neighbor vertex $y$ in $N(x)$ in turn and calculating the fusion distance between $y$ and $q$, and, if $y$ is closer to $q$ than $z$, the vertex in $B$ farthest from $q$, adding $y$ to $B$ and removing $z$ from $B$;

S7.4, determining whether $B$ contains a vertex that has not been visited; if so, returning to step S7.2; otherwise ending the query and returning $B$ as the query result. Because of the upper bound $K$, $B$ finally contains $K$ vertices: the $K$ vertices closest to the query vector $q$, that is, the $K$ reference objects closest to the query.

In the above query process, the fusion distance $d(q, e_i)$ between the query vector $q$ and an object vector $e_i$ is:

$d(q, e_i) = f(d_1, d_2, d_3)$

where $d_1$ is the raw material vector distance between the recipe query vector $q$ and the recipe vector $e_i$, $d_2$ is their cooking process vector distance, and $d_3$ is their finished product picture vector distance. $f(\cdot)$ is the aggregation function that fuses these three distances into the fusion distance. It can be set flexibly according to customer requirements to adjust the importance of the raw material data, the cooking process data, or the finished product picture data and so obtain the required results.
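The query procedure of steps S7.1 to S7.4 can be sketched as a greedy search over adjacency lists. This is an illustration under assumptions: $f(\cdot)$ is taken to be a weighted sum of per-modality $l_2$ distances, the result set is allowed to grow up to $K$ when the initial set is smaller (the source only describes replacement), and the names `search`, `start`, and the toy chain graph are hypothetical.

```python
import math

def l2(x, y):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def fused(q, e, weights):
    """Query-time fusion distance f(.): weighted sum of per-modality l2 distances."""
    return sum(w * l2(u, v) for w, u, v in zip(weights, q, e))

def search(graph, objects, q, weights, start, K):
    """Greedy search on the neighbor graph; returns up to K vertex ids, nearest first."""
    cache = {}
    def d(v):                                  # memoized fusion distance to q
        if v not in cache:
            cache[v] = fused(q, objects[v], weights)
        return cache[v]

    B = sorted(set(start), key=d)[:K]          # S7.1: initial vertex set B
    visited = set()
    while True:
        unvisited = [v for v in B if v not in visited]
        if not unvisited:                      # S7.4: every vertex in B was visited
            return B
        x = min(unvisited, key=d)              # S7.2: closest unvisited vertex
        visited.add(x)
        for y in graph[x]:                     # S7.3: try each neighbor of x
            if y in B:
                continue
            if len(B) < K:                     # ASSUMPTION: fill B up to K first
                B.append(y)
            elif d(y) < d(B[-1]):
                B[-1] = y                      # replace z, the farthest vertex in B
            else:
                continue
            B.sort(key=d)                      # keep B ordered, farthest last

# Toy data: five objects on a line (one 1-D modality), linked as a chain.
objects = [((float(v),),) for v in range(5)]
chain = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
result = search(chain, objects, q=((3.2,),), weights=(1.0,), start=[0], K=2)
```

Starting from vertex 0, the search walks along the chain toward the query and stops once every vertex in the result set has been visited, ending with vertices 3 and 4.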

It should be additionally noted that the method of the above embodiment adjusts the aggregation functions $g(\cdot)$ and $f(\cdot)$ separately in the two stages. Adjusting $g(\cdot)$ when the graph is built emphasizes a modality's weight from a macroscopic view of that modality's importance, while adjusting $f(\cdot)$ at query time lets the preference for a particular modality be set freely according to the user's needs.
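The effect of adjusting the query-time weights can be seen in a small example. This sketch ranks two reference objects by brute force instead of using the neighbor graph, purely to isolate the influence of the weights; the function names, the labels "A" and "B", and the one-dimensional toy features are illustrative assumptions.

```python
import math

def l2(x, y):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def fused(q, e, weights):
    """Weighted sum of per-modality l2 distances."""
    return sum(w * l2(u, v) for w, u, v in zip(weights, q, e))

def nearest(refs, q, weights):
    """Label of the reference object with the smallest fusion distance to q."""
    return min(refs, key=lambda name: fused(q, refs[name], weights))

# Modalities in order: raw material, cooking process, finished product picture.
refs = {
    "A": ((0.0,), (0.0,), (5.0,)),   # same raw materials as the query, different picture
    "B": ((5.0,), (0.0,), (0.0,)),   # same picture as the query, different raw materials
}
q = ((0.0,), (0.0,), (0.0,))

by_raw_material = nearest(refs, q, (0.8, 0.1, 0.1))   # raw material weighted most
by_picture = nearest(refs, q, (0.1, 0.1, 0.8))        # picture weighted most
```

With the raw material weighted most heavily the nearest object is "A"; with the picture weighted most heavily it is "B". The stored feature vectors are identical in both cases; only the weights passed at query time differ.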

The method simultaneously queries a plurality of modes of the object by querying the fusion distance, and can change the influence weight of different modes on the fusion distance by adjusting the aggregation function, thereby realizing flexible control on the importance of the modes in the searching process and improving the searching efficiency and precision.

It should be noted that the above describes only the preferred embodiments and principles of the present invention. Those skilled in the art may modify the embodiments based on the idea of the invention, and such modifications fall within the protection scope of the present invention.
