Image-text matching result determining method and device, electronic equipment and readable storage medium

Document No.: 1875551 | Publication date: 2021-11-23 | Views: 12 | Original language: Chinese

Reading note: This technique, "Image-text matching result determining method and device, electronic equipment and readable storage medium" (图文匹配结果确定方法、装置、电子设备及可读存储介质), was designed and created by 叶蕊, 张庆, 宾义, 黄君实, 王福龙, and 罗恒亮 on 2021-07-21. Its main content is as follows: The disclosure provides a method and a device for determining an image-text matching result, electronic equipment, and a readable storage medium. The method comprises: acquiring a target business party and a target image associated with the target business party; determining a first image-text matching result of the target business party and the target image according to the semantic association relation between the business party name of the target business party and the target image; obtaining a correlation score between the business party name and the target image based on a pre-trained image-text correlation model, and determining a second image-text matching result of the target business party and the target image according to the correlation score; and, in the case that at least one matching-inconsistent result exists among the first image-text matching result and the second image-text matching result, determining a target image-text matching result of the target business party and the target image based on a pre-constructed knowledge graph. The method and the device can improve the generalization of image-text matching and the effectiveness of image-text matching judgment.

1. A method for determining an image-text matching result, characterized by comprising the following steps:

acquiring a target service party and a target image associated with the target service party;

determining a first image-text matching result of the target service party and the target image according to the semantic association relation between the service party name of the target service party and the target image;

based on a pre-trained image-text correlation model, obtaining a correlation score between the business party name and the target image, and determining a second image-text matching result of the target business party and the target image according to the correlation score;

and under the condition that at least one matching inconsistent result exists in the first image-text matching result and the second image-text matching result, determining a target image-text matching result of the target service party and the target image based on a pre-constructed knowledge graph.

2. The method of claim 1, wherein determining the first image-text matching result of the target business party and the target image according to the business party name of the target business party and the semantic association relationship of the target image comprises:

carrying out entity identification processing on the name of the business party to obtain a first entity identification result corresponding to the name of the business party;

performing entity identification processing on a target service object contained in the target image to obtain a second entity identification result corresponding to the target image;

acquiring a semantic association relation between the first entity recognition result and the second entity recognition result according to an entity association list;

and determining the first image-text matching result according to the semantic association relation.

3. The method of claim 1, wherein the obtaining the relevance score between the business party name and the target image based on a pre-trained image-text relevance model comprises:

inputting the name of the business party and the target image into the image-text correlation model;

and calling the image-text correlation model to obtain a first vector corresponding to the name of the business party and a second vector corresponding to the target image, and calculating to obtain the correlation score of the name of the business party and the target image according to the first vector and the second vector.

4. The method of claim 1, further comprising, after said determining a second image-text matching result of the target business party and the target image according to the relevance score:

and determining the image-text matching result of the target service party and the target image as a result of consistent image-text matching under the condition that the first image-text matching result and the second image-text matching result are both the results of consistent image-text matching.

5. The method of claim 1, further comprising, before the determining a target image-text matching result between the target service party and the target image based on the pre-constructed knowledge graph:

constructing an initial knowledge graph according to historical co-occurrence data of a service party and a service object;

determining the co-occurrence probability of the business party and the business object according to the co-occurrence times of the business party and the business object;

and assigning the co-occurrence probability to the initial knowledge graph to generate the knowledge graph.

6. The method of claim 5, wherein determining a target image-text matching result between the target service party and the target image based on the pre-constructed knowledge-graph comprises:

acquiring, according to the knowledge graph, a target co-occurrence probability between the business party name and a target business object in the target image;

and determining the target image-text matching result according to the target co-occurrence probability.

7. The method of claim 6, wherein the determining the target image-text matching result according to the target co-occurrence probability comprises:

acquiring initial co-occurrence probability between the target service party and other service objects which have connection relation with the target service party according to the knowledge graph;

sequencing the target business object and the other business objects according to the initial co-occurrence probability and the target co-occurrence probability;

and determining a target image-text matching result of the target service party and the target image according to the sequencing result.

8. An image-text matching result determination device, characterized by comprising:

the target image acquisition module is used for acquiring a target service party and a target image associated with the target service party;

the first matching result determining module is used for determining a first image-text matching result of the target business party and the target image according to the semantic association relation between the business party name of the target business party and the target image;

the second matching result determining module is used for acquiring a relevance score between the business party name and the target image based on a pre-trained image-text relevance model, and determining a second image-text matching result of the target business party and the target image according to the relevance score;

and the target matching result determining module is used for determining a target image-text matching result of the target service party and the target image based on a pre-constructed knowledge graph under the condition that at least one matching inconsistent result exists in the first image-text matching result and the second image-text matching result.

9. An electronic device, comprising:

a processor, a memory, and a computer program stored on the memory and executable on the processor, wherein the processor, when executing the program, implements the image-text matching result determining method according to any one of claims 1 to 7.

10. A readable storage medium, wherein instructions in the storage medium, when executed by a processor of an electronic device, enable the electronic device to perform the image-text matching result determining method according to any one of claims 1 to 7.

Technical Field

The embodiment of the disclosure relates to the technical field of image-text matching, and in particular relates to a method and a device for determining an image-text matching result, an electronic device and a readable storage medium.

Background

With the development of science and technology and the improvement of economic level, more and more users can complete the purchase of various articles, such as catering, clothes, furniture and the like, through an order platform.

Currently, on the recommendation pages of most order platforms, merchants are presented to users as a <merchant name, picture> pair, and this image-text presentation gives users an intuitive visual impression. However, some merchant names are inconsistent with the content shown in their pictures. For example, a merchant named "xxx Roast Duck Shop" may display a picture of egg fried rice on the order platform, which can confuse users, affect their purchasing decisions, and degrade the user experience.

To ensure image-text consistency, the common current solution applies semantic information combined with entity association relations to the image-text matching problem. However, determining entity association relations depends heavily on manual definitions and historical data, so generalization is poor, which inevitably affects the effectiveness of image-text matching judgment.

Disclosure of Invention

The embodiment of the disclosure provides a method and a device for determining an image-text matching result, an electronic device and a readable storage medium, which are used for improving the generalization of image-text matching and improving the effectiveness of image-text matching judgment.

According to a first aspect of an embodiment of the present disclosure, there is provided a method for determining an image-text matching result, including:

acquiring a target service party and a target image associated with the target service party;

determining a first image-text matching result of the target service party and the target image according to the semantic association relation between the service party name of the target service party and the target image;

based on a pre-trained image-text correlation model, obtaining a correlation score between the business party name and the target image, and determining a second image-text matching result of the target business party and the target image according to the correlation score;

and under the condition that at least one matching inconsistent result exists in the first image-text matching result and the second image-text matching result, determining a target image-text matching result of the target service party and the target image based on a pre-constructed knowledge graph.

Optionally, the determining a first image-text matching result of the target business party and the target image according to the business party name of the target business party and the semantic association relationship of the target image includes:

carrying out entity identification processing on the name of the business party to obtain a first entity identification result corresponding to the name of the business party;

performing entity identification processing on a target service object contained in the target image to obtain a second entity identification result corresponding to the target image;

acquiring a semantic association relation between the first entity recognition result and the second entity recognition result according to an entity association list;

and determining the first image-text matching result according to the semantic association relation.

Optionally, the obtaining the correlation score between the business party name and the target image based on a pre-trained image-text correlation model includes:

inputting the name of the business party and the target image into the image-text correlation model;

and calling the image-text correlation model to obtain a first vector corresponding to the name of the business party and a second vector corresponding to the target image, and calculating to obtain the correlation score of the name of the business party and the target image according to the first vector and the second vector.

Optionally, after determining a second image-text matching result between the target business party and the target image according to the relevance score, the method further includes:

and determining the image-text matching result of the target service party and the target image as a result of consistent image-text matching under the condition that the first image-text matching result and the second image-text matching result are both the results of consistent image-text matching.

Optionally, before the determining a target image-text matching result between the target service party and the target image based on the pre-constructed knowledge graph, the method further includes:

constructing an initial knowledge graph according to historical co-occurrence data of a service party and a service object;

determining the co-occurrence probability of the business party and the business object according to the co-occurrence times of the business party and the business object;

and assigning the co-occurrence probability to the initial knowledge graph to generate the knowledge graph.

Optionally, the determining a target image-text matching result between the target service party and the target image based on the pre-constructed knowledge graph includes:

acquiring, according to the knowledge graph, a target co-occurrence probability between the business party name and a target business object in the target image;

and determining the target image-text matching result according to the target co-occurrence probability.

Optionally, the determining the target image-text matching result according to the target co-occurrence probability includes:

acquiring initial co-occurrence probability between the target service party and other service objects which have connection relation with the target service party according to the knowledge graph;

sequencing the target business object and the other business objects according to the initial co-occurrence probability and the target co-occurrence probability;

and determining a target image-text matching result of the target service party and the target image according to the sequencing result.

According to a second aspect of embodiments of the present disclosure, there is provided an image-text matching result determination apparatus including:

the target image acquisition module is used for acquiring a target service party and a target image associated with the target service party;

the first matching result determining module is used for determining a first image-text matching result of the target business party and the target image according to the semantic association relation between the business party name of the target business party and the target image;

the second matching result determining module is used for acquiring a relevance score between the business party name and the target image based on a pre-trained image-text relevance model, and determining a second image-text matching result of the target business party and the target image according to the relevance score;

and the target matching result determining module is used for determining a target image-text matching result of the target service party and the target image based on a pre-constructed knowledge graph under the condition that at least one matching inconsistent result exists in the first image-text matching result and the second image-text matching result.

Optionally, the first matching result determining module includes:

a first identification result obtaining unit, configured to perform entity identification processing on the business party name to obtain a first entity identification result corresponding to the business party name;

a second identification result obtaining unit, configured to perform entity identification processing on a target service object included in the target image to obtain a second entity identification result corresponding to the target image;

a semantic association relation obtaining unit, configured to obtain a semantic association relation between the first entity identification result and the second entity identification result according to an entity association list;

and the first image-text matching result determining unit is used for determining the first image-text matching result according to the semantic association relation.

Optionally, the second matching result determining module includes:

the target image input unit is used for inputting the name of the business party and the target image into the image-text correlation model;

and the relevance score calculating unit is used for calling the image-text relevance model to obtain a first vector corresponding to the name of the business party and a second vector corresponding to the target image, and calculating to obtain the relevance scores of the name of the business party and the target image according to the first vector and the second vector.

Optionally, the apparatus further comprises:

and the matching consistency result determining module is used for determining that the image-text matching results of the target service party and the target image are consistent results under the condition that the first image-text matching result and the second image-text matching result are consistent results of image-text matching.

Optionally, the apparatus further comprises:

the initial knowledge map building module is used for building an initial knowledge map according to historical co-occurrence data of a service party and a service object;

a co-occurrence probability determination module, configured to determine a co-occurrence probability of the service party and the service object according to the co-occurrence times of the service party and the service object;

and the knowledge graph generation module is used for assigning the co-occurrence probability to the initial knowledge graph to generate the knowledge graph.

Optionally, the target matching result determining module includes:

a target co-occurrence probability obtaining unit, configured to obtain, according to the knowledge graph, a target co-occurrence probability between the business party name and a target business object in the target image;

and the target matching result determining unit is used for determining the target image-text matching result according to the target co-occurrence probability.

Optionally, the target matching result determining unit includes:

an initial co-occurrence probability obtaining subunit, configured to obtain, according to the knowledge graph, an initial co-occurrence probability between the target service party and another service object having a connection relationship with the target service party;

a business object ordering subunit, configured to order the target business object and the other business objects according to the initial co-occurrence probability and the target co-occurrence probability;

and the target matching result determining subunit is used for determining a target image-text matching result of the target service party and the target image according to the sequencing result.

According to a third aspect of embodiments of the present disclosure, there is provided an electronic apparatus including:

a processor, a memory, and a computer program stored on the memory and executable on the processor, wherein the processor, when executing the program, implements any one of the image-text matching result determining methods described above.

According to a fourth aspect of embodiments of the present disclosure, there is provided a readable storage medium, where instructions when executed by a processor of an electronic device enable the electronic device to execute any one of the above-mentioned image-text matching result determination methods.

The embodiment of the disclosure provides a method and a device for determining an image-text matching result, electronic equipment, and a readable storage medium. The method acquires a target business party and a target image associated with it, and determines a first image-text matching result of the target business party and the target image according to the semantic association relation between the business party name and the target image. It then obtains a correlation score between the business party name and the target image based on a pre-trained image-text correlation model and determines a second image-text matching result according to the correlation score. In the case that at least one of the first and second image-text matching results indicates a matching inconsistency, it determines a target image-text matching result of the target business party and the target image based on a pre-constructed knowledge graph. The embodiment of the disclosure judges the image-text matching result along three dimensions, namely the entity association degree between image and text, the image-text multi-modal feature correlation, and the user behavior features mined from the knowledge graph, and fuses the results, thereby improving both the generalization of image-text matching and the effectiveness of image-text matching judgment.

Drawings

In order to more clearly illustrate the technical solutions of the embodiments of the present disclosure, the drawings needed for describing the embodiments are briefly introduced below. Obviously, the drawings in the following description show only some embodiments of the present disclosure, and those skilled in the art can derive other drawings from them without inventive effort.

Fig. 1 is a flowchart illustrating steps of a method for determining an image-text matching result according to an embodiment of the disclosure;

fig. 2 is a flowchart illustrating steps of another image-text matching result determining method according to an embodiment of the disclosure;

fig. 3 is a schematic structural diagram of an apparatus for determining a result of matching images and texts according to an embodiment of the present disclosure;

fig. 4 is a schematic structural diagram of another image-text matching result determining apparatus according to an embodiment of the present disclosure.

Detailed Description

Technical solutions in the embodiments of the present disclosure will be clearly and completely described below with reference to the drawings in the embodiments of the present disclosure, and it is apparent that the described embodiments are some, but not all, of the embodiments of the present disclosure. All other embodiments, which can be obtained by a person skilled in the art without making creative efforts based on the embodiments of the present disclosure, belong to the protection scope of the embodiments of the present disclosure.

Example one

Referring to fig. 1, a flowchart illustrating steps of an image-text matching result determining method according to an embodiment of the present disclosure is shown. As shown in fig. 1, the image-text matching result determining method may include the following steps:

step 101: and acquiring a target service party and a target image associated with the target service party.

The embodiment of the disclosure can be applied to scenarios of checking the consistency between a business party displayed on an order platform and the image provided by that business party.

The target business party is a business party registered on the order platform to provide order items to users. In this example, the target business party is a merchant, such as a restaurant, a cold-drink store, or a flower store.

The target image is an image that the target business party provides for display on the order platform to showcase the items it offers. In this example, the target image may be an image of a dish, an image of flowers, or the like.

In practical applications, after successfully registering on the order platform, the target business party can provide at least one image to the order platform as a target image for display. To avoid inconsistency between the picture and the text (namely, the name of the target business party), the scheme provided by this embodiment can be used to judge the matching result.

When the image-text matching result is judged, a target service party in the order platform and a target image associated with the target service party can be obtained.

After the target service party and the target image associated with the target service party are obtained, step 102 is performed.

Step 102: and determining a first image-text matching result of the target service party and the target image according to the semantic association relation between the service party name of the target service party and the target image.

The first image-text matching result is the matching result between the target business party's name and the target image obtained through entity-level semantic analysis.

After the target business party and its associated target image are obtained, the first image-text matching result can be determined according to the semantic association relation between the business party name and the target image. Specifically, named entity recognition is first performed on the business party name to obtain the business entity it contains. Next, a visual recognition algorithm identifies the business object contained in the target image, and named entity recognition is performed on that object to obtain the business entity contained in the image. Finally, the first image-text matching result is obtained by looking up the two recognized business entities in a pre-constructed entity association list.
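The entity-based matching step described above can be sketched as follows. This is a minimal illustration, not the disclosed implementation: `extract_entities` stands in for a real named entity recognition model, the image entities are assumed to come from a separate visual recognition step, and the entity association list is a toy example.

```python
# Minimal sketch of the first (entity-based) image-text matching step.
# Assumptions: extract_entities() is a stand-in for a real NER model,
# image_entities come from a separate visual recognition step, and the
# entity association list below is a toy example.
ENTITY_ASSOCIATIONS = {
    "roast duck": {"roast duck", "duck soup", "pancake"},
    "flower": {"rose", "bouquet", "tulip"},
}

def extract_entities(business_name):
    """Toy NER: return association-list keys found in the name."""
    lowered = business_name.lower()
    return {key for key in ENTITY_ASSOCIATIONS if key in lowered}

def first_matching_result(business_name, image_entities):
    """Consistent (True) if any entity recognized in the image is
    associated with an entity recognized in the business party name."""
    for entity in extract_entities(business_name):
        if ENTITY_ASSOCIATIONS[entity] & set(image_entities):
            return True
    return False

print(first_matching_result("XXX Roast Duck Shop", {"egg fried rice"}))  # False
print(first_matching_result("XXX Roast Duck Shop", {"roast duck"}))      # True
```

The association list makes this step precise but brittle: any entity pair not listed is treated as inconsistent, which is exactly the generalization problem the later steps address.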

After the first image-text matching result of the target service party and the target image is obtained, step 103 is executed.

Step 103: and acquiring the correlation score of the business party name and the target image based on a pre-trained image-text correlation model, and determining a second image-text matching result of the target business party and the target image according to the correlation score.

The second image-text matching result is the image-text matching result of the target service party and the target image obtained by adopting the image-text correlation model.

After the target business party and the target image are obtained, a correlation score between the business party name and the target image may be obtained based on a pre-trained image-text correlation model, and a second image-text matching result of the target business party and the target image is determined according to the correlation score.
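The scoring described in this step can be sketched as a dual-encoder similarity, a common realization of such image-text correlation models. The encoders themselves are omitted here, and the 0.5 threshold is an assumed value; the disclosure does not specify either.

```python
import numpy as np

def correlation_score(text_vec, image_vec):
    """Cosine similarity between the business-party-name vector and the
    target-image vector (the two vectors produced by the model)."""
    text_vec = np.asarray(text_vec, dtype=float)
    image_vec = np.asarray(image_vec, dtype=float)
    return float(np.dot(text_vec, image_vec) /
                 (np.linalg.norm(text_vec) * np.linalg.norm(image_vec)))

def second_matching_result(text_vec, image_vec, threshold=0.5):
    """Consistent (True) if the correlation score reaches the threshold.
    The threshold value is an assumption for illustration."""
    return correlation_score(text_vec, image_vec) >= threshold

print(correlation_score([1.0, 0.0], [1.0, 0.0]))       # 1.0
print(second_matching_result([1.0, 0.0], [0.0, 1.0]))  # False
```

Because the score comes from learned embeddings rather than a fixed list, this dimension generalizes to name-object pairs never seen in the entity association list.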

After the first image-text matching result and the second image-text matching result are obtained, step 104 is performed.

Step 104: and under the condition that at least one matching inconsistent result exists in the first image-text matching result and the second image-text matching result, determining a target image-text matching result of the target service party and the target image based on a pre-constructed knowledge graph.

The target image-text matching result is the image-text matching result between the target business party and the target image determined according to the pre-constructed knowledge graph.

After the first and second image-text matching results are obtained, it can be judged whether at least one of them indicates a matching inconsistency. If so, the image-text matching result of the target business party and the target image is judged through the pre-constructed knowledge graph.

In this example, in the case where at least one matching-inconsistent result exists among the first and second image-text matching results, a target image-text matching result of the target business party and the target image may be determined based on the pre-constructed knowledge graph.
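Following claims 5 to 7, the knowledge-graph fallback can be sketched as follows: co-occurrence counts between business parties and business objects are normalized into co-occurrence probabilities, and the final result depends on where the target object ranks among all objects connected to the business party. The counts and the `top_k` cutoff below are illustrative assumptions.

```python
# Sketch of the knowledge-graph fallback (claims 5 to 7): historical
# co-occurrence counts become edge probabilities, and the target object's
# rank among the party's connected objects decides the final result.
from collections import defaultdict

def build_knowledge_graph(cooccurrence_counts):
    """cooccurrence_counts: {(party, obj): count}.
    Returns {party: {obj: co-occurrence probability}}."""
    totals = defaultdict(int)
    for (party, _obj), count in cooccurrence_counts.items():
        totals[party] += count
    graph = defaultdict(dict)
    for (party, obj), count in cooccurrence_counts.items():
        graph[party][obj] = count / totals[party]
    return graph

def target_matching_result(graph, party, target_obj, top_k=2):
    """Consistent (True) if the target object is among the top_k objects
    connected to the party, ranked by co-occurrence probability."""
    probabilities = graph.get(party, {})
    if target_obj not in probabilities:
        return False
    ranked = sorted(probabilities, key=probabilities.get, reverse=True)
    return ranked.index(target_obj) < top_k

counts = {("xxx roast duck shop", "roast duck"): 90,
          ("xxx roast duck shop", "duck soup"): 8,
          ("xxx roast duck shop", "egg fried rice"): 2}
graph = build_knowledge_graph(counts)
print(target_matching_result(graph, "xxx roast duck shop", "roast duck"))      # True
print(target_matching_result(graph, "xxx roast duck shop", "egg fried rice"))  # False
```

The ranking step is what lets rarely co-occurring objects (egg fried rice at a roast duck shop) be rejected even when some historical co-occurrence exists.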


The image-text matching result determining method provided by the embodiment of the disclosure acquires a target business party and a target image associated with it, determines a first image-text matching result of the target business party and the target image according to the semantic association relation between the business party name and the target image, obtains a correlation score between the business party name and the target image based on a pre-trained image-text correlation model, determines a second image-text matching result according to the correlation score, and, in the case that at least one of the first and second image-text matching results indicates a matching inconsistency, determines a target image-text matching result of the target business party and the target image based on a pre-constructed knowledge graph. The embodiment of the disclosure judges the image-text matching result along three dimensions, namely the entity association degree between image and text, the image-text multi-modal feature correlation, and the user behavior features mined from the knowledge graph, and fuses the results, thereby improving both the generalization of image-text matching and the effectiveness of image-text matching judgment.
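Putting the three dimensions together, the overall decision flow of the method can be sketched as follows. The function name and signature are illustrative, not the disclosure's own; only the branching logic follows the described method.

```python
def determine_matching_result(first_result, second_result, knowledge_graph_result):
    """first_result / second_result: the entity-based and model-based
    image-text matching results (True means consistent).
    knowledge_graph_result: zero-argument callable evaluating the
    knowledge-graph fallback, invoked only on disagreement."""
    if first_result and second_result:
        # Both dimensions agree that image and text match: done.
        return True
    # At least one matching-inconsistent result: the graph decides.
    return knowledge_graph_result()

print(determine_matching_result(True, True, lambda: False))   # True
print(determine_matching_result(True, False, lambda: True))   # True
print(determine_matching_result(False, False, lambda: False)) # False
```

Passing the fallback as a callable mirrors the method's behavior of consulting the knowledge graph only when the first two results disagree, avoiding an unnecessary graph lookup on the common consistent path.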

Example two

Referring to fig. 2, a flowchart illustrating steps of another image-text matching result determining method according to an embodiment of the present disclosure is shown. As shown in fig. 2, the image-text matching result determining method may include the following steps:

step 201: and acquiring a target service party and a target image associated with the target service party.

The embodiment of the disclosure can be applied to scenarios of checking the consistency between a business party displayed on an order platform and the image provided by that business party.

The target business party is a business party registered on the order platform to provide order items to users. In this example, the target business party is a merchant, such as a restaurant, a cold-drink store, or a flower store.

The target image refers to an image provided by the target business party for display on the order platform to represent the items it provides; in this example, the target image may be a dish image, a flower image, or the like.

In practical application, after the target service party successfully registers on the order platform, it can provide at least one image to the order platform as a target image for display. To avoid the situation that the image is inconsistent with the text (namely, the name of the target service party), the scheme provided by this embodiment can be adopted to judge the matching result.

When the image-text matching result is judged, a target service party in the order platform and a target image associated with the target service party can be obtained.

After the target service party and the target image associated with the target service party are obtained, step 202 is performed.

Step 202: and carrying out entity identification processing on the name of the service party to obtain a first entity identification result corresponding to the name of the service party.

In this embodiment, after the name of the service party of the target service party is obtained, named entity identification may be performed on the name of the service party to obtain a first entity identification result corresponding to the name of the service party, that is, an identification result of the service object included in the name of the service party.

Step 203: and carrying out entity identification processing on the target business object contained in the target image to obtain a second entity identification result corresponding to the target image.

After the target image corresponding to the target service party is obtained, a visual recognition algorithm may be used to identify the target service object contained in the target image. Named entity recognition is then performed on the target service object to obtain the second entity identification result of the target image, that is, the identification result corresponding to the target service object.
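To make steps 202 and 203 concrete, the following sketch stands in for both entity recognition passes with a small keyword table; a real system would use a trained named-entity-recognition model for the business-party name and a visual recognition algorithm for the image, and all entity names here are illustrative assumptions.

```python
# Hypothetical stand-in for steps 202-203: a keyword table replaces the
# trained NER model and the visual recognizer used in a real deployment.
FOOD_ENTITIES = {"porridge", "hot pot", "soy milk", "fried dough stick", "barbecue"}

def extract_entity(text):
    """Return the first known food entity mentioned in the text, or None."""
    lowered = text.lower()
    for entity in FOOD_ENTITIES:
        if entity in lowered:
            return entity
    return None

# First entity recognition result: from the business-party name (step 202).
first_result = extract_entity("Old Wang Porridge Shop")
# Second entity recognition result: from the object detected in the image (step 203).
second_result = extract_entity("a bowl of porridge")
print(first_result, second_result)  # porridge porridge
```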

After the first entity recognition result and the second entity recognition result are obtained, step 204 is performed.

Step 204: and acquiring the semantic association relation between the first entity identification result and the second entity identification result according to the entity association list.

The entity association list refers to a pre-constructed list used to indicate entity association relationships. The construction process of the entity association list may be: sample image-text data and build the entity association list from the sampled data. The list mainly follows three principles: (1) the two entities are a common collocation in public cognition (such as soy milk and fried dough sticks); (2) the two entities are regarded as similar in public cognition (such as different regional names for wontons); (3) the store-name entity is a category that contains the dish-name entity of the picture (such as barbecue and mutton shashliks). Two food entities are considered associated with each other if any one of the three principles is satisfied.

After the first entity identification result and the second entity identification result are obtained, the semantic association relation between the first entity identification result and the second entity identification result can be obtained by querying the entity association list.

After obtaining the semantic association relationship between the first entity recognition result and the second entity recognition result, step 205 is executed.

Step 205: and determining the first image-text matching result according to the semantic association relation.

After the semantic association relationship between the first entity identification result and the second entity identification result is obtained, the first image-text matching result of the target business party and the target image can be determined according to it. Specifically, in the case that the semantic association relationship is a collocation association, the first image-text matching result is a result of image-text matching consistency; otherwise, it is a result of image-text matching inconsistency.
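The lookup in steps 204 and 205 can be sketched as a set of symmetric entity pairs plus an identity check; the pairs below are illustrative stand-ins for a list actually mined from sampled image-text data under the three construction principles.

```python
# Symmetric association pairs following the three construction principles
# (collocation, public-cognition similarity, category containment); the
# concrete pairs are invented for illustration.
ENTITY_ASSOCIATIONS = {
    frozenset({"soy milk", "fried dough stick"}),  # common collocation
    frozenset({"barbecue", "mutton skewer"}),      # category contains item
}

def first_match_result(entity_a, entity_b):
    """First image-text result: consistent iff entities are identical or associated."""
    if entity_a == entity_b:
        return True
    return frozenset({entity_a, entity_b}) in ENTITY_ASSOCIATIONS

print(first_match_result("soy milk", "fried dough stick"))  # True
print(first_match_result("soy milk", "mutton skewer"))      # False
```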

Step 206: and inputting the name of the business party and the target image into the image-text correlation model.

In this embodiment, the training process of the image-text correlation model may be as follows: 1. Obtain model training samples. Image recognition technology has been successfully applied in many fields with good results; in the image-text correlation model, the ImageNet neural network framework widely used in image recognition can be applied to extract visual features of an image. The picture to be judged is input into the ImageNet neural network and output as an N-dimensional feature vector containing the visual information of the image. Correspondingly, text feature extraction technology is increasingly mature and can capture the contextual semantic features of a text; a word2vec model can be applied to extract word embeddings of the text. The text to be judged is input into the word2vec framework and output as an N-dimensional feature vector containing the contextual semantic information of the text. 2. Train the image-text correlation model on the extracted image and text features. The model is a pre-trained model that realizes mutual dependence and joint learning of text and image information on the basis of the Transformer architecture. Unlike a conventional BERT model, the image-text correlation model takes not only text information but also visual information as input, so the two-stream ViLBERT model is applied to realize multi-modal learning. Text information and picture information are not fused directly at the input of the model; instead, the features of the text and the image are learned separately through Transformer encoders.
After the two modalities are encoded, their outputs pass through a co-attention module that fuses information between the two modalities (namely the text modality and the image modality).

The model can be pre-trained on two tasks: the first is a masked-prediction task, and the second is an image-text matching task. After pre-training, the model can output an image-text relevance score for an input image and text.

After the target business party and the target image are acquired, the business party name and the target image of the target business party can be input into the image-text correlation model.

After inputting the business party name and the target image into the image-text correlation model, step 207 is performed.

Step 207: and calling the image-text correlation model to obtain a first vector corresponding to the name of the business party and a second vector corresponding to the target image, and calculating to obtain the correlation score of the name of the business party and the target image according to the first vector and the second vector.

After the name of the business party and the target image are input into the image-text correlation model, the image-text correlation model can be called to obtain a first vector corresponding to the business party name and a second vector corresponding to the target image, and the correlation score of the business party name and the target image is then calculated from the first vector and the second vector.
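The vector-to-score computation of step 207 can be illustrated with plain cosine similarity over toy vectors; the real first and second vectors come from the two-stream model's text and image encoders, and every number below is made up for illustration.

```python
import math

def relevance_score(text_vec, image_vec):
    """Map cosine similarity of the two vectors into a [0, 1] relevance score."""
    dot = sum(a * b for a, b in zip(text_vec, image_vec))
    norm = math.sqrt(sum(a * a for a in text_vec)) * math.sqrt(sum(b * b for b in image_vec))
    return (dot / norm + 1.0) / 2.0

text_vec = [0.9, 0.1, 0.3]   # hypothetical first vector (business-party name)
image_vec = [0.8, 0.2, 0.4]  # hypothetical second vector (target image)
score = relevance_score(text_vec, image_vec)
print(round(score, 3))
```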

After the business party name and the relevance score of the target image are obtained, step 208 is performed.

Step 208: and determining a second image-text matching result of the target service party and the target image according to the relevance score.

After the relevance score is obtained, a second image-text matching result of the target business party and the target image can be determined according to it. Specifically, in the case that the relevance score is greater than or equal to a preset threshold, the second image-text matching result is a result of image-text matching consistency; in the case that it is less than the preset threshold, the second image-text matching result is a result of image-text matching inconsistency.
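Step 208 thus reduces to a threshold comparison; the value 0.5 below is a hypothetical preset threshold, which in practice would be tuned on labelled data.

```python
PRESET_THRESHOLD = 0.5  # hypothetical value for illustration

def second_match_result(score, threshold=PRESET_THRESHOLD):
    """Second image-text result: consistent iff the score reaches the threshold."""
    return score >= threshold

print(second_match_result(0.73))  # True
print(second_match_result(0.41))  # False
```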

After the first and second image-text matching results are obtained, step 209 or step 210 is executed.

Step 209: and under the condition that at least one matching inconsistent result exists in the first image-text matching result and the second image-text matching result, determining a target image-text matching result of the target service party and the target image based on a pre-constructed knowledge graph.

The target image-text matching result is the image-text matching result between the target service party and the target image determined according to the pre-constructed knowledge graph.

After the first image-text matching result and the second image-text matching result are obtained, it can be judged whether at least one matching-inconsistent result exists between them; if so, the image-text matching result of the target service party and the target image is judged through the pre-constructed knowledge graph.

In this example, in the case where at least one matching-inconsistent result exists between the first and second image-text matching results, the target image-text matching result of the target service party and the target image may be determined based on the pre-constructed knowledge graph. The details may be described in combination with the following specific implementations.

First, the process of constructing the knowledge graph can be described in detail in conjunction with specific implementations.

In a specific implementation manner of the embodiment of the present disclosure, before the step 209, the method may further include:

step S1: and constructing an initial knowledge graph according to historical co-occurrence data of the service party and the service object.

In this embodiment, an initial knowledge graph may be constructed according to historical co-occurrence data of the service party and the service object. Specifically: first, training data is obtained, the training data including service parties and service objects; taking a restaurant as an example of the service party, the training data consists of store names and dish names. Then the training data is preprocessed; specifically, the store names and dish names are parsed and abstracted, for example one store name is parsed as a congee shop and another as a hot pot shop. Finally, in the knowledge-graph construction stage, the initial knowledge graph can be built from the historical co-occurrence data of the service party and the service object.

After the initial knowledge-graph is constructed, step S2 is performed.

Step S2: and determining the co-occurrence probability of the business party and the business object according to the co-occurrence times of the business party and the business object.

After the co-occurrence counts are obtained, the co-occurrence probability of the service party and the service object can be determined according to the number of times they co-occur.

Step S3: and endowing the co-occurrence probability to the initial knowledge graph to generate the knowledge graph.

After the co-occurrence probability is obtained, it can be assigned to the initial knowledge graph to generate the knowledge graph. Specifically, the co-occurrence counts of the parsed store names and dish names are computed from historical exposure data, and the counts are standardized to a probability value in [0, 1]. To avoid extreme cases where the distributions of different categories are inconsistent or special dishes occur rarely, links (edges) are established between store names and dish names whose co-occurrence probability is greater than or equal to 0. The magnitude of the probability value reflects the degree of correlation between the store name and the dish: the higher the co-occurrence probability, the higher the correlation, and vice versa.
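Steps S1 to S3 can be sketched by counting store-name and dish-name co-occurrences and normalizing the counts to [0, 1]; the sample records and the normalization by the maximum count are illustrative assumptions, since the text does not fix the exact standardization.

```python
from collections import Counter

# Invented historical exposure records: (parsed store name, parsed dish name).
records = [
    ("congee shop", "preserved-egg congee"),
    ("congee shop", "preserved-egg congee"),
    ("congee shop", "fried dough stick"),
    ("hot pot shop", "mutton slices"),
]

counts = Counter(records)
max_count = max(counts.values())

# Edge weight = count normalized to [0, 1]; every observed pair keeps an edge.
graph = {pair: n / max_count for pair, n in counts.items()}

print(graph[("congee shop", "preserved-egg congee")])  # 1.0
print(graph[("congee shop", "fried dough stick")])     # 0.5
```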

After the knowledge graph is constructed, a target image-text matching result between the target service party and the target image can be determined based on the knowledge graph, and specifically, the following specific implementation manner can be combined for detailed description.

In another specific implementation manner of the embodiment of the present disclosure, the step 209 may include:

sub-step M1: and acquiring the name of the service party and the target co-occurrence probability of the target service object in the target image according to the knowledge graph.

In this embodiment, after the service party name and the target image of the target service party are obtained, the target image may be processed by a visual recognition algorithm to obtain the target service object contained in it, and the target co-occurrence probability of the service party name and the target service object is then obtained from the knowledge graph.

After the target co-occurrence probability is obtained, sub-step M2 is performed.

Sub-step M2: and determining the target image-text matching result according to the target co-occurrence probability.

After the target co-occurrence probability is obtained, the target image-text matching result between the target service party and the target image can be determined according to it. Specifically, the initial co-occurrence probabilities between the target service party and the other service objects connected to it can be obtained from the knowledge graph; the target service object and the other service objects are then sorted according to the initial co-occurrence probabilities and the target co-occurrence probability, and the target image-text matching result of the target service party and the target image is determined from the sorting result. In this example, for the task of judging whether image and text are consistent, a relevance threshold is set, and the links between store names and dish names whose relevance probability is below the threshold are cancelled; the number of neighbours is also limited, and only the Top-K neighbours by relevance probability of each parsed store name are regarded as image-text consistent.
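Sub-steps M1 and M2 amount to ranking a store's graph neighbours and checking membership in the Top-K; the graph weights and the choice K = 2 below are illustrative assumptions.

```python
def graph_match_result(graph, store, target_dish, k=2):
    """Consistent iff the target dish is among the store's Top-K neighbours by weight."""
    neighbours = sorted(
        (dish for (s, dish) in graph if s == store),
        key=lambda d: graph[(store, d)],
        reverse=True,
    )
    return target_dish in neighbours[:k]

graph = {  # invented co-occurrence probabilities
    ("congee shop", "preserved-egg congee"): 1.0,
    ("congee shop", "fried dough stick"): 0.5,
    ("congee shop", "mutton slices"): 0.1,
}

print(graph_match_result(graph, "congee shop", "fried dough stick"))  # True
print(graph_match_result(graph, "congee shop", "mutton slices"))      # False
```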

The embodiment of the disclosure judges the image-text matching result through three dimensions, namely the entity association degree between text and image, the correlation degree of image-text multi-modal features, and the user behavior features mined from the knowledge graph, and fuses the results, so that both the generalization of image-text matching and the effectiveness of image-text matching judgment can be improved.

Step 210: and determining the image-text matching result of the target service party and the target image as a result of consistent image-text matching under the condition that the first image-text matching result and the second image-text matching result are both the results of consistent image-text matching.

After the first image-text matching result and the second image-text matching result are obtained, if both are results of image-text matching consistency, the image-text matching result of the target service party and the target image can be determined to be a result of image-text matching consistency, namely the image and the text match consistently.
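The decision between steps 209 and 210 can be summarized as a small fusion rule; `graph_result` stands in for the knowledge-graph judgement of step 209 and is supplied as a callable, since the graph lookup is only needed when the two preliminary results disagree.

```python
def final_match_result(first_result, second_result, graph_result):
    """Fuse the two preliminary results; fall back to the knowledge graph on disagreement."""
    if first_result and second_result:
        return True                # step 210: both consistent, no graph lookup needed
    return graph_result()          # step 209: consult the knowledge graph

print(final_match_result(True, True, lambda: False))   # True, graph not consulted
print(final_match_result(True, False, lambda: True))   # True, graph decides
print(final_match_result(False, False, lambda: False)) # False
```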

The image-text matching result determining method provided by the embodiment of the disclosure acquires a target service party and a target image associated with the target service party; determines a first image-text matching result of the target service party and the target image according to the semantic association relation between the service party name of the target service party and the target image; obtains a correlation score of the service party name and the target image based on a pre-trained image-text correlation model, and determines a second image-text matching result of the target service party and the target image according to the correlation score; and, in the case that at least one matching-inconsistent result exists between the first image-text matching result and the second image-text matching result, determines a target image-text matching result of the target service party and the target image based on a pre-constructed knowledge graph. The embodiment of the disclosure judges the image-text matching result through three dimensions, namely the entity association degree between text and image, the correlation degree of image-text multi-modal features, and the user behavior features mined from the knowledge graph, and fuses the results, so that both the generalization of image-text matching and the effectiveness of image-text matching judgment can be improved.

EXAMPLE III

Referring to fig. 3, which shows a schematic structural diagram of an image-text matching result determining apparatus provided in an embodiment of the present disclosure, as shown in fig. 3, the image-text matching result determining apparatus 300 may include the following modules:

a target image obtaining module 310, configured to obtain a target service party and a target image associated with the target service party;

a first matching result determining module 320, configured to determine, according to the semantic association relationship between the business party name of the target business party and the target image, a first image-text matching result between the target business party and the target image;

a second matching result determining module 330, configured to obtain a relevance score of the business party name and the target image based on a pre-trained image-text relevance model, and determine a second image-text matching result of the target business party and the target image according to the relevance score;

a target matching result determining module 340, configured to determine, based on a pre-constructed knowledge graph, a target image-text matching result of the target service party and the target image when at least one matching inconsistency result exists in the first image-text matching result and the second image-text matching result.

The image-text matching result determining device provided by the embodiment of the disclosure acquires a target service party and a target image associated with the target service party; determines a first image-text matching result of the target service party and the target image according to the semantic association relation between the service party name of the target service party and the target image; obtains a correlation score of the service party name and the target image based on a pre-trained image-text correlation model, and determines a second image-text matching result of the target service party and the target image according to the correlation score; and, in the case that at least one matching-inconsistent result exists between the first image-text matching result and the second image-text matching result, determines a target image-text matching result of the target service party and the target image based on a pre-constructed knowledge graph. The embodiment of the disclosure judges the image-text matching result through three dimensions, namely the entity association degree between text and image, the correlation degree of image-text multi-modal features, and the user behavior features mined from the knowledge graph, and fuses the results, so that both the generalization of image-text matching and the effectiveness of image-text matching judgment can be improved.

Example four

Referring to fig. 4, which shows a schematic structural diagram of another image-text matching result determining apparatus provided in an embodiment of the disclosure; as shown in fig. 4, the image-text matching result determining apparatus 400 may include the following modules:

a target image obtaining module 410, configured to obtain a target service party and a target image associated with the target service party;

a first matching result determining module 420, configured to determine a first image-text matching result between the target service party and the target image according to a semantic association relationship between a service party name of the target service party and the target image;

a second matching result determining module 430, configured to obtain a relevance score of the business party name and the target image based on a pre-trained image-text relevance model, and determine a second image-text matching result of the target business party and the target image according to the relevance score;

a target matching result determining module 440, configured to determine, based on a pre-constructed knowledge graph, a target image-text matching result of the target service party and the target image when at least one matching inconsistency result exists in the first image-text matching result and the second image-text matching result;

a matching result determining module 450, configured to determine that the image-text matching results of the target service party and the target image are a result of matching consistency when the first image-text matching result and the second image-text matching result are both a result of matching consistency.

Optionally, the first matching result determining module 420 includes:

a first identification result obtaining unit 421, configured to perform entity identification processing on the business party name to obtain a first entity identification result corresponding to the business party name;

a second identification result obtaining unit 422, configured to perform entity identification processing on a target service object included in the target image, so as to obtain a second entity identification result corresponding to the target image;

a semantic association relation obtaining unit 423, configured to obtain a semantic association relation between the first entity identification result and the second entity identification result according to the entity association list;

a first matching result determining unit 424, configured to determine the first image-text matching result according to the semantic association relationship.

Optionally, the second matching result determining module 430 includes:

a target image input unit 431 for inputting the business party name and the target image to the image-text correlation model;

and a relevance score calculating unit 432, configured to invoke the image-text relevance model to obtain a first vector corresponding to the business party name and a second vector corresponding to the target image, and calculate a relevance score between the business party name and the target image according to the first vector and the second vector.

Optionally, the apparatus further comprises:

the initial knowledge map building module is used for building an initial knowledge map according to historical co-occurrence data of a service party and a service object;

a co-occurrence probability determination module, configured to determine a co-occurrence probability of the service party and the service object according to the co-occurrence times of the service party and the service object;

and the knowledge graph generation module is used for endowing the initial knowledge graph with the co-occurrence probability to generate the knowledge graph.

Optionally, the target matching result determining module 440 includes:

a target co-occurrence probability obtaining unit, configured to obtain, according to the knowledge graph, the name of the service provider and a target co-occurrence probability of a target service object in the target image;

and the target matching result determining unit is used for determining the target image-text matching result according to the target co-occurrence probability.

Optionally, the target matching result determining unit includes:

an initial co-occurrence probability obtaining subunit, configured to obtain, according to the knowledge graph, an initial co-occurrence probability between the target service party and another service object having a connection relationship with the target service party;

a business object ordering subunit, configured to order the target business object and the other business objects according to the initial co-occurrence probability and the target co-occurrence probability;

and the target matching result determining subunit is used for determining a target image-text matching result of the target service party and the target image according to the sequencing result.

The image-text matching result determining device provided by the embodiment of the disclosure acquires a target service party and a target image associated with the target service party; determines a first image-text matching result of the target service party and the target image according to the semantic association relation between the service party name of the target service party and the target image; obtains a correlation score of the service party name and the target image based on a pre-trained image-text correlation model, and determines a second image-text matching result of the target service party and the target image according to the correlation score; and, in the case that at least one matching-inconsistent result exists between the first image-text matching result and the second image-text matching result, determines a target image-text matching result of the target service party and the target image based on a pre-constructed knowledge graph. The embodiment of the disclosure judges the image-text matching result through three dimensions, namely the entity association degree between text and image, the correlation degree of image-text multi-modal features, and the user behavior features mined from the knowledge graph, and fuses the results, so that both the generalization of image-text matching and the effectiveness of image-text matching judgment can be improved.

An embodiment of the present disclosure also provides an electronic device, including: a processor, a memory, and a computer program stored on the memory and executable on the processor, the processor implementing the image-text matching result determining method of the preceding embodiments when executing the program.

Embodiments of the present disclosure also provide a readable storage medium; when instructions in the storage medium are executed by a processor of an electronic device, the electronic device is enabled to execute the image-text matching result determining method of the foregoing embodiments.

For the device embodiment, since it is basically similar to the method embodiment, the description is simple, and for the relevant points, refer to the partial description of the method embodiment.

The algorithms and displays presented herein are not inherently related to any particular computer, virtual machine, or other apparatus. Various general purpose systems may also be used with the teachings herein. The required structure for constructing such a system will be apparent from the description above. In addition, embodiments of the present disclosure are not directed to any particular programming language. It is appreciated that a variety of programming languages may be used to implement the teachings of the embodiments of the present disclosure as described herein, and any descriptions of specific languages are provided above to disclose the best modes of the embodiments of the present disclosure.

In the description provided herein, numerous specific details are set forth. However, it is understood that embodiments of the disclosure may be practiced without these specific details. In some instances, well-known methods, structures and techniques have not been shown in detail in order not to obscure an understanding of this description.

Similarly, it should be appreciated that in the foregoing description of exemplary embodiments of the disclosure, various features of the embodiments of the disclosure are sometimes grouped together in a single embodiment, figure, or description thereof for the purpose of streamlining the disclosure and aiding in the understanding of one or more of the various inventive aspects. However, the disclosed method should not be interpreted as reflecting an intention that: that is, claimed embodiments of the disclosure require more features than are expressly recited in each claim. Rather, as the following claims reflect, inventive aspects lie in less than all features of a single foregoing disclosed embodiment. Thus, the claims following the detailed description are hereby expressly incorporated into this detailed description, with each claim standing on its own as a separate embodiment of an embodiment of this disclosure.

Those skilled in the art will appreciate that the modules in the device in an embodiment may be adaptively changed and disposed in one or more devices different from the embodiment. The modules or units or components of the embodiments may be combined into one module or unit or component, and furthermore they may be divided into a plurality of sub-modules or sub-units or sub-components. All of the features disclosed in this specification (including any accompanying claims, abstract and drawings), and all of the processes or elements of any method or apparatus so disclosed, may be combined in any combination, except combinations where at least some of such features and/or processes or elements are mutually exclusive. Each feature disclosed in this specification (including any accompanying claims, abstract and drawings) may be replaced by alternative features serving the same, equivalent or similar purpose, unless expressly stated otherwise.

The various component embodiments of the disclosure may be implemented in hardware, or in software modules running on one or more processors, or in a combination thereof. It will be understood by those skilled in the art that a microprocessor or Digital Signal Processor (DSP) may be used in practice to implement some or all of the functions of some or all of the components in an image-text matching result determining device according to an embodiment of the present disclosure. Embodiments of the present disclosure may also be implemented as an apparatus or device program for performing a portion or all of the methods described herein. Such programs implementing embodiments of the present disclosure may be stored on a computer readable medium or may be in the form of one or more signals. Such a signal may be downloaded from an internet website or provided on a carrier signal or in any other form.

It should be noted that the above-mentioned embodiments illustrate rather than limit the embodiments of the disclosure, and that those skilled in the art will be able to design alternative embodiments without departing from the scope of the appended claims. In the claims, any reference signs placed between parentheses shall not be construed as limiting the claim. The word "comprising" does not exclude the presence of elements or steps not listed in a claim. The word "a" or "an" preceding an element does not exclude the presence of a plurality of such elements. Embodiments of the disclosure may be implemented by means of hardware comprising several distinct elements, and by means of a suitably programmed computer. In a unit claim enumerating several means, several of these means may be embodied by one and the same item of hardware. The use of the words first, second, third, and so on does not indicate any ordering; these words may be interpreted as names.

It is clear to those skilled in the art that, for convenience and brevity of description, the specific working processes of the above-described systems, apparatuses and units may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.

The above description is only for the purpose of illustrating the preferred embodiments of the present disclosure and is not to be construed as limiting the embodiments of the present disclosure, and any modifications, equivalents, improvements and the like that are made within the spirit and principle of the embodiments of the present disclosure are intended to be included within the scope of the embodiments of the present disclosure.

The above description is only a specific implementation of the embodiments of the present disclosure, but the scope of the embodiments of the present disclosure is not limited thereto. Any person skilled in the art can readily conceive of changes or substitutions within the technical scope of the embodiments of the present disclosure, and all such changes or substitutions shall be covered by that scope. Therefore, the protection scope of the embodiments of the present disclosure shall be subject to the protection scope of the claims.
