Multi-data type hierarchical sequencing method and device

Document No.: 1963857. Publication date: 2021-12-14.

Note: this technology, "Multi-data type hierarchical sequencing method and device", was designed by 张晨曦 (Zhang Chenxi) on 2021-08-25. Abstract: The application relates to a method and a device for hierarchical sequencing of multiple data types. The method comprises: analyzing and processing a received search request and performing intention recognition on it to obtain corresponding structured semantics; obtaining corresponding search results according to the structured semantics, extracting intention-type search results from the search results, and performing semantic similarity calculation between the intention-type search results and the structured semantics to obtain a semantic similarity score for each intention-type search result; obtaining a final score for each user intention according to the semantic similarity scores and the intention scores; and hierarchically sequencing the user intentions according to their final scores, with each intention-type search result placed under its corresponding user intention. The method judges in both directions, by user intention and by result relevance, returns comprehensive and highly relevant results, and lets the user quickly find the requested target service or content by intention level.

1. A method for hierarchical ordering of multiple data types, comprising:

analyzing and processing the received search request and identifying the intention to obtain corresponding structured semantics; the structured semantics comprise identified user intentions and corresponding intention scores;

obtaining corresponding search results according to the structured semantics, extracting intention search results from the search results, and performing semantic similarity calculation on the intention search results and the structured semantics to obtain semantic similarity scores of the intention search results and the structured semantics respectively;

obtaining a final score of each user intention according to the semantic similarity score and the intention score of each intention type search result;

and according to the final scores of the user intentions, performing hierarchical sequencing on the user intentions, and enabling each intention type search result to correspond to each corresponding user intention.

2. The method according to claim 1, wherein obtaining a corresponding search result according to the structured semantics, extracting an intention search result from the search result, and performing semantic similarity calculation between the intention search result and the structured semantics further comprises:

when the search result contains a recalled encyclopedic search result, performing semantic similarity calculation on the encyclopedic search result and the structured semantics.

3. The method of claim 1, wherein the structured semantics further comprise at least one of entity words, word segments, synonyms, and error-correction words related to the search request;

analyzing the received search request comprises:

extracting a search keyword from the search request, and performing keyword processing on the search keyword to obtain at least one of an entity word, a word segment, a synonym and an error-correction word related to the search request.

4. The method of claim 1, wherein the intent recognition of the received search request is performed by a pre-trained intent recognition network model.

5. The method according to claim 4, wherein, in training the intention recognition network model, training sample data used is acquired by a crawler;

after the training sample data is obtained through the crawler, intention labeling is carried out according to the crawling path to which the training sample data belongs.

6. The method according to any one of claims 1 to 5, wherein a final score of each user intention is obtained by means of a weighted calculation according to the semantic similarity score and the intention score of each intention-type search result.

7. The method as claimed in claim 6, wherein obtaining the final score of each user intention according to the semantic similarity score and the intention score of each intention class search result by means of weighted calculation comprises:

extracting the semantic similarity scores under the same type of intentions from the intention type search results, and selecting the most relevant score from the semantic similarity scores under the same type of intentions;

and performing weighted calculation on each most relevant score and the corresponding intention score of each user intention to obtain each corresponding final score.

8. The method according to any one of claims 1 to 5, wherein the corresponding search results are obtained according to the structured semantics in a threshold screening manner;

when a corresponding search result is obtained according to the structured semantics in the threshold screening mode, setting corresponding thresholds for different data categories;

the data categories include at least one of an encyclopedia category, an intention category, a direct category, and a fallback category.

9. The method according to any one of claims 1 to 5, wherein the step of associating each of the intention-type search results with a corresponding each of the user intents comprises:

sorting according to the data application attribute of each intention type search result;

wherein the data application attribute includes at least one of a service type and a content type.

10. A multiple data type hierarchical ranking apparatus, comprising: a request analysis and identification module, a similarity evaluation module, a user intention score calculation module and a hierarchical ranking module;

the request analysis and identification module is configured to analyze and process the received search request and identify the intention of the search request so as to obtain corresponding structured semantics; the structured semantics comprise identified user intentions and corresponding intention scores;

the similarity evaluation module is configured to obtain corresponding search results according to the structured semantics, extract intention search results from the search results, perform semantic similarity calculation on the intention search results and the structured semantics, and obtain semantic similarity scores of the intention search results and the structured semantics respectively;

the user intention score calculation module is configured to obtain a final score of each user intention according to the semantic similarity score and the intention score of each intention type search result;

the hierarchical ranking module is configured to hierarchically rank the user intentions according to the final scores of the user intentions, and correspond the intention-type search results to the corresponding user intentions.

Technical Field

The present application relates to the field of network data processing technologies, and in particular, to a method and an apparatus for hierarchical ordering of multiple data types.

Background

Search engine technology is widely used across internet fields and mainly comprises content-based search engines such as open search and in-site search. Search results from different vertical domains and of different data types are mainly separated by data template and sorted according to the classification priority set in configuration information. Traditional search technology therefore cannot accurately meet the user's search needs when a query carries multiple intentions. For example, when the user searches for "Raging Fire" and both movie-ticket booking results and video results are returned, it is unclear which category should be ranked first and which later. That is, conventional search technology has difficulty ranking search results in a multi-intention situation, and thus cannot make the output search results better suit the user's current multi-intention requirements.

Disclosure of Invention

In view of this, the present application provides a multi-data type hierarchical ranking method that makes the output search results better meet the user's current multi-intention requirements.

According to an aspect of the present application, there is provided a multiple data type hierarchical sorting method, including:

analyzing and processing the received search request and identifying the intention to obtain corresponding structured semantics; the structured semantics comprise identified user intentions and corresponding intention scores;

obtaining corresponding search results according to the structured semantics, extracting intention search results from the search results, and performing semantic similarity calculation on the intention search results and the structured semantics to obtain semantic similarity scores of the intention search results and the structured semantics respectively;

obtaining a final score of each user intention according to the semantic similarity score and the intention score of each intention type search result;

and according to the final scores of the user intentions, performing hierarchical sequencing on the user intentions, and enabling each intention type search result to correspond to each corresponding user intention.

In a possible implementation manner, obtaining a corresponding search result according to the structured semantics, extracting an intention search result from the search result, and performing semantic similarity calculation between the intention search result and the structured semantics further includes:

when the search result contains a recalled encyclopedic search result, performing semantic similarity calculation on the encyclopedic search result and the structured semantics.

In a possible implementation manner, the structured semantics further include at least one of entity words, word segments, synonyms, and error-correction words related to the search request;

analyzing the received search request includes:

extracting a search keyword from the search request, and performing keyword processing on the search keyword to obtain at least one of an entity word, a word segment, a synonym and an error-correction word related to the search request.

In one possible implementation, the intention recognition is performed on the received search request through a pre-trained intention recognition network model.

In one possible implementation manner, when the intention recognition network model is trained, used training sample data is acquired through a crawler;

after the training sample data is obtained through the crawler, intention labeling is carried out according to the crawling path to which the training sample data belongs.

In a possible implementation manner, when the final score of each user intention is obtained according to the semantic similarity score and the intention score of each intention type search result, a weighted calculation manner is used.

In one possible implementation manner, when obtaining a final score of each user intention according to the semantic similarity score and the intention score of each intention class search result in a weighted calculation manner, the method includes:

extracting the semantic similarity scores under the same type of intentions from the intention type search results, and selecting the most relevant score from the semantic similarity scores under the same type of intentions;

and performing weighted calculation on each most relevant score and the corresponding intention score of each user intention to obtain each corresponding final score.
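The weighted calculation above can be sketched as follows. This is a minimal illustration only: the function name, the equal weights of 0.5, and the use of the maximum similarity as the "most relevant score" are assumptions, since the application does not fix the weights or the selection rule.

```python
from collections import defaultdict

def final_intention_scores(results, intention_scores, w_sim=0.5, w_intention=0.5):
    """Combine the best similarity per intention with the intention score.

    results: iterable of (intention, semantic_similarity) pairs.
    intention_scores: dict mapping intention -> intention score.
    The weights are hypothetical; the application only says "weighted".
    """
    best = defaultdict(float)
    for intention, similarity in results:
        # Assumption: the "most relevant score" is the highest similarity.
        best[intention] = max(best[intention], similarity)
    return {
        intention: w_sim * best.get(intention, 0.0) + w_intention * score
        for intention, score in intention_scores.items()
    }

scores = final_intention_scores(
    [("watch_video", 0.5), ("watch_video", 0.75), ("buy_movie_ticket", 0.25)],
    {"watch_video": 0.5, "buy_movie_ticket": 0.5},
)
```

With these illustrative inputs, the watch-video intention keeps its best similarity (0.75) and ends up with the higher final score.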

In a possible implementation manner, when a corresponding search result is obtained according to the structured semantics, the search is performed according to a threshold value screening manner;

when a corresponding search result is obtained according to the structured semantics in the threshold screening mode, setting corresponding thresholds for different data categories;

the data categories include at least one of an encyclopedia category, an intention category, a direct category, and a fallback category.

In one possible implementation manner, when each of the intention type search results corresponds to each of the corresponding user intentions, the method includes:

sorting according to the data application attribute of each intention type search result;

wherein the data application attribute includes at least one of a service type and a content type.

According to another aspect of the present application, there is also provided a multiple data type hierarchical sorting apparatus, including: a request analysis and identification module, a similarity evaluation module, a user intention score calculation module and a hierarchical ranking module;

the request analysis and identification module is configured to analyze and process the received search request and identify the intention of the search request so as to obtain corresponding structured semantics; the structured semantics comprise identified user intentions and corresponding intention scores;

the similarity evaluation module is configured to obtain corresponding search results according to the structured semantics, extract intention search results from the search results, perform semantic similarity calculation on the intention search results and the structured semantics, and obtain semantic similarity scores of the intention search results and the structured semantics respectively;

the user intention score calculation module is configured to obtain a final score of each user intention according to the semantic similarity score and the intention score of each intention type search result;

the hierarchical ranking module is configured to hierarchically rank the user intentions according to the final scores of the user intentions, and correspond the intention-type search results to the corresponding user intentions.

In the method, intention recognition is performed on the received search request to obtain the user intentions that the request may contain. Intention search results are then extracted from the search results, and semantic similarity is calculated between each of them and the structured semantics obtained by analyzing and processing the search request, giving a semantic similarity score for each intention search result. A final score for each user intention is obtained from the semantic similarity scores and the intention scores, and the user intentions are hierarchically ranked according to the final scores. Judgment is thus made in both directions, by user intention and by result relevance, so that comprehensive and highly relevant results are returned and the user can quickly find the requested target service or content by intention level. Search results for multiple intentions can also be returned and pushed.

Other features and aspects of the present application will become apparent from the following detailed description of exemplary embodiments, which proceeds with reference to the accompanying drawings.

Drawings

The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate exemplary embodiments, features, and aspects of the application and, together with the description, serve to explain the principles of the application.

FIG. 1 is a flow diagram illustrating a method for hierarchical ordering of multiple data types according to an embodiment of the present application;

FIG. 2 is a block diagram showing a structure of a multiple data type hierarchical sorting apparatus according to an embodiment of the present application.

Detailed Description

Various exemplary embodiments, features and aspects of the present application will be described in detail below with reference to the accompanying drawings. In the drawings, like reference numbers can indicate functionally identical or similar elements. While the various aspects of the embodiments are presented in drawings, the drawings are not necessarily drawn to scale unless specifically indicated.

The word "exemplary" is used exclusively herein to mean "serving as an example, embodiment, or illustration." Any embodiment described herein as "exemplary" is not necessarily to be construed as preferred or advantageous over other embodiments.

Furthermore, in the following detailed description, numerous specific details are set forth in order to provide a better understanding of the present application. It will be understood by those skilled in the art that the present application may be practiced without some of these specific details. In some instances, methods, means, elements and circuits that are well known to those skilled in the art have not been described in detail so as not to obscure the present application.

First, it should be noted that the multi-data type hierarchical ranking method according to the embodiment of the present application may be applied to a data search engine to rank the multiple search results obtained for a user's search request according to multiple user intentions, so that when the search results are pushed the user can find a target service or target content more quickly and accurately. The method may be applied to both open search and in-site search.

Fig. 1 shows a flow diagram of a multi-data type hierarchical ranking method according to an embodiment of the application. As shown in Fig. 1, the method includes step S100: analyzing the received search request and performing intention recognition on it to obtain corresponding structured semantics. As will be understood by those skilled in the art, the search request is initiated by the user in a front-end application and may be entered as voice, text, or another format. The voice, text, or other information input by the user is recognized, the scheduling server analyzes the user's request information to obtain the search words in it, intention recognition is performed on those search words, and the structured semantics are output. In the embodiment of the present application, the structured semantics include the identified user intentions and the intention score of each identified user intention.

Then, step S200 is performed: corresponding search results are obtained according to the structured semantics, intention search results are extracted from them, and semantic similarity is calculated between each intention search result and the structured semantics to obtain a semantic similarity score for each. The search results obtained according to the structured semantics usually consist of multiple result data items of different data types: some are of the encyclopedia class, some of the intention class, some of the direct class, some of the fallback class, and so on. In the method of the embodiment of the application, semantic similarity calculation is performed on the intention search results.

Then, in step S300, a final score of each user intention is obtained according to the semantic similarity scores and the intention scores of the intention search results. It should be noted that a user intention is the use to which the search request input by the user corresponds, that is, what the user means to do with the requested data once the search request returns it. For example, according to business requirements, intentions can be organized into vertical domains such as shopping, automobile, and video entertainment, and each vertical domain contains multiple intentions: under video entertainment, there can be user intentions such as buying movie tickets, watching videos, and listening to music. The user intentions may be customized according to actual business requirements and training data; the method of the embodiment of the present application does not limit their specific content.

After the final scores of the identified user intentions are obtained through the above steps, in step S400 the user intentions are hierarchically ranked according to their final scores, and each intention search result is placed under its corresponding user intention, so that the retrieved intention search results are hierarchically ranked by user intention.
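Step S400 can be illustrated with a short sketch. The dictionary keys, the sample titles, and the choice to order results inside a layer by semantic similarity are assumptions made for illustration; the application only requires that intentions be layered by final score and that each result sit under its intention.

```python
def layer_results(final_scores, results):
    """Group results under their intention, intentions ordered by final score.

    final_scores: dict mapping intention -> final score.
    results: list of dicts with hypothetical keys
             "title", "intention" and "similarity".
    """
    layers = []
    for intention in sorted(final_scores, key=final_scores.get, reverse=True):
        hits = [r for r in results if r["intention"] == intention]
        # Assumption: within a layer, higher similarity comes first.
        hits.sort(key=lambda r: r["similarity"], reverse=True)
        layers.append((intention, [r["title"] for r in hits]))
    return layers

final_scores = {"watch_video": 0.55, "buy_movie_ticket": 0.8}
results = [
    {"title": "Full film stream", "intention": "watch_video", "similarity": 0.7},
    {"title": "Cinema A tickets", "intention": "buy_movie_ticket", "similarity": 0.9},
    {"title": "Trailer", "intention": "watch_video", "similarity": 0.4},
    {"title": "Cinema B tickets", "intention": "buy_movie_ticket", "similarity": 0.6},
]
layers = layer_results(final_scores, results)
```

Here the ticket-buying layer is returned first because its final score is higher, and the video layer follows.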

Therefore, the method of the embodiment of the application performs intention recognition on the received search request to obtain the user intentions that the request may contain, extracts the intention search results from the search results, and calculates the semantic similarity between each intention search result and the structured semantics obtained by analyzing and processing the search request. A final score for each user intention is then obtained from the semantic similarity scores and the intention scores, and the user intentions are hierarchically ranked according to the final scores, so that the search results are layered by user intention. Judgment is thus made in both directions, by user intention and by result relevance, comprehensive and highly relevant results are returned, and the user can quickly find the requested target service or content by intention level. Search results for multiple intentions can also be returned and pushed.

It should be noted that, in the method according to the embodiment of the present application, the structured semantics further include at least one of entity words, word segments, synonyms, and error-correction words related to the search request. That is, the structured semantics comprise entity words, word segments, synonyms, error-correction words, and the vertical domains and intentions with their scores (such as the movie-ticket-buying intention under the video entertainment vertical domain).
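One way to picture the structured semantics is as a small record type. The sketch below is illustrative only: the class name, field names, and sample values are hypothetical and are not taken from the application.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

@dataclass
class StructuredSemantics:
    """Hypothetical container for the parsed form of one search request."""
    query: str
    entity_words: List[str] = field(default_factory=list)
    word_segments: List[str] = field(default_factory=list)
    synonyms: List[str] = field(default_factory=list)
    corrections: List[str] = field(default_factory=list)
    # (vertical domain, intention) -> intention score in [0, 1]
    intention_scores: Dict[Tuple[str, str], float] = field(default_factory=dict)

    def ranked_intentions(self) -> List[Tuple[Tuple[str, str], float]]:
        """Return intentions from highest to lowest intention score."""
        return sorted(self.intention_scores.items(),
                      key=lambda kv: kv[1], reverse=True)

semantics = StructuredSemantics(
    query="example movie title",
    entity_words=["example movie title"],
    word_segments=["example", "movie", "title"],
    intention_scores={
        ("video_entertainment", "watch_video"): 0.4,
        ("video_entertainment", "buy_movie_ticket"): 0.6,
    },
)
top_intention, top_score = semantics.ranked_intentions()[0]
```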

Correspondingly, when analyzing the received search request based on the above implementation of the structured semantics, the following implementation may be performed.

First, a search keyword is extracted from the search request, and keyword processing is performed on it to obtain at least one of entity words, word segments, synonyms, and error-correction words related to the search request. Then, intention recognition is performed on the search keyword to obtain the corresponding user intentions and the score of each user intention.

It should be noted that the step of performing keyword processing on the search keyword and the step of performing intent recognition on the search keyword may be performed synchronously or sequentially, and are not specifically limited in this embodiment of the application.

In one possible implementation, the intention recognition of the received search request may be performed by a pre-trained intention recognition network model. The model may be a neural network that is conventional in the field for recognition tasks, or a network model designed for the purpose.

It should be noted that, in the method according to the embodiment of the present application, when the intention recognition network model is trained, the training sample data used may be acquired by a crawler. After the training sample data is obtained through the crawler, intention labeling is carried out according to the crawling path to which the training sample data belongs.

That is, when training sample data for the intention recognition network model are acquired, material data from different applications are collected by a crawler and stored in an offline database, and the crawled data are labeled with intentions according to the crawl path of each application. When the labeled data are stored in the offline database, indexes can be created by intention class under each vertical domain, so that when training samples are later extracted from the offline database, vertical-domain data features can be extracted from the material data according to these indexes and used to train the intention recognition network model.
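Labeling by crawl path can be sketched as a lookup from path prefix to a (vertical domain, intention) label. The path prefixes, label names, and sample texts below are hypothetical; the application does not specify how crawl paths are represented.

```python
from collections import defaultdict

# Hypothetical mapping from crawl-path prefix to (vertical domain, intention).
PATH_TO_LABEL = {
    "video/movies": ("video_entertainment", "watch_video"),
    "tickets/cinema": ("video_entertainment", "buy_movie_ticket"),
    "shop/phones": ("shopping", "buy_product"),
}

def label_by_crawl_path(samples):
    """Batch-label crawled text by the path it was crawled from.

    samples: iterable of (crawl_path, text) pairs.
    Returns a dict indexed by (vertical domain, intention), which plays the
    role of the per-intention index in the offline database.
    """
    indexed = defaultdict(list)
    for path, text in samples:
        label = next(
            (lab for prefix, lab in PATH_TO_LABEL.items() if path.startswith(prefix)),
            None,
        )
        if label is not None:  # unlabeled paths are skipped in this sketch
            indexed[label].append(text)
    return dict(indexed)

indexed = label_by_crawl_path([
    ("video/movies/2021/a", "full film online"),
    ("tickets/cinema/b", "book seats tonight"),
    ("unknown/path", "ignored text"),
    ("video/movies/2020/c", "watch trailer"),
])
```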

In this way, the method uses the pre-trained intention recognition network model to recognize user intentions in the search request, and the training sample data for that model are labeled uniformly in batches: data collected along the various crawl paths of applications in different vertical domains receive the intention associated with their path. This saves manual labor, keeps the intention labels accurate, and improves the accuracy of the intention classification model.

Further, after the structured semantics of the search request input by the user are obtained in any of the above manners, step S200 may be executed, a corresponding search result is obtained according to the structured semantics, an intention search result is extracted from the search result, and semantic similarity calculation is performed on the intention search result and the structured semantics.

Here, it should be noted that multiple search results may be obtained from the structured semantics, and that they may be recalled by threshold screening. As described above, the data types of the search results may include at least one of an encyclopedia class, an intention class, a direct class, and a fallback class. An encyclopedia-class search result is theoretical-knowledge data based on a named entity. An intention-class search result is data that fulfils a certain function or achieves a certain purpose. A direct-class search result is data returned directly as the response matching the search request. Fallback-class search results supplement the results with content related to the intentions analyzed from the user's search words, providing richer and more diverse search results; the fallback categories are configured by operations for each intention in the intention tree. For example, if the user intention is a financial product, the fallback intentions for financial products can be configured as information and video.

Meanwhile, the data type division for the above-mentioned several search results can be realized in the following manner.

Specifically, for the encyclopedia class, the user's search word is compared against a named entity library and the most similar entity knowledge is returned; for example, when the user searches for "Spider-Man", the named entity "Spider-Man" is returned rather than "spider", and encyclopedia data are recalled by that named entity word. For the intention class, an intention tree is built according to business requirements, and training data under each intention are obtained by a crawler or another data-entry method to train an algorithm model; when the user inputs a search word, the model identifies the intentions closest to it, and results are recalled using the intention together with the search word. For the direct class, the user's search word is matched exactly against the search results and the matching data are returned in first position; for example, the direct word for the Tencent Video official website can be configured as "Tencent", so that when the user inputs "Tencent" the direct word is hit and the official website is returned in first position as direct data. The fallback class is as described above and is not repeated here.

Further, when the search results with different intentions in different vertical domains are recalled in a threshold screening manner, the search results can be obtained by setting different thresholds for the divided data types.

The thresholds for the different data types can be set according to the business process by which each kind of data is recalled and the boundary between good and bad results. For example, intention search results can be recalled along dimensions such as intention, keywords, and semantics, rather than by an exact keyword hit as direct search results are, so their threshold can be lowered appropriately to improve the recall rate. Threshold setting is also a stepwise optimization process: the thresholds are calibrated step by step by observing the value ranges in which good and bad results of each kind are roughly distributed.

In a possible implementation manner, the threshold value range can be set to 0-1 for each of the encyclopedia class, the intention class, the direct class, and the fallback class. Here, it should be noted that, in the method of the embodiment of the present application, the threshold value of each data type is a normalized value.

Further, the value of a search result can be the BM25 score that the search engine computes along the keyword dimension (used as the search result score, 0-1). Meanwhile, the intention algorithm model returns a score for each intention of the search words (used as the intention score, 0-1), and each search result carries its own intention label. The final score of a search result is calculated from the search result score and the intention score by the following rule. Threshold intervals are set for the search result score (the interval boundaries can be chosen by the threshold-setting process described above), for example below 0.3, 0.3-0.6, 0.6-0.9, and 0.9-1, and each interval has its own coefficient, for example 0.2 below 0.3, 0.4 for 0.3-0.6, 0.6 for 0.6-0.9, and 0.8 for 0.9-1. The search result score is multiplied by the coefficient of the interval in which it falls to obtain a score a. Similarly, coefficients are set for the intervals of the intention score, and the intention score is multiplied by the coefficient of its interval to obtain a score b. Finally, layering is performed according to a + b. It should be noted that an intention score of 1 means that this is the only intention, and no hierarchical ranking is involved.
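The interval rule can be worked through in a few lines. The sketch below uses the example intervals and coefficients given above; reusing the same coefficient table for the intention score and assigning a boundary value (such as exactly 0.3) to the upper interval are assumptions, since the application leaves both open.

```python
import bisect

# Example interval boundaries and coefficients from the description.
BOUNDS = [0.3, 0.6, 0.9]
COEFFS = [0.2, 0.4, 0.6, 0.8]

def coefficient(score, bounds=BOUNDS, coeffs=COEFFS):
    """Return the coefficient of the interval containing a score in [0, 1]."""
    # Assumption: a score equal to a boundary belongs to the upper interval.
    return coeffs[bisect.bisect_right(bounds, score)]

def layered_score(result_score, intention_score):
    """Compute a + b, where each score is scaled by its interval coefficient."""
    a = result_score * coefficient(result_score)
    # Assumption: the intention score reuses the same coefficient table.
    b = intention_score * coefficient(intention_score)
    return a + b

# Result score 0.5 falls in 0.3-0.6 (coefficient 0.4), so a = 0.2.
# Intention score 0.75 falls in 0.6-0.9 (coefficient 0.6), so b = 0.45.
example = layered_score(0.5, 0.75)
```

In this worked case a + b is approximately 0.65, and intentions would be layered by comparing such sums.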

After corresponding thresholds have been set for the different data types, the search results of each intention under each vertical domain are recalled by threshold screening according to the structured semantics: the value of each search result is compared with the threshold of the data type to which it belongs. When the value of the search result is greater than or equal to that threshold, the search result is recalled; when it is smaller than that threshold, the search result is discarded.
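The threshold screening described above can be sketched as follows. This is a minimal illustration, not the application's implementation: the `SearchResult` type, the `recall` function and the concrete threshold values in `THRESHOLDS` are assumptions, since the text only states that each threshold is a normalized value in the range 0-1.

```python
from dataclasses import dataclass

# Hypothetical per-type thresholds on normalized scores (illustrative values only;
# the text only says each threshold lies in the range 0-1).
THRESHOLDS = {"encyclopedia": 0.8, "intention": 0.5, "direct": 0.9, "fallback": 0.3}


@dataclass
class SearchResult:
    title: str
    data_type: str  # one of the keys of THRESHOLDS
    score: float    # normalized to the range 0-1


def recall(results, thresholds=THRESHOLDS):
    """Keep a result when its score is >= the threshold of its data type."""
    return [r for r in results if r.score >= thresholds[r.data_type]]
```

A result whose score equals its threshold is kept, matching the "greater than or equal to" wording of the text.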

In addition, when business re-ranking is carried out with the thresholds set for the different data types, if a search word exactly matches a certain search result, the intention vertical domain in which that result is located is given priority. For example, when the user's search word is 'news live broadcasting room', the live-broadcast intention score is high and the finance score is low; however, if the finance index library contains a service-type search result under the Xinwang financial channel whose subject is exactly 'news live broadcasting room', that result is recalled first and the Douyu live-streaming results are displayed after it. If the search word exactly hits an encyclopedia entity word, the encyclopedia section is displayed first. For example, when the user's search word is 'Zhou Jilun' and the artist named-entity word Zhou Jilun is exactly hit in the encyclopedia entity library, the related encyclopedia results (similar to Baidu encyclopedia) are returned according to the Zhou Jilun entity word.

When the weight of the intention-class section is high enough, the intention class is ranked ahead of the encyclopedia class. As mentioned above, an intention-class result is not obtained by exact matching, as a direct word or an encyclopedia entity word is, but by deeper analysis of natural language; even so, the search results returned for the intention class may outrank the encyclopedia. For example, when the user searches for One Piece, the intention is identified as a watching intention (that is, watching a video or an animation) with a high intention score. If a complete set of One Piece animation episodes exists among the video results, the result score is also high, so the final score of the watch-video classification exceeds a certain threshold (for example, 0.95), and the One Piece episodes under the watch-video intention are then ranked before the encyclopedia result.
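One way to read this promotion rule is sketched below. The function name `order_sections`, the section label `"encyclopedia"` and the dictionary input are illustrative assumptions; only the example threshold of 0.95 comes from the text.

```python
ENCYCLOPEDIA = "encyclopedia"  # hypothetical label for the encyclopedia section


def order_sections(intent_final_scores, encyclopedia_hit, promote_above=0.95):
    """Order display sections.

    intent_final_scores: dict mapping an intention name to its final score.
    encyclopedia_hit: True when the search word hit an encyclopedia entity word.
    Intentions whose final score exceeds `promote_above` go before the
    encyclopedia section; the remaining intentions follow it.
    """
    ranked = sorted(intent_final_scores, key=intent_final_scores.get, reverse=True)
    promoted = [i for i in ranked if intent_final_scores[i] > promote_above]
    rest = [i for i in ranked if intent_final_scores[i] <= promote_above]
    middle = [ENCYCLOPEDIA] if encyclopedia_hit else []
    return promoted + middle + rest
```

With `{"watch_video": 0.97, "listen_music": 0.6}` and an encyclopedia hit, the watch-video section comes first, then the encyclopedia section, then listen-music.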

At the same time, a maximum number can also be set for the search results recalled for each data type. Once the number of recalled search results of a data type reaches the set maximum, no further search results of that data type are recalled. That is, the number of search results of each data type is limited, and results are aggregated by data type and data source, which prevents an excess of results in one data type from making the data redundant and degrading the user experience.
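The per-type cap can be sketched as follows, assuming the recalled results arrive already ordered by preference. The function name `cap_per_type`, the tuple layout and the choice to leave unlisted data types uncapped are assumptions made for illustration.

```python
from collections import defaultdict


def cap_per_type(results, max_per_type):
    """Limit the number of recalled results per data type.

    results: list of (data_type, title) tuples, already in preference order.
    max_per_type: dict mapping a data type to its maximum result count;
    a data type missing from the dict is treated as uncapped (an assumption).
    """
    kept = []
    counts = defaultdict(int)
    for data_type, title in results:
        if counts[data_type] < max_per_type.get(data_type, float("inf")):
            kept.append((data_type, title))
            counts[data_type] += 1
    return kept
```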

After the search results of each intention under each vertical domain have been recalled according to the structured semantics in any of the above manners, the intention-class search results can be extracted from the recalled search results, and semantic similarity is then calculated between each extracted intention-class search result and the structured semantics, yielding a semantic similarity score for each intention-class search result. It should be noted that, in the method of the embodiment of the present application, this semantic similarity may be calculated by any semantic similarity calculation method conventional in the art, and details are not described here.

After the semantic similarity scores of the intention search results and the structured semantics are obtained, the final scores of the intentions of the users can be obtained according to the semantic similarity scores and the intention scores of the intention search results. In a possible implementation manner, the final score of each user intention can be obtained in a weighted calculation manner according to the semantic similarity score and the intention score of each intention type search result.

Specifically, semantic similarity scores under the same type of intention are extracted from the intention search results, and the most relevant score is selected from the semantic similarity scores under the same type of intention. And then, carrying out weighted calculation on each most relevant score and the corresponding intention score of each user intention to obtain each corresponding final score.
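Selecting the most relevant score under each intention can be sketched as below. The name `best_similarity_per_intent` and the pair-based input are assumptions; "most relevant" is taken here to mean the highest semantic similarity score.

```python
def best_similarity_per_intent(results):
    """Return the highest semantic similarity score found under each intention.

    results: iterable of (intent, similarity) pairs, one per intention-class
    search result.
    """
    best = {}
    for intent, similarity in results:
        if similarity > best.get(intent, float("-inf")):
            best[intent] = similarity
    return best
```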

For example, the score of the search result (normalized to 0-1) and an intention score (normalized to 0-1) are output according to the search word of the user; different threshold intervals are set for the search results, such as less than 0.3, 0.3-0.6, 0.6-0.9, 0.9-1.

Different thresholds correspond to different weighting factors, such as: the coefficient is 0.2 below 0.3, the coefficient is 0.4 in the range of 0.3-0.6, the coefficient is 0.6 in the range of 0.6-0.9, and the coefficient is 0.8 in the range of 0.9-1.

The search result score is multiplied by the coefficient of the interval in which it falls to obtain a score a. The intention score is handled on the same or a similar principle: different intervals are set for the intention score, a different weighting coefficient is set for each interval, and the intention score is multiplied by the weighting coefficient of its interval to obtain a score b.

Finally, the final score of each user intention is obtained as a + b, and the user intentions are ranked by these final scores. That is, the hierarchy of the user intention ranking is determined by the final scores, with a user intention that has a larger final score placed earlier in the hierarchical ordering.
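The interval-coefficient calculation and the ranking by a + b can be sketched as follows. The interval edges and coefficients are the example values from the text. Two points are assumptions: a score lying exactly on an edge (for example 0.3) is assigned to the upper interval, and the intention score reuses the same intervals and coefficients, which the text allows but does not require.

```python
import bisect

# Example values from the text: below 0.3, 0.3-0.6, 0.6-0.9, 0.9-1.
BOUNDS = [0.3, 0.6, 0.9]
COEFFS = [0.2, 0.4, 0.6, 0.8]


def weighted(score, bounds=BOUNDS, coeffs=COEFFS):
    """Multiply a normalized score by the coefficient of its interval.

    A score equal to an edge falls into the upper interval (an assumption).
    """
    return score * coeffs[bisect.bisect_right(bounds, score)]


def final_score(result_score, intent_score):
    """Final score a + b; the intention score reuses the same coefficients here."""
    a = weighted(result_score)
    b = weighted(intent_score)
    return a + b


def rank_intents(scores):
    """Rank intentions by final score, largest first.

    scores: dict mapping an intention to (most relevant result score, intention score).
    """
    return sorted(scores, key=lambda i: final_score(*scores[i]), reverse=True)
```

For instance, a result score of 0.95 and an intention score of 0.5 give a = 0.95 × 0.8 = 0.76 and b = 0.5 × 0.4 = 0.2, so the final score is 0.96.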

Here, it should be specifically noted that when the intention score is 1, only this intention is meant, and here, the user intention does not need to be hierarchically ordered.

Meanwhile, a certain intention of the user may not be identified by the intention classification even though a certain number of high-scoring results under that same intention exist among the recalled results (that is, the recalled search results contain a user intention that was not identified from the search request). In this case, the server automatically aggregates the high-scoring results that share the same user intention and supplements that intention. In this embodiment, the number of search results with the same user intention required for this can be set to a value greater than or equal to 2.

It should be noted that the larger this number is set, the more accurate the supplemented intention is, but the harder it becomes to aggregate an intention. For example, suppose the number is set to 2 and the user searches for the word 'crosstalk'. The intention recognition service recognizes the user intentions as listening to audiobooks, listening to radio stations and the like, and returns the results of those two intentions. If a large number of Yue Yunpeng and Guo Degang crosstalk video results that are semantically close to the user's query are found in both iQiyi and Youku, the watch-video intention is supplemented for the user and the crosstalk video results are returned. If such results are found only in iQiyi and in no other video application, the watch-video intention is not supplemented.
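The intention supplement can be sketched as below. This follows the example in the text, where the count is satisfied by results from two different applications, so the sketch counts distinct source applications; counting individual results instead would be an equally plausible reading. The name `supplement_intents`, the `high_score` cut-off of 0.8 and the tuple layout are assumptions.

```python
def supplement_intents(recognized, results, high_score=0.8, min_sources=2):
    """Find intentions to supplement from the recalled results.

    recognized: set of intentions already identified from the search request.
    results: iterable of (intent, source_app, score) tuples.
    An unrecognized intention is supplemented when high-scoring results for it
    come from at least `min_sources` distinct source applications.
    """
    sources = {}
    for intent, source_app, score in results:
        if intent not in recognized and score >= high_score:
            sources.setdefault(intent, set()).add(source_app)
    return sorted(i for i, apps in sources.items() if len(apps) >= min_sources)
```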

In addition, direct-class search results are usually exact hits, so recalled direct-class results do not need a semantic similarity calculation. Encyclopedia-class search results are also exact hits in most cases; when an encyclopedia-class result is recalled without an exact hit, a semantic similarity score is calculated for it, and the results are then ranked directly by the calculated similarity score.

That is to say, in the method according to the embodiment of the present application, after the step S100 is performed to analyze the received search request and identify the intent, and obtain the corresponding structured semantics, the search results of the intentions in each vertical domain may be filtered and recalled according to the structured semantics and the threshold, and the search results are layered according to the intentions.

In other words, the recalled encyclopedia-class and intention-class search results are finely ranked according to feature values such as title, source application, type, timeliness and popularity, and semantic similarity scores are output. A weighted calculation is then carried out using the score of the most semantically relevant search result under each intention together with the relevance score between the user's search word and that intention, and the result is taken as the final score of the user intention. Finally, the hierarchical structure of the user intention ranking is determined by the final scores of the user intentions. If a certain intention of the user is not identified by the intention classification, but a certain number of high-scoring results under that same intention exist among the recalled results, the server automatically aggregates the high-scoring results with that intention and supplements the intention.

Furthermore, after the recalled search results are hierarchically ranked in any of the above manners, the search results in each layer also need to be ranked in a certain order, so that the user can more accurately and efficiently find the currently requested target service or target content.

That is, making each intention-class search result correspond to its user intention further includes sorting the results according to the data application attribute of each intention-class search result, where the data application attribute includes at least one of a service type and a content type. In one possible implementation, the search results within each layer may be ranked as follows.

Within each intention layer, service-type results take precedence over content-type results. Alternatively, a threshold is set for each of the service-type results and the content-type results: when the score of a content-type result exceeds the content threshold, that result is ranked before the service-type results, except that when the score of a service-type result reaches the service threshold, the content-type result is not ranked before that service-type result even if its own score exceeds the content threshold.

Service-type results are, for example, recharging (topping up) a fee or checking the weather; content-type results are, for example, a video in iQiyi or a news item in Toutiao.

Further, the thresholds for service-type results and content-type results may be set according to the score ranges in which high-quality service-type results and high-quality content-type results are distributed. For example, in one possible implementation, the threshold for service-type results may be set to 0.8 and the threshold for content-type results may be set to 0.9. The values of service-type and content-type results can be calculated in the same or a similar way as the value of a search result described above, and the description is not repeated here.

It should be noted that, in principle, when service results and content results are recalled together, the service results are ranked before the content results. However, if the score of a content result is greater than the content threshold of 0.9, the content result is considered strongly related to the semantics of the user's search word and is ranked before the service results. If the score of a service result reaches the service threshold of 0.8, a content result is not ranked before that service result even when its score exceeds 0.9.
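One possible reading of this ordering rule uses four tiers: service results at or above 0.8, then content results above 0.9, then the remaining service results, then the remaining content results. The sketch below implements that reading. The tier scheme, the names `tier` and `order_within_level`, and the tie-break by descending score inside a tier are assumptions; only the two threshold values come from the text.

```python
SERVICE_THRESHOLD = 0.8  # example value from the text
CONTENT_THRESHOLD = 0.9  # example value from the text


def tier(kind, score):
    """Map a result to an ordering tier (lower tiers are shown first)."""
    if kind == "service":
        return 0 if score >= SERVICE_THRESHOLD else 2
    return 1 if score > CONTENT_THRESHOLD else 3


def order_within_level(results):
    """Order the results inside one intention layer.

    results: list of (title, kind, score) with kind "service" or "content".
    Within a tier, higher scores come first (an assumption).
    """
    return sorted(results, key=lambda r: (tier(r[1], r[2]), -r[2]))
```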

Therefore, the method of the embodiment of the present application is mainly divided into an offline module and an online module. The offline module is mainly used for training the intention recognition model and for ingesting and processing search materials. This includes dividing the data types into the encyclopedia class, the intention class (the main class), the direct class and the bottom-pocket (fallback) class; dividing the data into vertical domains according to business requirements, for example shopping, cars, and video and audio entertainment; and dividing multiple intentions under each vertical domain, for example the video and audio entertainment vertical domain has intentions such as buying movie tickets, watching videos and listening to music. The data type division, the vertical domain division and the user intention division can all be customized according to business requirements and training data. The online module performs natural language processing on the user's search words, and scatters, aggregates and hierarchically ranks the recalled search results according to the user intention, the result category, the result source and the result relevance.

It should be noted that although the above-mentioned multiple data type hierarchical sorting method is described by way of example in fig. 1, those skilled in the art will appreciate that the present application should not be limited thereto. In fact, the user can flexibly set the specific implementation manner of each step according to personal preference and/or actual application scenarios, as long as the data sequencing and pushing with multi-user intentions can be realized.

Correspondingly, based on any one of the above multi-data type hierarchical sequencing methods, the present application also provides a multi-data type hierarchical sequencing device. Since the working principle of the device is the same as or similar to that of the multi-data type hierarchical sequencing method in the embodiment of the present application, the overlapping parts are not repeated.

Referring to fig. 2, the multi-data type hierarchical ranking apparatus 100 provided by the present application includes a request parsing and identifying module 110, a similarity evaluation module 120, a user intention score calculating module 130, and a hierarchical ranking module 140. The request parsing and identifying module 110 is configured to parse the received search request and perform intention recognition to obtain the corresponding structured semantics; the structured semantics include the recognized user intentions and the corresponding intention scores. The similarity evaluation module 120 is configured to obtain the corresponding search results according to the structured semantics, extract the intention-class search results from them, and calculate the semantic similarity between each intention-class search result and the structured semantics to obtain the semantic similarity score of each intention-class search result. The user intention score calculating module 130 is configured to obtain the final score of each user intention according to the semantic similarity scores and the intention scores of the intention-class search results. The hierarchical ranking module 140 is configured to hierarchically rank the user intentions according to their final scores, and to make each intention-class search result correspond to its user intention.

Having described embodiments of the present application, the foregoing description is intended to be exemplary, not exhaustive, and not limited to the disclosed embodiments. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the described embodiments. The terminology used herein is chosen in order to best explain the principles of the embodiments, the practical application, or improvements made to the technology in the marketplace, or to enable others of ordinary skill in the art to understand the embodiments disclosed herein.
