Search method, search device, server and computer-readable storage medium

Document No.: 987773  Published: 2020-11-06

This technology, "Search method, search device, server and computer-readable storage medium", was designed and created by 万鑫瑞 (Wan Xinrui) on 2020-07-15. Abstract: The application discloses a search method, a search device, a search system, a server, and a computer-readable storage medium. The method comprises: acquiring a search statement, determining all tokens in the search statement, and connecting every two adjacent tokens to obtain a word pair; determining a related search statement set and a related document set corresponding to the search statement based on historical click data; counting first click information of each word pair within a preset time period in the related search statement set, and counting second click information of each word pair within the preset time period in the related document set; obtaining the closeness of each word pair by fitting the first click information and the second click information; and searching based on the closeness of all the word pairs to obtain a search result corresponding to the search statement. The search method provided by the application thus improves the accuracy of the closeness calculation, and thereby the search accuracy.

1. A method of searching, comprising:

acquiring a search statement, determining all tokens in the search statement, and connecting every two adjacent tokens to obtain a word pair;

determining a related search statement set and a related document set corresponding to the search statement based on historical click data; wherein the related document set comprises related documents of the search statement, a related document being a clicked document in the recall result corresponding to the search statement, and the related search statement set comprises related search statements of the search statement, each related search statement having the same related document as the search statement;

counting first click information of each word pair within a preset time period in the related search statement set, and counting second click information of each word pair within the preset time period in the related document set;

obtaining closeness of each word pair by fitting the first click information and the second click information;

and searching based on the closeness of all the word pairs to obtain a search result corresponding to the search statement.

2. The search method according to claim 1, wherein counting first click information of each word pair within a preset time period in the related search statement set comprises:

counting click information of each word pair in a full-context mode, a single-context mode, and a no-context mode within the preset time period in the related search statement set as candidate first click information, and obtaining the first click information by fitting all the candidate first click information corresponding to each word pair;

correspondingly, counting second click information of each word pair within the preset time period in the related document set comprises:

and counting click information of each word pair in the full-context mode, the single-context mode, and the no-context mode within the preset time period in the related document set as candidate second click information, and obtaining the second click information by fitting all the candidate second click information corresponding to each word pair.

3. The search method according to claim 2, further comprising, before obtaining the first click information by fitting all the candidate first click information corresponding to each word pair:

assigning corresponding weights to the full-context mode, the single-context mode, and the no-context mode;

correspondingly, the obtaining the first click information by fitting all the candidate first click information corresponding to each word pair includes:

performing weighted fitting on all the candidate first click information corresponding to each word pair based on the weight to obtain the first click information;

correspondingly, the obtaining of the second click information by fitting all the candidate second click information corresponding to each word pair includes:

and performing weighted fitting on all the candidate second click information corresponding to each word pair based on the weight to obtain the second click information.

4. The search method according to claim 1, wherein obtaining the closeness of each word pair by fitting the first click information and the second click information comprises:

assigning a first weight to the first click information and a second weight to the second click information;

and performing weighted fitting on the first click information and the second click information based on the first weight and the second weight to obtain the closeness of each word pair.

5. The search method according to claim 1, wherein counting second click information of each word pair within the preset time period in the related document set comprises:

structurally dividing each document in the related document set according to named entities to obtain structured data corresponding to each document;

and counting second click information of each word pair within the preset time period based on the structured data.

6. The search method according to claim 1, further comprising, before obtaining the closeness of each word pair by fitting the first click information and the second click information:

determining a correction coefficient corresponding to the preset time period, wherein the correction coefficient is negatively correlated with the time span between the preset time period and the current time;

correspondingly, after obtaining the closeness of each word pair by fitting the first click information and the second click information, the method further comprises:

correcting the closeness of each word pair by using the correction coefficient.

7. The search method according to any one of claims 1 to 6, further comprising, after obtaining the closeness of each word pair by fitting the first click information and the second click information:

calculating the left entropy of a first token and the right entropy of a second token in each word pair, the first token being the first token of the corresponding word pair and the second token being the last token of the corresponding word pair;

determining whether the first token and the second token belong to the same named entity to obtain a first judgment result, and determining whether the left entropy and the right entropy are both greater than a preset value to obtain a second judgment result;

and obtaining a closeness correction parameter for each word pair based on the first judgment result and the second judgment result, and correcting the closeness by using the closeness correction parameter.

8. A search apparatus, comprising:

an acquisition module, configured to acquire a search statement, determine all tokens in the search statement, and connect every two adjacent tokens to obtain a word pair;

a first determining module, configured to determine a related search statement set and a related document set corresponding to the search statement based on historical click data; wherein the related document set comprises related documents of the search statement, a related document being a clicked document in the recall result corresponding to the search statement, and the related search statement set comprises related search statements of the search statement, each related search statement having the same related document as the search statement;

a statistics module, configured to count first click information of each word pair within a preset time period in the related search statement set, and count second click information of each word pair within the preset time period in the related document set;

a fitting module, configured to obtain the closeness of each word pair by fitting the first click information and the second click information;

and a search module, configured to search based on the closeness of all the word pairs to obtain a search result corresponding to the search statement.

9. An electronic device, comprising:

a memory for storing a computer program;

a processor for implementing the steps of the search method according to any one of claims 1 to 7 when executing the computer program.

10. A computer-readable storage medium having a computer program stored thereon, wherein the computer program, when executed by a processor, implements the steps of the search method according to any one of claims 1 to 7.
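Claims 6 and 7 leave the exact formulas open. The following Python sketch illustrates only their computable parts, under stated assumptions: for claim 6, an exponential decay with a hypothetical `decay_rate` (the source only requires the coefficient to decrease as the time span grows); for claim 7, the common definition of left and right entropy as the entropy of a token's immediate left or right neighbors in a tokenized corpus (the source does not say which corpus; tokenized related search statements are one plausible choice). The named-entity judgment and the closeness correction parameter are not sketched, because the source gives no formula for them.

```python
import math
from collections import Counter
from typing import Sequence, Tuple


def time_correction_coefficient(days_ago: float, decay_rate: float = 0.01) -> float:
    """Claim 6 sketch: a coefficient that shrinks as the preset time period gets older.

    decay_rate is a hypothetical constant; the source only requires a negative correlation.
    """
    if days_ago < 0:
        raise ValueError("days_ago must be non-negative")
    return math.exp(-decay_rate * days_ago)  # strictly decreasing in days_ago


def _entropy(counts: Counter) -> float:
    """Shannon entropy (natural log) of a neighbor-frequency distribution."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log(c / total) for c in counts.values())


def left_entropy(word: str, corpus: Sequence[Sequence[str]]) -> float:
    """Entropy of the tokens appearing immediately to the left of `word`."""
    neighbors = Counter(
        toks[i - 1] for toks in corpus for i in range(1, len(toks)) if toks[i] == word
    )
    return _entropy(neighbors)


def right_entropy(word: str, corpus: Sequence[Sequence[str]]) -> float:
    """Entropy of the tokens appearing immediately to the right of `word`."""
    neighbors = Counter(
        toks[i + 1] for toks in corpus for i in range(len(toks) - 1) if toks[i] == word
    )
    return _entropy(neighbors)


def second_judgment(pair: Tuple[str, str], corpus: Sequence[Sequence[str]],
                    preset_value: float) -> bool:
    """Claim 7 second judgment: are both the left and the right entropy above the preset value?"""
    first_token, second_token = pair
    return (left_entropy(first_token, corpus) > preset_value
            and right_entropy(second_token, corpus) > preset_value)
```

A high left entropy for the first token and a high right entropy for the second token indicate that the word pair appears in many different contexts, which is the usual signal that it behaves as a free-standing unit.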

Technical Field

The present application relates to the field of search technologies, and more particularly, to a search method, apparatus, server, and computer-readable storage medium.

Background

In the music field, search is performed by calculating the relevance between a search statement (query) and a music document (Doc), recalling and ranking the data with high relevance, and presenting it to the user. Because a music document can generally be divided into multiple text fields by proper nouns such as the singer name, song title, and album title, relevance is calculated from the coverage and the minimum distance of the query words within these text fields. In this case, if the query words are matched at scattered positions within a text field, or if the requirement that certain words be matched adjacently is not considered, some results with poor relevance are returned, which degrades the online ranking results.

In the related art, search queries are highly diverse, and the music field involves many special cases such as proper nouns, internet slang, and aliases. In the process of implementing the invention, the inventor found that the related art has at least the following problem: the computed closeness is inaccurate, which in turn makes the search results inaccurate.

Disclosure of Invention

An object of the present application is to provide a search method, apparatus, server, and computer-readable storage medium, which improve the accuracy of the closeness calculation, thereby improving the search accuracy.

To achieve the above object, a first aspect of the present application provides a search method, including:

acquiring a search statement, determining all tokens in the search statement, and connecting every two adjacent tokens to obtain a word pair;

determining a related search statement set and a related document set corresponding to the search statement based on historical click data; wherein the related document set comprises related documents of the search statement, a related document being a clicked document in the recall result corresponding to the search statement, and the related search statement set comprises related search statements of the search statement, each related search statement having the same related document as the search statement;

counting first click information of each word pair within a preset time period in the related search statement set, and counting second click information of each word pair within the preset time period in the related document set;

obtaining closeness of each word pair by fitting the first click information and the second click information;

and searching based on the closeness of all the word pairs to obtain a search result corresponding to the search statement.

To achieve the above object, a second aspect of the present application provides a search apparatus comprising:

an acquisition module, configured to acquire a search statement, determine all tokens in the search statement, and connect every two adjacent tokens to obtain a word pair;

a first determining module, configured to determine a related search statement set and a related document set corresponding to the search statement based on historical click data; wherein the related document set comprises related documents of the search statement, a related document being a clicked document in the recall result corresponding to the search statement, and the related search statement set comprises related search statements of the search statement, each related search statement having the same related document as the search statement;

a statistics module, configured to count first click information of each word pair within a preset time period in the related search statement set, and count second click information of each word pair within the preset time period in the related document set;

a fitting module, configured to obtain the closeness of each word pair by fitting the first click information and the second click information;

and a search module, configured to search based on the closeness of all the word pairs to obtain a search result corresponding to the search statement.

To achieve the above object, a third aspect of the present application provides a server comprising:

a memory for storing a computer program;

a processor for implementing the steps of the search method as described above when executing the computer program.

To achieve the above object, a fourth aspect of the present application provides a computer-readable storage medium having a computer program stored thereon, the computer program, when executed by a processor, implementing the steps of the above search method.

According to the above solution, the search method provided by the application comprises: acquiring a search statement, determining all tokens in the search statement, and connecting every two adjacent tokens to obtain a word pair; determining a related search statement set and a related document set corresponding to the search statement based on historical click data; counting first click information of each word pair within a preset time period in the related search statement set, and counting second click information of each word pair within the preset time period in the related document set; obtaining the closeness of each word pair by fitting the first click information and the second click information; and searching based on the closeness of all the word pairs to obtain a search result corresponding to the search statement.

In this search method, the related search statement set and the related document set corresponding to the search statement are determined based on the user's historical click data, the click information of each word pair is counted separately in the related search statement set and in the related document set, and the closeness of each word pair is then obtained by fitting. Because users' historical click data carries feedback that corrects the results, the calculated closeness is highly accurate. The search method provided by the application therefore calculates closeness from users' historical click data, which solves the technical problem of low closeness accuracy in the related art and improves search accuracy.

The application also discloses a search apparatus, a server, and a computer-readable storage medium, which achieve the same technical effects.

It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the application.

Drawings

In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art are briefly described below. Obviously, the drawings in the following description are only some embodiments of the present application, and those skilled in the art can obtain other drawings from them without creative effort. The accompanying drawings, which are included to provide a further understanding of the disclosure and are incorporated in and constitute a part of this specification, illustrate embodiments of the disclosure and, together with the description, serve to explain the disclosure without limiting it. In the drawings:

fig. 1 is an architecture diagram of a search system according to an embodiment of the present application;

fig. 2 is a flowchart of a first search method provided in an embodiment of the present application;

fig. 3 is a flowchart of a second search method provided in an embodiment of the present application;

fig. 4 is a flowchart of a third search method provided in an embodiment of the present application;

fig. 5 is a flowchart of a fourth search method provided in an embodiment of the present application;

fig. 6 is a flowchart of a fifth search method provided in an embodiment of the present application;

fig. 7 is a structural diagram of a search apparatus according to an embodiment of the present application;

fig. 8 is a block diagram of a server according to an embodiment of the present application.

Detailed Description

Through research, the applicant of the present application has found that data in the music domain is naturally structured, so the closeness relations between words differ from those of ordinary language; for example, a word and a stop word may be closely bound.

Specifically, as shown in table 1:

TABLE 1

[Table 1 is an image in the original publication and is not reproduced in this text.]

In addition, the same word pair should have different closeness in different queries, but the related art cannot adaptively correct the closeness for different queries. The related art also fails to make full use of the data that users click during search; in particular, it does not use the information of the music documents users click. Furthermore, it cannot account for the timeliness of search in the music field, so it cannot calculate the closeness of popular terms newly emerging on the internet.

Therefore, the present application comprehensively considers the specific structural and timeliness characteristics of music-domain documents together with users' click behavior, so that the closeness calculation better fits the music domain, is more accurate, and helps the search return satisfactory results.

The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.

To facilitate understanding of the search method provided in the present application, the system in which it is used is described below. Fig. 1 is an architecture diagram of a search system provided by an embodiment of the present application. As shown in Fig. 1, the system includes an interactive device 10 and a server 20, which are communicatively connected through a network 30.

The interactive device 10 is used to interact with the user and may be an AI (Artificial Intelligence) device such as a smart speaker, a fixed terminal such as a PC (Personal Computer), or a mobile terminal such as a mobile phone, which is not specifically limited herein. The interactive device 10 receives the search statement input by the user and supports both voice input and text input; that is, the search statement may be in voice form or in text form. Specifically, after receiving the user's search statement, the interactive device 10 may first determine whether it is in voice form or text form and, if it is in voice form, perform speech recognition on it to obtain the corresponding text, so that the server 20 processes the search statement in text form.

The server 20 is the back-end server corresponding to the interactive device 10 and processes the search statements sent by the interactive device 10. It first determines the related search statement set and the related document set corresponding to a search statement based on the user's historical click data, then counts the click information of each word pair separately in the related search statement set and the related document set, and finally obtains the closeness of each word pair by fitting. The closeness can guide the subsequent search and improve its accuracy.

An embodiment of the application discloses a search method that improves the accuracy of the closeness calculation and thereby the search accuracy.

Fig. 2 is a flowchart of a first search method provided in an embodiment of the present application. As shown in Fig. 2, the method includes:

S101: the interactive device sends a search statement to the server;

In this step, the interactive device obtains the search statement input by the user in voice or text form, performs speech recognition on a search statement in voice form to obtain a search statement in text form, and sends the text-form search statement to the corresponding server.

S102: the server determines all the tokens in the search statement and connects every two adjacent tokens to obtain a word pair;

In a specific implementation, a search statement input by a user contains multiple tokens, and every two adjacent tokens form a word pair. First, word segmentation is performed on the user's search statement to obtain all of its tokens:

$(term_1, term_2, \ldots, term_n) = \mathrm{Segment}(query)$

where $\mathrm{Segment}()$ is the word segmentation function, $query$ is the search statement input by the user, and $term_i$ is the i-th token in the search statement, $1 \le i \le n$. For example, if the user inputs the search statement "my little baby", the segmentation result is: my, little, baby. Next, every two adjacent tokens are connected to obtain all the word pairs contained in the search statement; in this example they are "my little" and "little baby".
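The segmentation and pairing step can be sketched in Python as follows. The segmenter is passed in as a parameter because the source does not name one; `build_word_pairs` is a hypothetical helper, and the whitespace split in the usage line merely stands in for a real Chinese word segmenter.

```python
from typing import Callable, List, Tuple


def build_word_pairs(query: str, segment: Callable[[str], List[str]]) -> List[Tuple[str, str]]:
    """Segment `query` and connect every two adjacent tokens into a word pair."""
    terms = segment(query)  # (term_1, ..., term_n) = Segment(query)
    # zip(terms, terms[1:]) yields (term_i, term_{i+1}) for 1 <= i < n
    return list(zip(terms, terms[1:]))


# Usage, with str.split standing in for the word segmentation function:
pairs = build_word_pairs("my little baby", str.split)
# pairs == [("my", "little"), ("little", "baby")]
```

A statement with n tokens always yields n - 1 word pairs, and a single-token statement yields none.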

S103: the server determines a related search statement set and a related document set corresponding to the search statement based on the historical click data;

The related document set comprises related documents of the search statement, a related document being a clicked document in the recall result corresponding to the search statement; the related search statement set comprises related search statements of the search statement, each related search statement having the same related document as the search statement.

In this step, search statements that led to clicks on the same document are linked, which establishes the correspondence between the search statement and its related search statements and thereby determines the related search statement set corresponding to the search statement input by the user. Meanwhile, from the recall result of the user's search statement, the documents the user clicked are selected based on the user's historical click data to form the related document set corresponding to the search statement.
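A minimal sketch of S103, assuming the historical click data is available as (search statement, clicked document ID) records that already come from recall results. The record format and the function name `related_sets` are hypothetical, and excluding the search statement itself from its own related set is also an assumption.

```python
from collections import defaultdict
from typing import DefaultDict, Iterable, Set, Tuple


def related_sets(query: str, click_log: Iterable[Tuple[str, str]]) -> Tuple[Set[str], Set[str]]:
    """Return (related search statement set, related document set) for `query`."""
    docs_by_query: DefaultDict[str, Set[str]] = defaultdict(set)
    queries_by_doc: DefaultDict[str, Set[str]] = defaultdict(set)
    for q, doc_id in click_log:
        docs_by_query[q].add(doc_id)
        queries_by_doc[doc_id].add(q)

    # Related documents: documents clicked in the recall result of `query`.
    related_docs = set(docs_by_query.get(query, set()))
    # Related search statements: other statements whose clicks hit the same document.
    related_queries: Set[str] = set()
    for doc_id in related_docs:
        related_queries |= queries_by_doc[doc_id]
    related_queries.discard(query)
    return related_queries, related_docs
```

Building both inverted maps in one pass keeps the lookup linear in the size of the click log.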

S104: the server counts first click information of each word pair within a preset time period in the related search statement set, and counts second click information of each word pair within the preset time period in the related document set;

In a specific implementation, the click information of each word pair within the preset time period is counted separately in the related search statement set and in the related document set. As a preferred embodiment, the structural features of music-domain documents may be used for the statistics; that is, counting the second click information of each word pair within the preset time period in the related document set may include: structurally dividing each document in the related document set according to named entities to obtain structured data corresponding to each document; and counting the second click information of each word pair within the preset time period based on the structured data. In a specific implementation, each document in the related document set may be structurally divided according to music-domain named entities such as "singer name", "song name", "album name", "movie name", "translated name", and "alias", giving the structured data docInfo = List[song name, singer name, album name, movie name, translated name, alias, …]. If a recalled document is a lyric recall, the lyrics are divided by sentence, and docInfo = List[lyric 1, lyric 2, …]. Counting the second click information of each word pair within the preset time period from the structured data improves the efficiency of the click statistics.
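The named-entity structuring of a recalled document might look like the following sketch. The field names in `ENTITY_FIELDS`, the dictionary layout of a document, and the line-based splitting of lyrics are assumptions for illustration; the source only lists the entity types and says lyrics are divided by sentence.

```python
from typing import Dict, List

# Hypothetical field names for the named entities listed in the description.
ENTITY_FIELDS = ("song_name", "singer_name", "album_name",
                 "movie_name", "translated_name", "alias")


def build_doc_info(doc: Dict[str, str], lyric_recall: bool = False) -> List[str]:
    """Return docInfo: one entry per named-entity field, or one per lyric sentence."""
    if lyric_recall:
        # Assumes each lyric sentence sits on its own line.
        return [line.strip() for line in doc.get("lyrics", "").splitlines() if line.strip()]
    # Keep only the entity fields that are present and non-empty, in a fixed order.
    return [doc[field] for field in ENTITY_FIELDS if doc.get(field)]
```

Because every entry is a complete entity or lyric sentence, later pattern checks never have to consider matches that straddle two fields.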

S105: the server obtains the closeness of each word pair by fitting the first click information and the second click information;

In this step, the server obtains the closeness of each word pair by fitting the first click information and the second click information. This embodiment does not limit the specific fitting method; as one possible implementation, weighted fitting may be used, that is, this step may include: assigning a first weight to the first click information and a second weight to the second click information; and performing weighted fitting on the first click information and the second click information based on the first weight and the second weight to obtain the closeness of each word pair.
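The weighted fitting of S105 can be sketched as a normalized weighted sum. The source gives no weight values and does not say how raw click counts are scaled, so this sketch assumes both click-information values have already been normalized to a comparable range (for example [0, 1]); `fit_closeness` and its default weights are hypothetical.

```python
def fit_closeness(first_click: float, second_click: float,
                  first_weight: float = 0.5, second_weight: float = 0.5) -> float:
    """Weighted fit of query-side and document-side click information into one closeness value."""
    total = first_weight + second_weight
    if total <= 0:
        raise ValueError("the weights must sum to a positive value")
    # Dividing by the weight total keeps the result on the same scale as the inputs.
    return (first_weight * first_click + second_weight * second_click) / total
```

Raising `second_weight` relative to `first_weight` makes the closeness follow the clicked documents more than the related search statements.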

S106: the server searches based on the closeness of all the word pairs to obtain a search result corresponding to the search statement;

In a specific implementation, the closeness of each word pair in the search statement can guide the subsequent search steps. On the one hand, closeness can be used to determine whether the two tokens of a word pair belong to the same search term. For example, the search statement "我和我的祖国王菲" (the song title "My Motherland and I" followed by the singer name 王菲, Faye Wong) is segmented into the tokens 我 (I) / 和 (and) / 我 (I) / 的 ('s) / 祖国 (motherland) / 王 (Wang) / 菲 (Fei), giving the word pairs 我和, 和我, 我的, 的祖国, 祖国王, and 王菲. In the related art, the calculated closeness of 我和, 和我, and 祖国王 is very low, so the search terms of the statement are taken to be 我, 和, 我的祖国, and 王菲, and any result containing these search terms is regarded as a search result for the statement, without requiring the positions of the search terms to match the search statement. Because the word pair closeness calculated in the related art does not reflect the closeness relations between words in the music domain, the resulting search terms are unsuitable for music documents. The closeness of the word pairs calculated in this embodiment is 0.968, 0.993, 0.979, 0.996, 0.0005, and 0.973, respectively. Word pairs with a closeness greater than 0.9 can be regarded as belonging to the same search term, so the search statement contains two search terms: 我和我的祖国 and 王菲. This suits music documents and improves the accuracy of search term determination. On the other hand, whether each search term must be retained can be determined by combining the closeness with other token information, and the search result corresponding to the search statement is then obtained by searching. In the above example, based on the token information (which may include closeness, named entity, importance, etc.) of each token, the search term 我和我的祖国 is determined to be a must-keep search term and 王菲 an optional search term. Since the search keywords must include at least one must-keep search term, the search keywords corresponding to the search statement are "我和我的祖国王菲" and "我和我的祖国", and the search result corresponding to the search statement is obtained by searching with these keywords. Because the closeness calculated in this embodiment reflects the closeness relations between words in the music domain, the determined must-keep search terms suit the music domain, which improves search accuracy.
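Grouping tokens into search terms by a closeness threshold can be sketched as follows. The threshold of 0.9 comes from the example above; the token split 我/和/我/的/祖国/王/菲 is an assumption consistent with the six listed closeness values, and `group_search_terms` is a hypothetical helper. Deciding which terms are must-keep needs further token information (named entity, importance, and so on) and is not sketched here.

```python
from typing import List, Sequence


def group_search_terms(tokens: Sequence[str], closeness: Sequence[float],
                       threshold: float = 0.9, joiner: str = "") -> List[str]:
    """Merge adjacent tokens whose word-pair closeness exceeds `threshold`.

    closeness[i] is the closeness of the word pair (tokens[i], tokens[i + 1]).
    """
    if not tokens:
        return []
    if len(closeness) != len(tokens) - 1:
        raise ValueError("need exactly one closeness value per word pair")
    groups: List[List[str]] = [[tokens[0]]]
    for token, score in zip(tokens[1:], closeness):
        if score > threshold:
            groups[-1].append(token)  # same search term as the previous token
        else:
            groups.append([token])    # low closeness: start a new search term
    return [joiner.join(group) for group in groups]


terms = group_search_terms(
    ["我", "和", "我", "的", "祖国", "王", "菲"],
    [0.968, 0.993, 0.979, 0.996, 0.0005, 0.973],
)
# terms == ["我和我的祖国", "王菲"]
```

The `joiner` defaults to an empty string because Chinese text is written without spaces; pass `" "` for space-delimited languages.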

S107: the server returns the search results to the interactive device.

In the search method provided by this embodiment of the application, the related search statement set and the related document set corresponding to the search statement are determined based on the user's historical click data, the click information of each word pair is counted separately in the related search statement set and in the related document set, and the closeness of each word pair is then obtained by fitting. Because users' historical click data carries feedback that corrects the results, the calculated closeness is highly accurate. The search method provided by this embodiment therefore calculates closeness from users' historical click data, which solves the technical problem of low closeness accuracy in the related art and improves search accuracy.

Compared with the previous embodiment, this embodiment further describes and optimizes the technical solution, taking the server as the execution subject. Specifically:

Fig. 3 is a flowchart of a second search method provided in an embodiment of the present application. As shown in Fig. 3, the method includes:

S201: acquiring a search statement, determining all tokens in the search statement, and connecting every two adjacent tokens to obtain a word pair;

S202: determining a related search statement set and a related document set corresponding to the search statement based on historical click data;

S203: counting click information of each word pair in a full-context mode, a single-context mode, and a no-context mode within a preset time period in the related search statement set as candidate first click information, and obtaining the first click information by fitting all the candidate first click information corresponding to each word pair;

In this embodiment, the click information of each word pair in the full-context mode, the single-context mode, and the no-context mode within the preset time period is counted in the related search statement set. Here x denotes a token located before the word pair AB, y denotes a token located after the word pair AB, qanchor is a related search statement in the related search statement set, and $\mathbb{1}[\cdot]$ equals 1 when its condition holds and 0 otherwise.

The click information of word pair AB in the full-context mode is

$\mathrm{full\_context}_{count}(x, AB, y) = \sum_{qanchor} \mathbb{1}[AB \in qanchor]$

where $AB \in qanchor$ means that the related search statement contains the word pair AB; this is the number of related search statements that contain AB.

The click information of word pair AB in the single-context mode with preceding context only is

$\mathrm{above\_context}_{count}(x, AB) = \sum_{qanchor} \mathbb{1}[AB \text{ is the end of } qanchor]$

that is, the number of related search statements that end with AB.

The click information of word pair AB in the single-context mode with following context only is

$\mathrm{below\_context}_{count}(AB, y) = \sum_{qanchor} \mathbb{1}[AB \text{ is the head of } qanchor]$

that is, the number of related search statements that begin with AB.

The click information of word pair AB in the no-context mode is

$\mathrm{no\_context}_{count}(AB) = \sum_{qanchor} \mathbb{1}[AB = qanchor]$

where $AB = qanchor$ means that the related search statement consists only of the word pair AB; this is the number of related search statements that contain nothing but AB.
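The four query-side counts can be sketched directly from these formulas. This assumes each related search statement has already been segmented into tokens and restricted to the preset time period; `count_query_contexts` and the mode keys are hypothetical names.

```python
from typing import Dict, Iterable, Sequence, Tuple


def count_query_contexts(pair: Tuple[str, str],
                         related_queries: Iterable[Sequence[str]]) -> Dict[str, int]:
    """Candidate first click information of word pair AB, per context mode."""
    a, b = pair
    counts = {"full": 0, "above": 0, "below": 0, "none": 0}
    for qanchor in related_queries:
        toks = list(qanchor)
        contains = any(toks[i] == a and toks[i + 1] == b for i in range(len(toks) - 1))
        if not contains:
            continue
        counts["full"] += 1       # AB in qanchor
        if toks[-2:] == [a, b]:
            counts["above"] += 1  # AB is the end of qanchor
        if toks[:2] == [a, b]:
            counts["below"] += 1  # AB is the head of qanchor
        if toks == [a, b]:
            counts["none"] += 1   # AB equals qanchor
    return counts
```

As in the formulas, the modes are not mutually exclusive: a statement consisting only of AB is counted in all four.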

In a specific implementation, the first click information of each word pair can be obtained by fitting all candidate first click information of that word pair. This embodiment does not limit the specific fitting manner. As one possible implementation, weighted fitting may be adopted: before this step, the method further includes assigning corresponding weights to the full context mode, the single context mode and the no context mode, and obtaining the first click information by fitting all candidate first click information of each word pair then includes performing weighted fitting on all candidate first click information of each word pair based on these weights to obtain the first click information.
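Because the text leaves the fitting manner open, the following Python sketch simply assumes that "weighted fitting" means a weighted sum of the per-mode candidate counts. The weight values and the names `MODE_WEIGHTS` and `weighted_fit` are hypothetical; since only three weights are assigned (full, single, no context), the single-above and single-below counts share the single context weight.

```python
from typing import Dict

# Hypothetical weights for the three modes; the text gives no concrete values.
MODE_WEIGHTS: Dict[str, float] = {"full": 1.0, "single": 0.5, "none": 2.0}


def weighted_fit(candidates: Dict[str, int], weights: Dict[str, float]) -> float:
    """Weighted sum of per-mode candidate click counts for one word pair.

    The single-above and single-below counts both use the single context weight.
    """
    single = candidates.get("above", 0) + candidates.get("below", 0)
    return (weights["full"] * candidates.get("full", 0)
            + weights["single"] * single
            + weights["none"] * candidates.get("none", 0))


print(weighted_fit({"full": 3, "above": 2, "below": 2, "none": 1}, MODE_WEIGHTS))  # 7.0
```

The same function could be applied unchanged to the candidate second click information counted in the relevant document set.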

S204: Counting, in the relevant document set, the click information of each word pair in the full context mode, the single context mode and the no context mode within the preset time period as candidate second click information, and obtaining the second click information by fitting all candidate second click information of each word pair.

In this step, the click information of each word pair in the full context mode, the single context mode and the no context mode within the preset time period is counted in the relevant document set. For structured documents, the counting proceeds as follows:

for info in DocInfo:
    if (x, AB, y) in info or (AB, y) in info or (x, AB) in info or (AB) in info:
        count of the corresponding pattern += 1
        break

Here, DocInfo is the structured data of the relevant document set, and info is each record in DocInfo, that is, the structured data of one relevant document. As soon as one of the conditions in the if statement above is satisfied, the count of the corresponding pattern is incremented by 1 and the loop is exited. Specifically: (x, AB, y) in info means that the structured data of the relevant document contains (x, AB, y), so the full context click count of AB is incremented by 1; (AB, y) in info means that the structured data begins with (AB, y), so the single context (below) click count of AB is incremented by 1; (x, AB) in info means that the structured data ends with (x, AB), so the single context (above) click count of AB is incremented by 1; and (AB) in info means that the structured data consists only of AB, so the no context click count of AB is incremented by 1. Because each info record is structured data, it is a complete named entity; dictionary information is thus exploited and no excessive statistics are needed.
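A runnable Python reading of the pseudocode above is sketched below. It assumes each structured record `info` is a list of participles forming one named entity, and it reads `break` as "stop checking patterns for this record", so each record adds 1 to at most one pattern. That interpretation, the sample records, and the names `match_mode` and `count_doc_modes` are assumptions, not the application's own code. Trying the patterns in the order of the if statement makes the four modes mutually exclusive per record.

```python
from typing import Dict, List, Optional, Sequence, Tuple

WordPair = Tuple[str, str]


def match_mode(record: Sequence[str], pair: WordPair) -> Optional[str]:
    """Return the first pattern, in the if-statement order, that AB matches in a record."""
    a, b = pair
    n = len(record)
    positions = [i for i in range(n - 1) if record[i] == a and record[i + 1] == b]
    if not positions:
        return None
    if any(0 < i and i + 2 < n for i in positions):
        return "full"   # (x, AB, y): AB has a participle on both sides
    if positions[0] == 0 and n > 2:
        return "below"  # (AB, y): the record begins with AB
    if positions[-1] == n - 2 and n > 2:
        return "above"  # (x, AB): the record ends with AB
    return "none"       # (AB): the record is exactly AB


def count_doc_modes(doc_info: List[List[str]], pair: WordPair) -> Dict[str, int]:
    """Count, per pattern, the structured records in which AB appears."""
    counts = {"full": 0, "above": 0, "below": 0, "none": 0}
    for info in doc_info:
        mode = match_mode(info, pair)
        if mode is not None:
            counts[mode] += 1  # each record contributes to at most one pattern
    return counts


doc_info = [["jay", "chou"], ["jay", "chou", "sunny", "day"],
            ["best", "of", "jay", "chou"], ["live", "jay", "chou", "tour"], ["sunny", "day"]]
print(count_doc_modes(doc_info, ("jay", "chou")))
# {'full': 1, 'above': 1, 'below': 1, 'none': 1}
```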

In a specific implementation, the second click information of each word pair may be obtained by fitting all candidate second click information of that word pair. The specific fitting manner is not limited here; with weighted fitting, obtaining the second click information by fitting all candidate second click information of each word pair includes performing weighted fitting on all candidate second click information of each word pair based on the weights to obtain the second click information.

S205: Obtaining the closeness of each word pair by fitting the first click information and the second click information;

S206: Searching based on the closeness of all the word pairs to obtain the search result corresponding to the search statement.

Therefore, in this embodiment, the click information of each word pair in the four modes (full context, single-above context, single-below context and no context) is counted separately in the relevant search statement set and in the relevant document set, and the first click information (for the relevant search statement set) and the second click information (for the relevant document set) are obtained by fitting the click information of the four modes, which improves the accuracy of the click statistics.

Compared with the first embodiment, this embodiment further describes and optimizes the technical solution, with the server as the execution subject. Specifically:

Referring to FIG. 4, which is a flowchart of a third search method provided in an embodiment of the present application, the method includes:

S301: Acquiring a search statement, determining all participles in the search statement, and connecting every two adjacent participles to obtain word pairs;

S302: Determining a relevant search statement set and a relevant document set corresponding to the search statement based on historical click data;

S303: Counting first click information of each word pair within a preset time period in the relevant search statement set, and counting second click information of each word pair within the preset time period in the relevant document set;

S304: Determining a correction coefficient corresponding to the preset time period, where the correction coefficient is negatively correlated with the time span between the preset time period and the current time;

S305: Obtaining the closeness of each word pair by fitting the first click information and the second click information;

S306: Correcting the closeness of each word pair using the correction coefficient;

Since documents in the music domain are highly time-sensitive, this embodiment introduces a correction coefficient corresponding to the preset time period to correct the first click information and the second click information (that is, the closeness fitted from them), further improving the accuracy of the closeness calculation. The closer the preset time period is to the current time, the larger the correction coefficient; in other words, the influence of recent data is strengthened and that of older data is weakened, making the closeness calculation more sensitive to trending internet words. As one possible implementation, the Ebbinghaus forgetting curve may be used:

$$\text{context\_count\_month} = \mathrm{calTimeFactor}(t) \times \text{context\_count}$$

where t is the time span between the preset time period and the current time, $\mathrm{calTimeFactor}(t) = e^{-t/60}$ (the memory strength may be set to 60), and context_count is the closeness obtained by fitting the first click information and the second click information.
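A minimal Python sketch of this time-decay correction, assuming calTimeFactor(t) is the Ebbinghaus-style retention e^(-t/S) with memory strength S = 60, and that t is measured in the same unit as S (for example days, which the text does not specify). The function names `cal_time_factor` and `time_corrected` are hypothetical.

```python
import math

MEMORY_STRENGTH = 60.0  # memory strength S suggested in the text


def cal_time_factor(t: float, strength: float = MEMORY_STRENGTH) -> float:
    """Ebbinghaus-style retention e^(-t/S): 1.0 at t = 0, shrinking as t grows."""
    return math.exp(-t / strength)


def time_corrected(context_count: float, t: float) -> float:
    """context_count_month = calTimeFactor(t) * context_count."""
    return cal_time_factor(t) * context_count


print(time_corrected(10.0, 0))        # 10.0
print(round(cal_time_factor(60), 4))  # 0.3679
```

A period ending now (t = 0) keeps its full closeness, while one 60 units in the past is scaled by e^(-1), roughly 0.37, which realizes the required negative correlation between the coefficient and the time span.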

S307: Searching based on the closeness of all the word pairs to obtain the search result corresponding to the search statement.

Therefore, by jointly considering the strong timeliness of documents in the music domain and users' click behavior, and by introducing a correction coefficient corresponding to the preset time period to correct the first click information and the second click information, the closeness calculation better fits the music domain and its accuracy is improved.

Compared with the first embodiment, this embodiment further describes and optimizes the technical solution, with the server as the execution subject. Specifically:

Referring to FIG. 5, which is a flowchart of a fourth search method provided in an embodiment of the present application, the method includes:

S401: Acquiring a search statement, determining all participles in the search statement, and connecting every two adjacent participles to obtain word pairs;

S402: Determining a relevant search statement set and a relevant document set corresponding to the search statement based on historical click data;

S403: Counting first click information of each word pair within a preset time period in the relevant search statement set, and counting second click information of each word pair within the preset time period in the relevant document set;

S404: Obtaining the closeness of each word pair by fitting the first click information and the second click information;

S405: Calculating the left entropy of the first participle and the right entropy of the second participle in each word pair, where the first participle is the first participle of the corresponding word pair and the second participle is the last participle of the corresponding word pair;

In this step, for a word pair AB, the left entropy of the first participle A and the right entropy of the second participle B are calculated as follows:

$$\text{LeftEntropy}(A) = -\sum_{x \in X_A} p(xA \mid A)\,\log_2 p(xA \mid A),$$

where $X_A$ is the set of all words appearing to the left of A;

$$\text{RightEntropy}(B) = -\sum_{y \in Y_B} p(By \mid B)\,\log_2 p(By \mid B),$$

where $Y_B$ is the set of all words appearing to the right of B.

The left and right entropies reflect how likely a participle is to combine with other words: the higher the left and right entropies, the more likely the participle is to combine with other words, and thus the more likely it is to be combined into a search term.
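The left and right entropies can be estimated from a segmented corpus as in this Python sketch. It assumes p(xA|A) is estimated as the share of A's left-neighbour occurrences that are x (and symmetrically for B's right neighbours), applies the usual negative sign of Shannon entropy, and takes the corpus to be a list of segmented statements; the text does not say which corpus is used, and the function names and sample data are hypothetical.

```python
import math
from collections import Counter
from typing import List


def _entropy(neighbours: Counter) -> float:
    """Shannon entropy (base 2) of a neighbour-frequency distribution."""
    total = sum(neighbours.values())
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in neighbours.values())


def left_entropy(corpus: List[List[str]], word: str) -> float:
    """Entropy of the participles x observed immediately left of `word`."""
    left = Counter(q[i - 1] for q in corpus for i in range(1, len(q)) if q[i] == word)
    return _entropy(left)


def right_entropy(corpus: List[List[str]], word: str) -> float:
    """Entropy of the participles y observed immediately right of `word`."""
    right = Counter(q[i + 1] for q in corpus for i in range(len(q) - 1) if q[i] == word)
    return _entropy(right)


corpus = [["x1", "a", "b"], ["x2", "a", "c"], ["a", "b", "y1"], ["a", "b", "y2"]]
print(left_entropy(corpus, "a"), right_entropy(corpus, "b"))  # 1.0 1.0
```

Occurrences of A at the start of a statement (or B at the end) have no neighbour on that side and are simply skipped in this estimate.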

S406: Judging whether the first participle and the second participle belong to the same named entity to obtain a first judgment result, and judging whether the left entropy and the right entropy are both greater than a preset value to obtain a second judgment result;

S407: Obtaining a closeness correction parameter of each word pair based on the first judgment result and the second judgment result, and correcting the closeness using the closeness correction parameter;

S408: Searching based on the corrected closeness of all the word pairs to obtain the search result corresponding to the search statement.

In a specific implementation, the closeness may be corrected according to the following formula:

$$\text{Adjoin\_key}(A,B) = \text{Adjoin}(A,B) \times \bigl(1 + e_1 \cdot \text{isNer}(AB) + e_2 \cdot \text{isLowEntropy}(A,B)\bigr)$$

where Adjoin_key(A, B) is the corrected closeness of the word pair AB, and Adjoin(A, B) is the closeness of AB obtained by fitting the first click information and the second click information. isNer(AB) indicates whether the first participle A and the second participle B belong to the same named entity (the first judgment result); it is 1 if they do and -1 otherwise. isLowEntropy(A, B) indicates whether the left entropy of the first participle A and the right entropy of the second participle B are both greater than the preset value (the second judgment result); it is 1 if they are and -1 otherwise. e1 is the weight of the first judgment result, and e2 is the weight of the second judgment result.
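The correction formula above can be written as the following Python sketch. The ±1 encoding of the two judgment results follows the text; the default weights e1 = 0.5 and e2 = 0.25 are placeholders, not values given in the application, and the function name `corrected_closeness` is hypothetical.

```python
def corrected_closeness(adjoin: float, is_ner: bool, is_low_entropy: bool,
                        e1: float = 0.5, e2: float = 0.25) -> float:
    """Adjoin_key(A,B) = Adjoin(A,B) * (1 + e1*isNer(AB) + e2*isLowEntropy(A,B)).

    is_ner: A and B belong to the same named entity (first judgment result).
    is_low_entropy: per the text, True when the left entropy of A and the right
    entropy of B both exceed the preset value (second judgment result).
    Each flag is encoded as +1 (True) or -1 (False); e1 and e2 are placeholders.
    """
    ner = 1 if is_ner else -1
    entropy_flag = 1 if is_low_entropy else -1
    return adjoin * (1 + e1 * ner + e2 * entropy_flag)


print(corrected_closeness(8.0, True, True))    # 14.0
print(corrected_closeness(8.0, False, False))  # 2.0
```

With the ±1 encoding, a failed judgment actively lowers the closeness rather than merely leaving it unchanged, so e1 + e2 should stay below 1 to keep the corrected value positive.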

Therefore, adjusting the closeness using the named-entity judgment and the left and right entropies further improves the accuracy of the closeness calculation.

Compared with the previous embodiments, this embodiment further describes and optimizes the technical solution, with the server as the execution subject.

Specifically:

Referring to FIG. 6, which is a flowchart of a fifth search method provided in an embodiment of the present application, the method includes:

S501: Acquiring a search statement, determining all participles in the search statement, and connecting every two adjacent participles to obtain word pairs;

S502: Determining a relevant search statement set and a relevant document set corresponding to the search statement based on historical click data;

S503: Assigning corresponding weights to the full context mode, the single context mode and the no context mode;

S504: Counting, in the relevant search statement set, the click information of each word pair in the full context mode, the single context mode and the no context mode within a preset time period as candidate first click information;

S505: Performing weighted fitting on all candidate first click information of each word pair based on the weights to obtain the first click information;

S506: Structurally dividing each document in the relevant document set according to named entities to obtain the structured data of each document;

S507: Counting, based on the structured data, the click information of each word pair in the full context mode, the single context mode and the no context mode within the preset time period as candidate second click information;

S508: Performing weighted fitting on all candidate second click information of each word pair based on the weights to obtain the second click information;

S509: Determining a correction coefficient corresponding to the preset time period, where the correction coefficient is negatively correlated with the time span between the preset time period and the current time;

S510: Assigning a first weight to the first click information and a second weight to the second click information;

S511: Performing weighted fitting on the first click information and the second click information based on the first weight and the second weight to obtain the closeness of each word pair;

S512: Correcting the closeness of each word pair using the correction coefficient;

S513: Calculating the left entropy of the first participle and the right entropy of the second participle in each word pair, where the first participle is the first participle of the corresponding word pair and the second participle is the last participle of the corresponding word pair;

S514: Judging whether the first participle and the second participle belong to the same named entity to obtain a first judgment result, and judging whether the left entropy and the right entropy are both greater than a preset value to obtain a second judgment result;

S515: Obtaining a closeness correction parameter of each word pair based on the first judgment result and the second judgment result, and correcting the closeness using the closeness correction parameter;

S516: Searching based on the corrected closeness of all the word pairs to obtain the search result corresponding to the search statement.
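To show how the corrections of the fifth embodiment chain together for a single word pair (S510 to S515), here is a compact Python sketch. It assumes the first and second click information have already been computed, that weighted fitting is a weighted sum, and that calTimeFactor(t) = e^(-t/60); the function name and every default weight are hypothetical placeholders.

```python
import math


def fifth_embodiment_closeness(first_click: float, second_click: float, t: float,
                               is_ner: bool, is_low_entropy: bool,
                               w1: float = 0.5, w2: float = 0.5,
                               strength: float = 60.0,
                               e1: float = 0.5, e2: float = 0.25) -> float:
    """Closeness of one word pair after S510-S515 (all weights are placeholders)."""
    # S510-S511: weighted fitting of the first and second click information.
    closeness = w1 * first_click + w2 * second_click
    # S509/S512: time-decay correction, assuming calTimeFactor(t) = e^(-t/strength).
    closeness *= math.exp(-t / strength)
    # S513-S515: named-entity and entropy correction with +1/-1 judgment flags.
    ner = 1 if is_ner else -1
    entropy_flag = 1 if is_low_entropy else -1
    return closeness * (1 + e1 * ner + e2 * entropy_flag)


print(fifth_embodiment_closeness(6.0, 10.0, 0, True, True))  # 14.0
```

Since both corrections are multiplicative, applying the time-decay factor before or after the entropy correction gives the same result in this sketch.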

Therefore, in this embodiment, the click information of each word pair in the four modes is counted separately in the relevant search statement set and in the relevant document set, and the first click information and the second click information are obtained by fitting the click information of the four modes, which improves the accuracy of the click statistics. In addition, the strong timeliness and novelty of documents in the music domain and users' click behavior are jointly considered, and the closeness is further adjusted using the left and right entropies, so that the closeness calculation better fits the music domain and its accuracy is further improved.

The following describes a search apparatus provided by an embodiment of the present application. The search apparatus described below and the search method described above may be cross-referenced.

Referring to FIG. 7, which is a structural diagram of a search apparatus provided by an embodiment of the present application, the apparatus includes:

an obtaining module 701, configured to obtain a search statement, determine all participles in the search statement, and connect every two adjacent participles to obtain a word pair;

a first determining module 702, configured to determine, based on historical click data, a relevant search statement set and a relevant document set corresponding to the search statement, where the relevant document set includes the relevant documents of the search statement, a relevant document being a clicked document in the recall result of the search statement, and the relevant search statement set includes the relevant search statements of the search statement, a relevant search statement being one whose relevant documents are the same as those of the search statement;

a counting module 703, configured to count first click information of each word pair in a preset time period in the relevant search statement set, and count second click information of each word pair in a preset time period in the relevant document set;

a fitting module 704, configured to obtain closeness of each word pair by fitting the first click information and the second click information;

a searching module 705, configured to search based on the closeness of all the word pairs to obtain the search result corresponding to the search statement.

The search apparatus provided by this embodiment determines the relevant search statement set and the relevant document set corresponding to the search statement based on users' historical click data, counts the click information of each word pair in each set, and obtains the closeness of each word pair by fitting. Because users' historical click data provides feedback that corrects the statistics, the calculated closeness is highly accurate. The search apparatus therefore calculates closeness from users' historical click data, which addresses the low accuracy of closeness calculation in the related art and improves search accuracy.

On the basis of the foregoing embodiment, as a preferred implementation, the counting module 703 includes:

a first counting unit, configured to count, in the relevant search statement set, the click information of each word pair in the full context mode, the single context mode and the no context mode within the preset time period as candidate first click information, and to obtain the first click information by fitting all candidate first click information of each word pair;

a second counting unit, configured to count, in the relevant document set, the click information of each word pair in the full context mode, the single context mode and the no context mode within the preset time period as candidate second click information, and to obtain the second click information by fitting all candidate second click information of each word pair.

On the basis of the foregoing embodiment, as a preferred implementation, the counting module 703 further includes:

a first allocation unit, configured to assign corresponding weights to the full context mode, the single context mode and the no context mode;

correspondingly, the first counting unit is specifically configured to count, in the relevant search statement set, the click information of each word pair in the full context mode, the single context mode and the no context mode within the preset time period as candidate first click information, and to perform weighted fitting on all candidate first click information of each word pair based on the weights to obtain the first click information;

the second counting unit is specifically configured to count, in the relevant document set, the click information of each word pair in the full context mode, the single context mode and the no context mode within the preset time period as candidate second click information, and to perform weighted fitting on all candidate second click information of each word pair based on the weights to obtain the second click information.

On the basis of the foregoing embodiment, as a preferred implementation, the fitting module 704 includes:

a second allocation unit, configured to assign a first weight to the first click information and a second weight to the second click information;

a fitting unit, configured to perform weighted fitting on the first click information and the second click information based on the first weight and the second weight to obtain the closeness of each word pair.

On the basis of the foregoing embodiment, as a preferred implementation, the counting module 703 includes:

a third counting unit, configured to count first click information of each word pair within a preset time period in the relevant search statement set;

a dividing unit, configured to structurally divide each document in the relevant document set according to named entities to obtain the structured data of each document;

a fourth counting unit, configured to count second click information of each word pair within the preset time period based on the structured data.

On the basis of the foregoing embodiment, as a preferred implementation, the apparatus further includes:

a second determining module, configured to determine a correction coefficient corresponding to the preset time period, where the correction coefficient is negatively correlated with the time span between the preset time period and the current time;

a first correcting module, configured to correct the closeness of each word pair using the correction coefficient.

On the basis of the foregoing embodiment, as a preferred implementation, the apparatus further includes:

a calculation module, configured to calculate the left entropy of the first participle and the right entropy of the second participle in each word pair, where the first participle is the first participle of the corresponding word pair and the second participle is the last participle of the corresponding word pair;

a judgment module, configured to judge whether the first participle and the second participle belong to the same named entity to obtain a first judgment result, and to judge whether the left entropy and the right entropy are both greater than a preset value to obtain a second judgment result;

a second correcting module, configured to obtain a closeness correction parameter of each word pair based on the first judgment result and the second judgment result, and to correct the closeness using the closeness correction parameter.

With regard to the apparatus in the above-described embodiment, the specific manner in which each module performs the operation has been described in detail in the embodiment related to the method, and will not be elaborated here.

Those skilled in the art will appreciate that all or part of the steps of the methods in the above embodiments may be implemented by a program instructing relevant hardware. Accordingly, the present application further provides a server. Referring to FIG. 8, which is a structural diagram of a server 80 provided in an embodiment of the present application, the server 80 may include a processor 81 and a memory 82.

The processor 81 may include one or more processing cores, such as a 4-core processor or an 8-core processor. The processor 81 may be implemented in at least one hardware form of DSP (Digital Signal Processing), FPGA (Field-Programmable Gate Array) and PLA (Programmable Logic Array). The processor 81 may also include a main processor and a coprocessor: the main processor, also called the CPU (Central Processing Unit), processes data in the awake state, and the coprocessor is a low-power processor that processes data in the standby state. In some embodiments, the processor 81 may be integrated with a GPU (Graphics Processing Unit), which is responsible for rendering and drawing the content to be displayed on the display screen. In some embodiments, the processor 81 may further include an AI (Artificial Intelligence) processor for handling computing operations related to machine learning.

The memory 82 may include one or more computer-readable storage media, which may be non-transitory. Memory 82 may also include high speed random access memory, as well as non-volatile memory, such as one or more magnetic disk storage devices, flash memory storage devices. In this embodiment, the memory 82 is at least used for storing a computer program 821, wherein after being loaded and executed by the processor 81, the computer program can realize relevant steps in the search method executed by the server side disclosed in any of the foregoing embodiments. In addition, the resources stored by the memory 82 may also include an operating system 822, data 823, and the like, and the storage may be transient storage or permanent storage. The operating system 822 may include Windows, Unix, Linux, etc.

In some embodiments, server 80 may also include a display screen 83, an input/output interface 84, a communication interface 85, sensors 86, a power supply 87, and a communication bus 88.

Of course, the structure of the server shown in FIG. 8 does not limit the server in the embodiments of the present application; in practical applications, the server may include more or fewer components than those shown in FIG. 8, or some components may be combined.

In another exemplary embodiment, a computer readable storage medium is also provided, which includes program instructions, which when executed by a processor, implement the steps of the search method performed by the server of any of the above embodiments.

The embodiments in this specification are described in a progressive manner; each embodiment focuses on its differences from the other embodiments, and the same or similar parts of the embodiments may be cross-referenced. Because the apparatus disclosed in the embodiments corresponds to the method disclosed in the embodiments, it is described relatively briefly, and for relevant details refer to the description of the method. It should be noted that those skilled in the art may make several improvements and modifications to the present application without departing from its principle, and such improvements and modifications also fall within the scope of the claims of the present application.

It is further noted that, in the present specification, relational terms such as first and second, and the like are used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.
