System and method for recognizing proper nouns

Document No.: 1087397  Publication date: 2020-10-20  Views: 8  Original language: Chinese

Note: This technology, "System and method for recognizing proper nouns" (识别专有名词的系统和方法), was designed and created by Hu Juan, Chen Huan, Song Qi, and Ma Li on 2019-04-04. Abstract: The invention relates to a method and a system for recognizing proper nouns. The method comprises: acquiring a retrieval request containing a target query term; determining, based on a trained recognition model, whether the target query term includes at least one proper noun, wherein the trained recognition model is configured to provide the probability that the target query term includes at least one proper noun; and, in response to determining that the probability is greater than a preset probability threshold, determining that the target query term includes at least one proper noun. The method may further include segmenting the target query term based at least on a proper noun list that includes at least two proper nouns. The method may also rank, based on the proper noun list, at least a portion of one or more terms of interest related to the target query term. Based on an accurate proper noun list, the invention can automatically and effectively recognize proper nouns in a query or segment the query.

1. A method for segmenting a target query term, comprising:

acquiring a retrieval request containing a target query term; and

segmenting the target query term based at least on a proper noun list, wherein the proper noun list comprises at least two proper nouns, and the proper noun list is provided by:

obtaining at least two first historical retrieval records, wherein each of the at least two first historical retrieval records comprises a first query term from a first user and a first term of interest (TOI) selected by the first user as a first retrieval term of the first historical retrieval record, and at least a part of the first TOI comprises at least one proper noun;

obtaining a trained recognition model, wherein the trained recognition model is configured to provide a probability that a candidate TOI includes at least one proper noun; and

determining the list of proper nouns based on at least the trained recognition model and the at least two first historical retrieval records.

2. The method of claim 1, wherein the segmenting the target query term based at least on a proper noun list comprising at least two proper nouns comprises:

segmenting the target query term into two or more segments using a segmentation technique, wherein the segmentation technique comprises an N-gram segmentation technique;

comparing the two or more segments to the proper noun list; and

combining at least two of the two or more segments in response to determining that a combination of the at least two segments is a proper noun in the proper noun list.
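
The compare-and-combine steps of claim 2 can be sketched as follows. This is an illustrative Python sketch, not the patent's implementation; the greedy longest-match strategy and all identifiers are assumptions.

```python
def merge_proper_nouns(tokens, proper_nouns):
    """Greedily merge adjacent segments whose concatenation is a proper
    noun in the list, preferring the longest possible combination."""
    result = []
    i = 0
    while i < len(tokens):
        merged = False
        for j in range(len(tokens), i + 1, -1):  # try longest combination first
            candidate = "".join(tokens[i:j])
            if candidate in proper_nouns:
                result.append(candidate)
                i = j
                merged = True
                break
        if not merged:
            result.append(tokens[i])
            i += 1
    return result

# Hypothetical example: four N-gram segments whose combination is listed.
segments = ["new", "energy", "technology", "limited"]
proper_nouns = {"newenergytechnologylimited"}
print(merge_proper_nouns(segments, proper_nouns))  # ['newenergytechnologylimited']
```

Trying the longest combination first ensures a multi-segment proper noun is not shadowed by a shorter listed prefix.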

3. The method of any of claims 1-2, wherein the segmenting the target query term based at least on a proper noun list comprising at least two proper nouns comprises:

obtaining a segmentation model, wherein the segmentation model is configured to segment the target query term;

merging the list of proper nouns into the segmentation model by assigning weight coefficients to the list of proper nouns; and

segmenting the target query term based on the combined segmentation model.
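
One plausible reading of claim 3 is a maximum-score segmenter into which the proper noun list is merged by assigning each proper noun a weight coefficient. The following toy dynamic-programming sketch is an assumption for illustration; the weights, fallback scores, and function names are not from the patent.

```python
def segment_with_weights(text, word_weights):
    """Toy max-score segmenter: words in `word_weights` (including proper
    nouns merged in with weight coefficients) compete during dynamic
    programming; higher total weight wins."""
    n = len(text)
    best = [0.0] + [float("-inf")] * n   # best[i]: best score of text[:i]
    back = [0] * (n + 1)                 # back[i]: start of the word ending at i
    for i in range(1, n + 1):
        for j in range(i):
            w = text[j:i]
            # Unknown single characters get a small fallback score;
            # unknown multi-character strings are disallowed.
            score = word_weights.get(w, 0.1 if len(w) == 1 else float("-inf"))
            if best[j] + score > best[i]:
                best[i] = best[j] + score
                back[i] = j
    out, i = [], n
    while i > 0:
        out.append(text[back[i]:i])
        i = back[i]
    return out[::-1]

# A high weight on the proper noun keeps it from being split.
weights = {"newenergy": 5.0, "new": 1.0, "energy": 1.0}
print(segment_with_weights("newenergy", weights))  # ['newenergy']
```

Without the weighted proper noun entry, the same segmenter would return the two ordinary words instead.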

4. The method of claim 1, wherein the determining the proper noun list based at least on the trained recognition model and the at least two first historical retrieval records comprises:

for each of the at least two first historical retrieval records:

determining a first probability that the first TOI includes at least one proper noun; and

determining whether the first probability is greater than a first preset probability threshold; and

adding the at least one proper noun to the proper noun list if the first probability is greater than the first preset probability threshold.
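
The list-building loop of claim 4 can be sketched as below. The stand-in model and record layout are hypothetical assumptions; in the patent, the probability comes from the trained recognition model.

```python
def build_proper_noun_list(records, model, threshold):
    """For each historical retrieval record, add its TOI to the proper
    noun list when the model's probability that the TOI contains a
    proper noun exceeds the preset threshold."""
    proper_nouns = set()
    for record in records:
        toi = record["toi"]
        probability = model(toi)  # probability the TOI contains a proper noun
        if probability > threshold:
            proper_nouns.add(toi)
    return proper_nouns

# Hypothetical stand-in for the trained model: longer TOIs score high.
model = lambda toi: 0.9 if len(toi) > 5 else 0.1
records = [{"toi": "acme robotics"}, {"toi": "map"}]
print(build_proper_noun_list(records, model, 0.5))  # {'acme robotics'}
```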

5. The method of claim 4, wherein the trained recognition model is determined by a training process comprising:

obtaining at least two second historical retrieval records, wherein each of the at least two second historical retrieval records comprises a second query term from a second user and a second TOI selected by the second user as a second retrieval term of the second historical retrieval record, and at least a part of the second TOI comprises at least one proper noun;

obtaining at least two training samples based on the at least two second historical retrieval records; and

training an initial recognition model based on the at least two training samples to generate the trained recognition model.

6. The method of claim 5, wherein obtaining at least two training samples based on the at least two second historical retrieval records comprises:

for each of the at least two second historical retrieval records:

segmenting the second TOI in the second historical retrieval record into at least two second segments;

extracting feature information associated with the at least two second segments;

determining whether the feature information meets a first preset condition; and

if the feature information meets the first preset condition, designating the second historical retrieval record as a candidate retrieval record; and

determining the at least two training samples based on the candidate retrieval records.

7. The method of claim 6, wherein the feature information of each of the at least two second segments comprises at least one of: a first consistency parameter associated with the second query term and the second segment, a second consistency parameter associated with the second TOI and the second segment, a cohesion degree parameter associated with the second segment, a left entropy associated with the second segment, a right entropy associated with the second segment, a degree of freedom associated with the second segment, a probability that the second segment is located at the beginning or the end of the second TOI, and a frequency associated with the second segment.
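
Among the features named in claim 7, left entropy (and, symmetrically, right entropy) is a standard new-word-discovery statistic: the entropy of the characters that appear immediately to the left of a segment across a corpus. A minimal sketch, assuming a corpus given as a list of strings; the function name and corpus layout are illustrative.

```python
import math
from collections import Counter

def left_entropy(segment, corpus):
    """Entropy (in bits) of the distribution of characters immediately
    to the left of `segment` over all its occurrences in the corpus.
    Higher entropy suggests the segment's left boundary is a real
    word boundary."""
    left_chars = []
    for text in corpus:
        start = 0
        while (idx := text.find(segment, start)) != -1:
            if idx > 0:
                left_chars.append(text[idx - 1])
            start = idx + 1
    counts = Counter(left_chars)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

# Three distinct left neighbors, each equally likely: entropy = log2(3).
print(left_entropy("ab", ["xab", "yab", "zab"]))
```

Right entropy is computed the same way using the character at `idx + len(segment)`.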

8. The method of claim 6, further comprising:

manually screening at least one candidate retrieval record from the candidate retrieval records; and

determining the at least two training samples based on the screened candidate retrieval records.

9. The method according to any one of claims 6-8, wherein the training an initial recognition model based on the at least two training samples to generate the trained recognition model comprises:

determining at least two sample probabilities corresponding to the at least two training samples based on the initial recognition model and the feature information of each of the at least two training samples;

determining whether the at least two sample probabilities satisfy a second preset condition; and

designating the initial recognition model as the trained recognition model in response to the determination that the at least two sample probabilities satisfy the second preset condition.
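
The training loop of claim 9, in which sample probabilities are computed, checked against a preset condition, and the model updated until the condition holds, can be sketched with a one-parameter logistic model. The model form, learning rate, and stopping rule (mean absolute error below a tolerance) are illustrative assumptions, not the patent's specification.

```python
import math

def train_recognition_model(w, samples, lr=0.5, eps=0.2, max_iter=1000):
    """Iterate: compute each sample's probability under the current model,
    stop when the probabilities satisfy the preset condition (here, mean
    absolute error below `eps`), otherwise update the parameter."""
    sigmoid = lambda z: 1.0 / (1.0 + math.exp(-z))
    for _ in range(max_iter):
        probs = [sigmoid(w * x) for x, _ in samples]
        err = sum(abs(p - y) for p, (_, y) in zip(probs, samples)) / len(samples)
        if err < eps:  # the "second preset condition" is satisfied
            return w
        # Otherwise update the model: a logistic-regression gradient step.
        for (x, y), p in zip(samples, probs):
            w -= lr * (p - y) * x
    return w

# Toy data: positive feature values labeled 1, negative labeled 0.
samples = [(2.0, 1), (-2.0, 0), (3.0, 1), (-1.5, 0)]
w = train_recognition_model(0.0, samples)
```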

10. The method of claim 1, wherein the trained recognition model comprises a support vector machine model.

11. A method of recognizing proper nouns, the method comprising:

acquiring a retrieval request containing a target query term; and

determining whether the target query term includes at least one proper noun based on a trained recognition model, wherein the trained recognition model is configured to provide a probability that the target query term includes at least one proper noun; and

in response to determining that the probability is greater than a preset probability threshold, determining that the target query term includes at least one proper noun.
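
The decision rule of claim 11 reduces to a threshold test on the trained model's output. A minimal sketch; the stand-in model below is a hypothetical placeholder for the trained recognition model (e.g., an SVM, per claim 17).

```python
def contains_proper_noun(query, model, threshold):
    """Return True when the model's probability that `query` contains a
    proper noun exceeds the preset probability threshold."""
    return model(query) > threshold

# Hypothetical stand-in for the trained recognition model.
model = lambda q: 0.8 if "acme" in q else 0.2
print(contains_proper_noun("acme robotics hq", model, 0.5))  # True
print(contains_proper_noun("weather today", model, 0.5))     # False
```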

12. The method of claim 11, wherein the trained recognition model is determined by a training process comprising:

obtaining at least two first historical retrieval records, wherein each of the at least two first historical retrieval records comprises a first query term from a first user and a first term of interest (TOI) selected by the first user as a first retrieval term of the first historical retrieval record, and at least a part of the first TOI comprises at least one proper noun;

acquiring at least two training samples based on the at least two first historical retrieval records; and

training an initial recognition model based on the at least two training samples to generate the trained recognition model.

13. The method of claim 12, wherein obtaining at least two training samples based on the at least two first historical retrieval records comprises:

for each of the at least two first historical retrieval records:

segmenting the first TOI in the first historical retrieval record into at least two segments;

extracting feature information associated with the at least two segments;

determining whether the feature information meets a first preset condition; and

if the feature information meets the first preset condition, designating the first historical retrieval record as a candidate retrieval record; and

determining the at least two training samples based on the candidate retrieval records.

14. The method of claim 13, wherein the feature information for each of the at least two segments comprises at least one of: a first consistency parameter associated with the first query term and the segment, a second consistency parameter associated with the first TOI and the segment, a cohesion degree parameter associated with the segment, a left entropy associated with the segment, a right entropy associated with the segment, a degree of freedom associated with the segment, a probability that the segment is located at the beginning or the end of the first TOI, and a frequency associated with the segment.

15. The method of claim 13, further comprising:

manually screening at least one candidate retrieval record from the candidate retrieval records; and

determining the at least two training samples based on the screened candidate retrieval records.

16. The method according to any one of claims 13-15, wherein the training an initial recognition model based on the at least two training samples to generate the trained recognition model comprises:

determining at least two sample probabilities corresponding to the at least two training samples based on the initial recognition model and the feature information of each of the at least two training samples;

determining whether the at least two sample probabilities satisfy a second preset condition; and

designating the initial recognition model as the trained recognition model in response to determining that the at least two sample probabilities satisfy the second preset condition.

17. The method according to any one of claims 11-15, wherein the trained recognition model comprises a support vector machine model.

18. The method of claim 11, further comprising:

in response to determining that the probability is greater than the preset probability threshold, segmenting the target query term based on the at least one proper noun.

19. A method for segmenting a target query term, comprising:

acquiring a retrieval request containing a target query term;

determining one or more terms of interest (TOIs) related to the target query term; and

ranking at least a portion of the one or more TOIs based on a proper noun list comprising at least two proper nouns, wherein the proper noun list is determined based on a trained recognition model and at least two historical retrieval records.
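
Claim 19 does not fix the ordering rule, so the following sketch assumes one plausible criterion: TOIs containing a listed proper noun rank ahead of those that do not. All names are illustrative.

```python
def rank_tois(tois, proper_nouns):
    """Rank candidate TOIs so that those containing more proper nouns
    from the list come first; ties keep their original order because
    Python's sort is stable."""
    def score(toi):
        return sum(1 for pn in proper_nouns if pn in toi)
    return sorted(tois, key=score, reverse=True)

tois = ["acme robotics careers", "robot arm price", "acme robotics hq"]
print(rank_tois(tois, {"acme robotics"}))
# ['acme robotics careers', 'acme robotics hq', 'robot arm price']
```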

20. A system for segmenting a target query term, comprising a request acquisition module, a list determination module, and a query segmentation module, wherein:

the request acquisition module is configured to acquire a retrieval request containing a target query term;

the list determination module is configured to determine a proper noun list, the proper noun list including at least two proper nouns, wherein the list determination module is configured to:

obtaining at least two first historical retrieval records, wherein each of the at least two first historical retrieval records comprises a first query term from a first user and a first term of interest (TOI) selected by the first user as a first retrieval term of the first historical retrieval record, and at least a part of the first TOI comprises at least one proper noun;

obtaining a trained recognition model, wherein the trained recognition model is configured to provide a probability that a candidate TOI includes at least one proper noun; and

determining the proper noun list based at least on the trained recognition model and the at least two first historical retrieval records; and

the query segmentation module is configured to segment the target query term based at least on the proper noun list.

21. The system of claim 20, wherein the query segmentation module is further configured to:

segmenting the target query term into two or more segments using a segmentation technique, wherein the segmentation technique comprises an N-gram segmentation technique;

comparing the two or more segments to the proper noun list; and

combining at least two of the two or more segments in response to determining that a combination of the at least two segments is a proper noun in the proper noun list.

22. The system of any of claims 20-21, wherein the query segmentation module is further configured to:

obtaining a segmentation model, wherein the segmentation model is configured to segment the target query term;

merging the list of proper nouns into the segmentation model by assigning weight coefficients to the list of proper nouns; and

segmenting the target query term based on the combined segmentation model.

23. The system of any of claims 20-22, wherein the list determination module is further configured to:

for each of the at least two first historical retrieval records:

determining a first probability that the first TOI includes at least one proper noun; and

determining whether the first probability is greater than a first preset probability threshold;

adding the at least one proper noun to the proper noun list if the first probability is greater than the first preset probability threshold.

24. The system of any one of claims 20-23, further comprising a model training module configured to:

obtaining at least two second historical retrieval records, wherein each of the at least two second historical retrieval records comprises a second query term from a second user and a second TOI selected by the second user as a second retrieval term of the second historical retrieval record, and at least a part of the second TOI comprises at least one proper noun;

obtaining at least two training samples based on the at least two second historical retrieval records; and

training an initial recognition model based on the at least two training samples to generate the trained recognition model.

25. The system of claim 24, wherein the model training module is further configured to:

for each of the at least two second historical retrieval records:

segmenting the second TOI in the second historical retrieval record into at least two second segments;

extracting feature information associated with the at least two second segments;

determining whether the feature information meets a first preset condition; and

if the feature information meets the first preset condition, designating the second historical retrieval record as a candidate retrieval record; and

determining the at least two training samples based on the candidate retrieval records.

26. The system of claim 25, wherein the feature information for each of the at least two second segments comprises at least one of: a first consistency parameter associated with the second query term and the second segment, a second consistency parameter associated with the second TOI and the second segment, a cohesion degree parameter associated with the second segment, a left entropy associated with the second segment, a right entropy associated with the second segment, a degree of freedom associated with the second segment, a probability that the second segment is located at the beginning or the end of the second TOI, and a frequency associated with the second segment.

27. The system of claim 25, wherein the model training module is further configured to:

manually screening at least one candidate retrieval record from the candidate retrieval records; and

determining the at least two training samples based on the screened candidate retrieval records.

28. The system of any of claims 25-27, wherein the model training module is further configured to:

determining at least two sample probabilities corresponding to the at least two training samples based on the initial recognition model and the feature information of each of the at least two training samples;

determining whether the at least two sample probabilities satisfy a second preset condition; and

designating the initial recognition model as the trained recognition model in response to determining that the at least two sample probabilities satisfy the second preset condition.

29. The system according to any one of claims 20-28, wherein the trained recognition model comprises a support vector machine model.

30. A system for recognizing proper nouns, comprising a request acquisition module and a proper noun recognition module, wherein:

the request acquisition module is configured to acquire a retrieval request containing a target query term;

the proper noun recognition module is configured to determine whether the target query term comprises at least one proper noun based on a trained recognition model, wherein the trained recognition model is configured to provide the probability that the target query term comprises at least one proper noun; and

the proper noun recognition module is configured to determine, in response to the probability being greater than a preset probability threshold, that the target query term comprises at least one proper noun.

31. The system of claim 30, further comprising a model training module configured to:

obtaining at least two first historical retrieval records, wherein each of the at least two first historical retrieval records comprises a first query term from a first user and a first TOI selected by the first user as a first retrieval term of the first historical retrieval record, and at least a part of the first TOI comprises at least one first proper noun;

acquiring at least two training samples based on the at least two first historical retrieval records; and

training an initial recognition model based on the at least two training samples to generate the trained recognition model.

32. The system of claim 31, wherein the model training module is further configured to:

for each of the at least two first historical retrieval records:

segmenting the first TOI in the first historical retrieval record into at least two segments;

extracting feature information associated with the at least two segments;

determining whether the feature information meets a first preset condition; and

if the feature information meets the first preset condition, designating the first historical retrieval record as a candidate retrieval record; and

determining the at least two training samples based on the candidate retrieval records.

33. The system of claim 32, wherein the feature information for each of the at least two segments comprises at least one of: a first consistency parameter associated with the first query term and the segment, a second consistency parameter associated with the first TOI and the segment, a cohesion degree parameter associated with the segment, a left entropy associated with the segment, a right entropy associated with the segment, a degree of freedom associated with the segment, a probability that the segment is located at the beginning or the end of the first TOI, and a frequency associated with the segment.

34. The system of claim 32, wherein the model training module is further configured to:

manually screening at least one candidate retrieval record from the candidate retrieval records; and

determining the at least two training samples based on the screened candidate retrieval records.

35. The system of any of claims 32-34, wherein the model training module is further configured to:

determining at least two sample probabilities corresponding to the at least two training samples based on the initial recognition model and the feature information of each of the at least two training samples;

determining whether the at least two sample probabilities satisfy a second preset condition; and

designating the initial recognition model as the trained recognition model in response to determining that the at least two sample probabilities satisfy the second preset condition.

36. The system according to any one of claims 30-35, wherein the trained recognition model comprises a support vector machine model.

37. The system of any one of claims 30-36, wherein the proper noun recognition module is further configured to:

segment the target query term based on the at least one proper noun in response to determining that the probability is greater than the preset probability threshold.

38. A system for segmenting a target query term, comprising a request acquisition module and a TOI determination module, wherein:

the request acquisition module is configured to acquire a retrieval request containing a target query term;

the TOI determination module is configured to determine one or more TOIs related to the target query term; and

the TOI determination module is configured to rank at least a portion of the one or more TOIs based on a proper noun list comprising at least two proper nouns, wherein the proper noun list is determined based on a trained recognition model and at least two historical retrieval records.

39. An apparatus for recognizing proper nouns, comprising at least one storage medium and at least one processor;

the at least one storage medium is configured to store computer instructions;

the at least one processor is configured to execute the computer instructions to implement the method of any of claims 1-19.

40. A computer-readable storage medium storing computer instructions which, when executed by at least one processor, implement the method of any one of claims 1-19.

Technical Field

The present application relates generally to systems and methods for identifying proper nouns, and, in particular, to systems and methods for segmenting queries based on proper nouns.

Background

Currently, when a user initiates a search request by entering a query (e.g., the terms the user wants to search for) through a user terminal, a system providing the search service, after receiving the search request, segments the query, determines one or more terms of interest (TOIs) associated with the query based on the segmentation result, and recommends at least a portion of the TOIs to the user terminal. This approach may split apart proper nouns contained in the query, making it impossible for the system to identify the proper nouns (e.g., names of people, places, or organizations) that the query contains. For example, if a user wants to search for "New Energy Germany Technology Limited" (a company name), the system may segment the query into, for example, "New Energy," "Germany," and "Technology Limited." As a result, the system cannot recognize the proper noun "New Energy Germany Technology Limited." Conventionally, proper noun recognition is performed by training a sequence tagging model using contextual information. However, this conventional approach may not be ideal: certain proper nouns appear infrequently in a travel scenario, and the queries and TOI strings entered by users are short, resulting in insufficient contextual information. It is therefore desirable to provide new systems and methods for automatically and efficiently identifying proper nouns in a query, or for segmenting a query, based on an accurate list of proper nouns.

Disclosure of Invention

In view of the above problems, namely that some proper nouns occur infrequently in travel scenarios and that the queries and TOI strings entered by users are short, leading to insufficient contextual information and unsatisfactory recognition, one objective of the present invention is to provide a system and a method for segmenting target query terms, so as to automatically and effectively recognize proper nouns in a query, or to segment the query, based on an accurate proper noun list. To this end, the technical solution provided by the present application is as follows:

One embodiment of the present application provides a system for segmenting a target query term. The system may include: at least one storage medium comprising a set of instructions; and at least one processor in communication with the at least one storage medium, wherein the set of instructions, when executed, directs the at least one processor to: acquire a retrieval request containing a target query term; and segment the target query term based at least on a proper noun list, wherein the proper noun list comprises at least two proper nouns and is provided by: obtaining at least two first historical retrieval records, wherein each of the at least two first historical retrieval records comprises a first query term from a first user and a first term of interest (TOI) selected by the first user as a first retrieval term of the first historical retrieval record, and at least a part of the first TOI comprises at least one proper noun; obtaining a trained recognition model, wherein the trained recognition model is configured to provide a probability that a candidate TOI includes at least one proper noun; and determining the proper noun list based at least on the trained recognition model and the at least two first historical retrieval records.

In some embodiments, to segment the target query term based at least on a proper noun list comprising at least two proper nouns, the at least one processor may be directed to: segment the target query term into two or more segments using a segmentation technique, wherein the segmentation technique comprises an N-gram segmentation technique; compare the two or more segments to the proper noun list; and combine at least two of the two or more segments in response to determining that a combination of the at least two segments is a proper noun in the proper noun list.

In some embodiments, to split the target query term based at least on a proper noun list comprising at least two proper nouns, the at least one processor may be operative to: obtaining a segmentation model, wherein the segmentation model is configured to segment the target query term; merging the list of proper nouns into the segmentation model by assigning weight coefficients to the list of proper nouns; and segmenting the target query term based on the combined segmentation model.

In some embodiments, the determining the proper noun list based at least on the trained recognition model and the at least two first historical retrieval records may include: for each of the at least two first historical retrieval records, determining a first probability that the first TOI includes at least one proper noun; determining whether the first probability is greater than a first preset probability threshold; and adding the at least one proper noun to the proper noun list if the first probability is greater than the first preset probability threshold.

In some embodiments, the trained recognition model is determined by a training process, which may include: obtaining at least two second historical retrieval records, wherein each of the at least two second historical retrieval records comprises a second query word from a second user and a second TOI selected by the second user as a second retrieval word of the second historical retrieval record, and at least one part of the second TOI comprises at least one proper noun; obtaining at least two training samples based on the at least two second historical retrieval records; training an initial recognition model based on the at least two training samples to generate the trained recognition model.

In some embodiments, the obtaining at least two training samples based on the at least two second historical retrieval records may include: for each of the at least two second historical retrieval records, segmenting the second TOI in the second historical retrieval record into at least two second segments; extracting feature information associated with the at least two second segments; determining whether the feature information meets a first preset condition; if the feature information meets the first preset condition, designating the second historical retrieval record as a candidate retrieval record; and determining the at least two training samples based on the candidate retrieval records.

In some embodiments, the feature information of each of the at least two second segments may include at least one of: a first consistency parameter associated with the second query term and the second segment, a second consistency parameter associated with the second TOI and the second segment, a cohesion degree parameter associated with the second segment, a left entropy associated with the second segment, a right entropy associated with the second segment, a degree of freedom associated with the second segment, a probability that the second segment is located at the beginning or the end of the second TOI, and a frequency associated with the second segment.

In some embodiments, the training process may further include: manually screening at least one candidate retrieval record from the candidate retrieval records; and determining the at least two training samples based on the screened candidate retrieval records.

In some embodiments, training an initial recognition model based on the at least two training samples to generate the trained recognition model may include: determining at least two sample probabilities corresponding to the at least two training samples based on the initial recognition model and the feature information of each of the at least two training samples; determining whether the at least two sample probabilities satisfy a second preset condition; and designating the initial recognition model as the trained recognition model in response to determining that the at least two sample probabilities satisfy the second preset condition.

In some embodiments, the trained recognition model may comprise a support vector machine model.

One of the embodiments of the present application provides a system for segmenting a target query term, where the system may include a request acquisition module, a list determination module, and a query segmentation module. The request acquisition module may be configured to acquire a retrieval request containing a target query term. The list determination module may be configured to obtain at least two first historical retrieval records, where each of the at least two first historical retrieval records includes a first query term from a first user and a first term of interest (TOI) selected by the first user as a first retrieval term of the first historical retrieval record, and at least a portion of the first TOI includes at least one proper noun; to obtain a trained recognition model, wherein the trained recognition model is configured to provide a probability that a candidate TOI includes at least one proper noun; and to determine a proper noun list based at least on the trained recognition model and the at least two first historical retrieval records. The query segmentation module may be configured to segment the target query term based at least on the proper noun list.

One of the embodiments of the present application provides a system for recognizing proper nouns included in a target query term. The system may include a request acquisition module and a proper noun recognition module. The request acquisition module may be configured to acquire a retrieval request containing a target query term. The proper noun recognition module may be configured to determine whether the target query term includes at least one proper noun based on a trained recognition model, wherein the trained recognition model is configured to provide a probability that the target query term includes at least one proper noun; and to determine that the target query term includes at least one proper noun in response to determining that the probability is greater than a preset probability threshold.

One of the embodiments of the present application provides a system for segmenting a target query term. The system may include a request acquisition module and a TOI determination module. The request acquisition module may be configured to acquire a retrieval request containing a target query term. The TOI determination module may be configured to determine one or more TOIs related to the target query term, and to rank at least a portion of the one or more TOIs based on a proper noun list comprising at least two proper nouns, wherein the proper noun list is determined based on a trained recognition model and at least two historical search records.

One of the embodiments of the present application provides a method for segmenting a target query term. The method may include: acquiring a retrieval request containing a target query term; and segmenting the target query term based at least on a proper noun list, the proper noun list comprising at least two proper nouns, wherein the proper noun list is provided by: obtaining at least two first historical retrieval records, wherein each of the at least two first historical retrieval records includes a first query term from a first user and a first term of interest (TOI) selected by the first user as a first retrieval term of the first historical retrieval record, and at least a portion of the first TOI includes at least one proper noun; obtaining a trained recognition model, wherein the trained recognition model is configured to provide a probability that the first TOI includes at least one proper noun; and determining the proper noun list based at least on the trained recognition model and the at least two first historical retrieval records.
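By way of illustration only, the segmentation step above may be sketched as a greedy longest-match pass over the proper noun list with a single-character fallback. The function name, the fallback rule, and the example strings are illustrative assumptions; the disclosure does not limit segmentation to this particular technique or to any particular segmentation model.

```python
def segment_query(query, proper_nouns):
    """Greedy longest-match segmentation that keeps listed proper nouns whole."""
    # Try longer proper nouns first so that a longer entry wins over its prefix.
    nouns = sorted((n for n in proper_nouns if n), key=len, reverse=True)
    tokens, i = [], 0
    while i < len(query):
        for noun in nouns:
            if query.startswith(noun, i):   # proper noun found at position i
                tokens.append(noun)
                i += len(noun)
                break
        else:
            tokens.append(query[i])         # fallback: emit a single character
            i += 1
    return tokens
```

A production system would replace the single-character fallback with a general-purpose segmentation technique, using the proper noun list only to keep listed entries unsplit.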

One of the embodiments of the present application provides a method for identifying a proper noun included in a target query term. The method may include: acquiring a retrieval request containing a target query term; determining whether the target query term includes at least one proper noun based on a trained recognition model, wherein the trained recognition model is configured to provide a probability that the target query term includes at least one proper noun; and in response to determining that the probability is greater than a preset probability threshold, determining that the target query term includes at least one proper noun.
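By way of illustration only, the threshold test described above may be sketched as follows. The capitalization-based stand-in model is purely hypothetical and merely supplies a probability; in the disclosure, that probability would come from the trained recognition model.

```python
def contains_proper_noun(query, model, prob_threshold=0.8):
    """Decide whether the target query term includes at least one proper noun."""
    probability = model(query)              # trained model returns a probability
    return probability > prob_threshold     # preset probability threshold

# Hypothetical stand-in model: probability rises with capitalized tokens.
def toy_model(query):
    tokens = query.split()
    caps = sum(t[:1].isupper() for t in tokens)
    return caps / len(tokens) if tokens else 0.0
```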

One of the embodiments of the present application provides a method for segmenting a target query term. The method may include: acquiring a retrieval request containing a target query term; determining one or more TOIs related to the target query term; and ordering at least a portion of the one or more TOIs based on a proper noun list comprising at least two proper nouns, wherein the proper noun list is determined based on a trained recognition model and at least two historical search records.
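By way of illustration only, the ordering step may be sketched as follows, scoring each candidate TOI by how many listed proper nouns it contains. The scoring rule and the length-based tie-breaker are illustrative assumptions, not ranking criteria required by the disclosure.

```python
def rank_tois(tois, proper_nouns):
    """Order candidate TOIs, boosting those that contain a listed proper noun."""
    def score(toi):
        hits = sum(noun in toi for noun in proper_nouns)  # substring matches
        return (hits, -len(toi))    # more proper-noun hits first, shorter next
    return sorted(tois, key=score, reverse=True)
```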

One of the embodiments of the present application provides an apparatus for segmenting a target query term, where the apparatus includes at least one storage medium and at least one processor; the at least one storage medium is configured to store computer instructions; the at least one processor is configured to execute the computer instructions to implement a method of segmenting target query terms. The method comprises the following steps: acquiring a retrieval request containing a target query term; and segmenting the target query term based at least on a proper noun list, the proper noun list comprising at least two proper nouns, wherein the proper noun list is provided by: obtaining at least two first historical retrieval records, wherein each of the at least two first historical retrieval records includes a first query term from a first user and a first term of interest (TOI) selected by the first user as a first retrieval term of the first historical retrieval record, and at least a portion of the first TOI includes at least one proper noun; obtaining a trained recognition model, wherein the trained recognition model is configured to provide a probability that the first TOI includes at least one proper noun; and determining the proper noun list based at least on the trained recognition model and the at least two first historical retrieval records.

One of the embodiments of the present application provides an apparatus for recognizing proper nouns included in a target query term, the apparatus including at least one storage medium and at least one processor; the at least one storage medium is configured to store computer instructions; the at least one processor is configured to execute the computer instructions to implement a method of identifying proper nouns contained in a target query term. The method comprises the following steps: acquiring a retrieval request containing a target query term; determining whether the target query term includes at least one proper noun based on a trained recognition model, wherein the trained recognition model is configured to provide a probability that the target query term includes at least one proper noun; and in response to determining that the probability is greater than a preset probability threshold, determining that the target query term includes at least one proper noun.

One of the embodiments of the present application provides an apparatus for segmenting a target query term, where the apparatus includes at least one storage medium and at least one processor; the at least one storage medium is configured to store computer instructions; the at least one processor is configured to execute the computer instructions to implement a method of segmenting target query terms. The method comprises the following steps: acquiring a retrieval request containing a target query term; determining one or more TOIs related to the target query term; and ordering at least a portion of the one or more TOIs based on a proper noun list comprising at least two proper nouns, wherein the proper noun list is determined based on a trained recognition model and at least two historical search records.

One of the embodiments of the present application provides a computer-readable storage medium storing computer instructions which, when executed by at least one processor, implement a method for segmenting target query terms. The method comprises the following steps: acquiring a retrieval request containing a target query term; and segmenting the target query term based at least on a proper noun list, the proper noun list comprising at least two proper nouns, wherein the proper noun list is provided by: obtaining at least two first historical retrieval records, wherein each of the at least two first historical retrieval records includes a first query term from a first user and a first term of interest (TOI) selected by the first user as a first retrieval term of the first historical retrieval record, and at least a portion of the first TOI includes at least one proper noun; obtaining a trained recognition model, wherein the trained recognition model is configured to provide a probability that the first TOI includes at least one proper noun; and determining the proper noun list based at least on the trained recognition model and the at least two first historical retrieval records.

One of the embodiments of the present application provides a computer-readable storage medium storing computer instructions which, when executed by at least one processor, implement a method of identifying proper nouns included in a target query term. The method comprises the following steps: acquiring a retrieval request containing a target query term; determining whether the target query term includes at least one proper noun based on a trained recognition model, wherein the trained recognition model is configured to provide a probability that the target query term includes at least one proper noun; and in response to determining that the probability is greater than a preset probability threshold, determining that the target query term includes at least one proper noun.

One of the embodiments of the present application provides a computer-readable storage medium storing computer instructions which, when executed by at least one processor, implement a method for segmenting target query terms. The method comprises the following steps: acquiring a retrieval request containing a target query term; determining one or more TOIs related to the target query term; and ordering at least a portion of the one or more TOIs based on a proper noun list comprising at least two proper nouns, wherein the proper noun list is determined based on a trained recognition model and at least two historical search records.

Additional features of the present application will be set forth in part in the description which follows. Additional features of some aspects of the present application will be apparent to those of ordinary skill in the art in view of the following description and accompanying drawings, or in view of the production or operation of the embodiments. The features of the present application may be realized and attained by practice or use of the methods, instrumentalities and combinations of the various aspects of the specific embodiments described below.

Drawings

The present application will be further described by way of exemplary embodiments. These exemplary embodiments will be described in detail by means of the accompanying drawings. These embodiments are not intended to be limiting, and like reference numerals refer to like parts throughout, wherein:

FIG. 1 is a schematic diagram of an exemplary proper noun recognition system shown in accordance with some embodiments of the present application;

FIG. 2 is a schematic diagram of exemplary hardware and/or software components of a computing device, shown in accordance with some embodiments of the present application;

FIG. 3 is a schematic diagram of exemplary hardware and/or software components of a mobile device shown in accordance with some embodiments of the present application;

FIG. 4 is a block diagram of an exemplary processing engine shown in accordance with some embodiments of the present application;

FIG. 5 is a flow diagram illustrating an exemplary process for segmenting target query terms based at least on a proper noun list including at least two proper nouns, according to some embodiments of the present application;

FIG. 6 is a flow diagram illustrating an exemplary process for determining a proper noun list including at least two proper nouns according to some embodiments of the present application;

FIG. 7 is a flow diagram illustrating an exemplary process for determining a trained recognition model according to some embodiments of the present application;

FIG. 8 is a flow diagram illustrating an exemplary process for identifying at least one proper noun included in a target query term in accordance with some embodiments of the present application; and

FIG. 9 is a flow diagram illustrating an exemplary process for ordering TOIs based on a proper noun list including at least two proper nouns, according to some embodiments of the present application.

Detailed Description

The following description is presented to enable one of ordinary skill in the art to make and use the invention and is provided in the context of a particular application and its requirements. It will be apparent to those of ordinary skill in the art that various changes can be made to the disclosed embodiments and that the general principles defined in this application can be applied to other embodiments and applications without departing from the principles and scope of the application. Thus, the present application is not limited to the described embodiments, but should be accorded the widest scope consistent with the claims.

The terminology used in the description presented herein is for the purpose of describing particular example embodiments only and is not intended to limit the scope of the present application. As used herein, the singular forms "a", "an" and "the" may include the plural forms as well, unless the context clearly indicates otherwise. It will be further understood that the terms "comprises" and/or "comprising," when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.

These and other features, aspects, and advantages of the present application, as well as the methods of operation and functions of the related elements of structure and the combination of parts and economies of manufacture, will become more apparent upon consideration of the following description of the accompanying drawings, all of which form a part of this specification. It is to be understood, however, that the drawings are designed solely for the purposes of illustration and description and are not intended as a definition of the limits of the application. It should be understood that the drawings are not to scale.

Flow charts are used herein to illustrate operations performed by systems according to some embodiments of the present application. It should be understood that the operations in the flow diagrams may not be performed exactly in the order shown. Rather, various steps may be processed in reverse order or simultaneously. Further, one or more other operations may be added to the flowchart, and one or more operations may be deleted from the flowchart.

The words "requestor," "service requestor," and "search requestor" are used interchangeably in this application to refer to an individual, entity, or tool that can request or subscribe to a service. The word "user" in this application may refer to an individual, entity, or tool that may request a service, subscribe to a service, provide a service, or facilitate providing a service. In this application, the words "requestor" and "requestor terminal" may be used interchangeably.

The words "request," "service request," and "search request" are used interchangeably herein to refer to a requirement that may originate from a requestor, a service requestor, a search requestor, etc., or any combination thereof. The retrieval request or service request may be charged or free of charge.

The present application is directed to providing systems and methods for identifying proper nouns in an input query more efficiently than conventional query processing. Such proper nouns are used infrequently, or appear in short query and TOI strings, and therefore carry insufficient contextual information.

One aspect of the present application relates to a system that identifies at least one proper noun included in a target query term associated with a search request. The system may identify at least one proper noun based on the trained recognition model. The trained recognition model may be configured to determine a probability that a term (e.g., a target query term) includes at least one proper noun. In response to determining that the probability is greater than a preset threshold, the system may determine that the target query term may include at least one proper noun. In the present application, the trained recognition model may be trained based on at least two first historical retrieval records.

In addition, the system may determine a proper noun list including at least two proper nouns offline based on the trained recognition model and at least two second historical retrieval records. The system may determine the list of proper nouns based on at least two probabilities determined by the trained recognition model, the at least two probabilities corresponding to the at least two second historical retrieval records. Further, the system may use the list of proper nouns to segment the target query term based on a segmentation technique or a segmentation model.
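By way of illustration only, the offline list-building pass described above may be sketched as follows. The record layout, the `"toi"` field name, and the threshold value are illustrative assumptions; any trained model that returns a probability could stand in for `model`.

```python
def build_proper_noun_list(history_records, model, prob_threshold=0.8):
    """Offline pass: keep the TOIs whose proper-noun probability, as given by
    the trained model, exceeds the preset threshold."""
    proper_nouns = set()
    for record in history_records:
        toi = record["toi"]                 # TOI the user selected as search term
        if model(toi) > prob_threshold:     # probability from the trained model
            proper_nouns.add(toi)
    return sorted(proper_nouns)             # deterministic, de-duplicated list
```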

FIG. 1 is a schematic diagram of an exemplary proper noun recognition system shown in accordance with some embodiments of the present application. For example, proper noun recognition system 100 may be an online service retrieval platform, such as a transportation service, an online shopping service, a map (e.g., Google map, Baidu map, Tencent map) navigation service, a meal ordering service, and so forth. The proper noun recognition system 100 may include a server 110, a network 120, a user terminal 130, and a memory 140.

In some embodiments, the server 110 may be a single server or a group of servers. The set of servers can be centralized or distributed (e.g., the servers 110 can be a distributed system). In some embodiments, the server 110 may be local or remote. For example, server 110 may access information and/or data stored in user terminal 130 or memory 140 via network 120. As another example, server 110 may be directly connected to user terminal 130 and/or memory 140 to access stored information and/or data. In some embodiments, the server 110 may be implemented on a cloud platform. By way of example only, the cloud platform may include a private cloud, a public cloud, a hybrid cloud, a community cloud, a distributed cloud, an internal cloud, a multi-tiered cloud, and the like, or any combination thereof. In some embodiments, server 110 may be implemented on computing device 200 shown in FIG. 2 with one or more components.

In some embodiments, the server 110 may include a processing engine 112. Processing engine 112 may process information and/or data related to the retrieval request to perform one or more functions described herein. For example, processing engine 112 may segment the target query term based at least on a proper noun list including at least two proper nouns. The processing engine 112 may include one or more processing engines (e.g., a single chip processing engine or a multi-chip processing engine). The processing engine 112 may include a Central Processing Unit (CPU), an Application Specific Integrated Circuit (ASIC), an Application Specific Instruction set Processor (ASIP), a Graphics Processing Unit (GPU), a Physics Processing Unit (PPU), a Digital Signal Processor (DSP), a Field Programmable Gate Array (FPGA), a Programmable Logic Device (PLD), a controller, a microcontroller unit, a Reduced Instruction Set Computer (RISC), a microprocessor, or the like, or any combination thereof.

Network 120 may facilitate the exchange of information and/or data. In some embodiments, one or more components of the proper noun recognition system 100 (e.g., the server 110, the user terminal 130, or the memory 140) may send information and/or data to other components of the proper noun recognition system 100 via the network 120. For example, the server 110 may obtain a retrieval request from the user terminal 130 via the network 120. In some embodiments, the network 120 may be a wired network or a wireless network, or the like, or any combination thereof. By way of example only, network 120 may include a cable network, a wired network, a fiber optic network, a telecommunications network, an intranet, the Internet, a Local Area Network (LAN), a Wide Area Network (WAN), a Wireless Local Area Network (WLAN), a Metropolitan Area Network (MAN), a Public Switched Telephone Network (PSTN), a Bluetooth network, a ZigBee network, a Near Field Communication (NFC) network, or the like, or any combination thereof. In some embodiments, network 120 may include one or more network access points. For example, network 120 may include wired or wireless network access points, such as base stations and/or Internet switching points 120-1, 120-2, …, through which one or more components of proper noun recognition system 100 may connect to network 120 to exchange data and/or information.

In some embodiments, the retrieval requester may be a user of the user terminal 130. In some embodiments, the user of the user terminal 130 may be a person other than the retrieval requester. For example, user a of user terminal 130 may use user terminal 130 to send a retrieval request to user B or to receive a retrieval confirmation and/or information or instructions from server 110.

In some embodiments, the user terminal 130 may include a mobile device 130-1, a tablet computer 130-2, a laptop computer 130-3, an in-vehicle device 130-4, or the like, or any combination thereof. In some embodiments, the mobile device 130-1 may include a smart home device, a wearable device, a smart mobile device, a virtual reality device, an augmented reality device, or the like, or any combination thereof. In some embodiments, the smart home devices may include smart lighting devices, smart appliance control devices, smart monitoring devices, smart televisions, smart cameras, interphones, or the like, or any combination thereof. In some embodiments, the wearable device may include a smart bracelet, smart footwear, smart glasses, a smart helmet, a smart watch, smart clothing, a smart backpack, a smart accessory, or the like, or any combination thereof. In some embodiments, the smart mobile device may include a smartphone, a Personal Digital Assistant (PDA), a gaming device, a navigation device, a point of sale (POS) device, or the like, or any combination thereof. In some embodiments, the virtual reality device and/or the augmented reality device may include a virtual reality helmet, virtual reality glasses, virtual reality eyecups, an augmented reality helmet, augmented reality glasses, augmented reality eyecups, or the like, or any combination thereof. For example, the virtual reality device and/or augmented reality device may include a Google Glass™, an Oculus Rift™, a Hololens™, a Gear VR™, or the like. In some embodiments, the in-vehicle device 130-4 may include an in-vehicle computer, an in-vehicle television, or the like. In some embodiments, the user terminal 130 may be a device having positioning technology for locating the location of a user (e.g., a driver) of the user terminal 130.

Memory 140 may store data and/or instructions related to retrieval requests. In some embodiments, memory 140 may store data retrieved from user terminal 130. In some embodiments, memory 140 may store data and/or instructions that server 110 may execute or use to perform the exemplary methods described in this application. In some embodiments, memory 140 may include mass storage, removable storage, volatile read-write memory, read-only memory (ROM), or the like, or any combination thereof. Exemplary mass storage devices may include magnetic disks, optical disks, solid state disks, and the like. Exemplary removable memory may include flash drives, floppy disks, optical disks, memory cards, compact disks, magnetic tape, and the like. Exemplary volatile read-write memory may include Random Access Memory (RAM). Exemplary RAM may include Dynamic Random Access Memory (DRAM), Double Data Rate Synchronous Dynamic Random Access Memory (DDR SDRAM), Static Random Access Memory (SRAM), Thyristor Random Access Memory (T-RAM), Zero-capacitor Random Access Memory (Z-RAM), and the like. Exemplary read-only memory may include Mask Read-Only Memory (MROM), Programmable Read-Only Memory (PROM), Erasable Programmable Read-Only Memory (EPROM), Electrically Erasable Programmable Read-Only Memory (EEPROM), Compact Disc Read-Only Memory (CD-ROM), Digital Versatile Disc Read-Only Memory, and the like. In some embodiments, memory 140 may be implemented on a cloud platform. By way of example only, the cloud platform may include a private cloud, a public cloud, a hybrid cloud, a community cloud, a distributed cloud, an internal cloud, a multi-tiered cloud, or the like, or any combination thereof.

In some embodiments, memory 140 may be connected to network 120 to communicate with one or more components of proper noun recognition system 100 (e.g., server 110, user terminal 130). One or more components of the proper noun recognition system 100 may access data and/or instructions stored in the memory 140 via the network 120. In some embodiments, the memory 140 may be directly connected to or in communication with one or more components of the proper noun recognition system 100 (e.g., the server 110, the user terminal 130). In some embodiments, memory 140 may be part of server 110.

In some embodiments, one or more components of proper noun recognition system 100 (e.g., server 110, user terminal 130) may have permission to access memory 140. In some embodiments, one or more components of proper noun recognition system 100 may read and/or modify information related to a search requester and/or the public when one or more conditions are satisfied. For example, server 110 may read and/or modify information of one or more search requesters after the search service is completed.

In some embodiments, the exchange of information by one or more components of the proper noun recognition system 100 may be accomplished by requesting a retrieval service. The object of the retrieval request may be any product. In some embodiments, the product may be a tangible product or an intangible product. Tangible products may include food, pharmaceuticals, commodities, chemical products, appliances, clothing, automobiles, housing, luxury goods, or the like, or any combination thereof. Intangible products may include service products, financial products, knowledge products, internet products, or the like, or any combination thereof. The internet products may include personal host products, website products, mobile internet products, commercial host products, embedded products, or the like, or any combination thereof. The mobile internet product may be software, a program, or a system for a mobile terminal, or any combination thereof. The mobile terminal may include a tablet computer, a laptop computer, a mobile phone, a Personal Digital Assistant (PDA), a smart watch, a POS device, a vehicle computer, a vehicle television, a wearable device, or the like, or any combination thereof. For example, the product may be any software and/or application used on a computer or mobile phone. The software and/or applications may relate to social interaction, shopping, transportation, entertainment, learning, investment, or the like, or any combination thereof. In some embodiments, the transportation-related software and/or applications may include travel software and/or applications, vehicle scheduling software and/or applications, mapping software and/or applications, and the like.
In the vehicle scheduling software and/or application, the vehicle may include horses, human powered vehicles (e.g., wheelbarrows, bicycles, tricycles), automobiles (e.g., taxis, buses, private cars), trains, subways, ships, aircraft (e.g., airplanes, helicopters, space shuttles, rockets, hot air balloons), and any combination thereof.

It will be understood by those of ordinary skill in the art that when an element (or component) of the proper noun recognition system 100 operates, the element may operate via electrical and/or electromagnetic signals. For example, when user terminal 130 sends a retrieval request to server 110, the processor of user terminal 130 may generate an electrical signal encoding the retrieval request. The processor of the user terminal 130 may then send the electrical signal to an output port. If the user terminal 130 communicates with the server 110 via a wired network, the output port may be physically connected to a cable, which further transmits the electrical signal to an input port of the server 110. If user terminal 130 communicates with server 110 via a wireless network, the output port of user terminal 130 may be one or more antennas that convert the electrical signal to an electromagnetic signal. Within an electronic device, such as user terminal 130 and/or server 110, when its processor processes instructions, issues instructions, and/or performs actions, the instructions and/or actions are performed via electrical signals. For example, when the processor retrieves or stores data from a storage medium (e.g., memory 140), it may send electrical signals to a read/write device of the storage medium, which may read or write structured data in the storage medium. The structured data may be transmitted to the processor in the form of electrical signals via a bus of the electronic device. Herein, an electrical signal may refer to one electrical signal, a series of electrical signals, and/or at least two discrete electrical signals.

FIG. 2 is a schematic diagram of exemplary hardware and/or software components of a computing device 200 shown in accordance with some embodiments of the present application. In some embodiments, server 110 and/or user terminal 130 may be implemented on computing device 200. For example, the processing engine 112 may be implemented on the computing device 200 and perform the functions of the processing engine 112 disclosed herein.

Computing device 200 may be used to implement any of the components of proper noun recognition system 100 as described herein. For example, the processing engine 112 may be implemented on the computing device 200 by its hardware, software programs, firmware, or a combination thereof. For convenience, only one computer is shown, but the computer functions described herein in connection with retrieving services may be implemented in a distributed fashion across multiple similar platforms to share processing load.

Computing device 200 may include a communication port 250 connected to a network to enable data communication. Computing device 200 may also include a processor 220, in the form of one or more processors (e.g., logic circuits), that may execute program instructions. For example, the processor 220 may include interface circuitry and processing circuitry therein. The interface circuitry may be configured to receive electrical signals from bus 210, where the electrical signals encode structured data and/or instructions for the processing circuitry. The processing circuitry may perform logical computations and then encode the conclusion, result, and/or instruction as electrical signals. The interface circuitry may then send the electrical signals from the processing circuitry via bus 210.

Computing device 200 may also include different forms of program storage and data storage, such as a disk 270, Read Only Memory (ROM) 230, or Random Access Memory (RAM) 240, for storing various data files processed and/or transmitted by the computing device. The exemplary computer platform may also include program instructions stored in ROM 230, RAM 240, and/or other types of non-transitory storage media for execution by processor 220. The methods and/or processes of the present application may be embodied in the form of program instructions. Computing device 200 also includes input/output (I/O) 260 to support input/output between the computer and other components. Computing device 200 may also receive programming and data via network communications.

For ease of illustration, only one processor is depicted in FIG. 2. At least two processors may be included, such that operations and/or method steps described in this application as being performed by one processor may also be performed by multiple processors, collectively or individually. For example, if in the present application the processors of computing device 200 perform steps A and B, it should be understood that steps A and B may also be performed by two different CPUs and/or processors of computing device 200, either collectively or independently (e.g., a first processor performing step A, a second processor performing step B, or the first and second processors collectively performing steps A and B).

FIG. 3 is a schematic diagram of exemplary hardware and/or software components of a mobile device shown in accordance with some embodiments of the present application. The user terminal 130 may be implemented on the mobile device 300. As shown in FIG. 3, mobile device 300 may include a communication platform 310, a display 320, a graphics processing unit (GPU) 330, a central processing unit (CPU) 340, an input/output (I/O) 350, a memory 360, a mobile operating system (OS) 370, and a storage 390. In some embodiments, any other suitable component, including but not limited to a system bus or a controller (not shown), may also be included in mobile device 300.

In some embodiments, a mobile operating system 370 (e.g., iOS™, Android™, Windows Phone™, etc.) and one or more applications 380 may be downloaded from storage 390 to memory 360 and executed by CPU 340. The applications 380 may include a browser or any other suitable mobile application for receiving and presenting information related to a search service or other information from the proper noun recognition system 100. User interaction with the information flow may be accomplished via the input/output (I/O) 350 and provided to the processing engine 112 and/or other components of the proper noun recognition system 100 via the network 120.

FIG. 4 is a block diagram of an exemplary processing engine shown in accordance with some embodiments of the present application. The processing engine 112 may include a request acquisition module 410, a query segmentation module 420, a list determination module 430, a model training module 440, a proper noun recognition module 450, and a TOI determination module 460.

The request acquisition module 410 may be configured to obtain a retrieval request that includes a target query term associated with a user. In some embodiments, the retrieval request may refer to a request for searching for the target query term. By way of example only, the retrieval request may include a transportation retrieval request, an online shopping retrieval request, a map (e.g., Google map, Baidu map, Tencent map) navigation retrieval request, an order retrieval request, and the like. In some embodiments, the target query term may refer to a term that the user wants to retrieve. By way of example only, the types of target query terms may include terms (e.g., an address) associated with a transportation retrieval request or a map navigation retrieval request, terms (e.g., merchandise) associated with an online shopping retrieval request, terms (e.g., food) associated with an order retrieval request, and so forth.

The query segmentation module 420 may be configured to segment the target query term based at least on a proper noun list including at least two proper nouns. As used herein, a proper noun may refer to a name of a person, a name of a particular place, a name of a particular organization, and the like. For example, Zhougelon may be a proper noun. As another example, Beijing University may be a proper noun. As yet another example, the International Committee of the Red Cross (ICRC) may be a proper noun.

In some embodiments, the query segmentation module 420 may segment the target query term based on the proper noun list and a segmentation technique. For example, the segmentation technique may include an N-gram segmentation technique, a forward maximum matching technique, an inverse maximum matching technique, a bidirectional maximum matching technique, a minimum matching technique, a best matching technique, a hidden Markov model, a maximum entropy model, a conditional random field model, a neural network model, an association-backtracking technique, and the like, or any combination thereof.

In some embodiments, the query segmentation module 420 may segment the target query term into at least two participles. The query segmentation module 420 may then compare the at least two participles to the proper noun list. In response to determining that at least two adjacent participles constitute a proper noun on the proper noun list, the query segmentation module 420 may combine the at least two participles. For example, if Zhougelon is on the proper noun list but a segmentation technique splits Zhougelon into two participles, the query segmentation module 420 may combine the two participles into a single word, i.e., Zhougelon.
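Assuming an initial segmentation is already available, the combining step described above might be sketched as follows. The function name, the space joiner (Chinese text would use an empty joiner), and the toy proper noun list are illustrative, not taken from this disclosure:

```python
# Hypothetical sketch: after an initial segmentation, merge runs of
# adjacent participles whose concatenation appears on the proper noun
# list, so a listed proper noun is not left split into pieces.

PROPER_NOUNS = {"zhougelon", "beijing university"}  # toy list, lowercase

def merge_proper_nouns(segments, proper_nouns, joiner=" "):
    """Greedily combine adjacent segments that form a listed proper noun."""
    merged, i = [], 0
    while i < len(segments):
        # Try the longest combination first so multi-segment nouns win.
        for j in range(len(segments), i + 1, -1):
            candidate = joiner.join(segments[i:j]).lower()
            if candidate in proper_nouns:
                merged.append(joiner.join(segments[i:j]))
                i = j
                break
        else:
            merged.append(segments[i])
            i += 1
    return merged

print(merge_proper_nouns(["Beijing", "University", "library"], PROPER_NOUNS))
# -> ['Beijing University', 'library']
```

The longest-match-first loop mirrors the idea that a proper noun should survive as a single token even when it spans several participles.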

In some embodiments, the query segmentation module 420 may segment the target query term based on the proper noun list and the segmentation model. The segmentation model may be used to segment terms (e.g., target query terms). In some embodiments, the query segmentation module 420 may incorporate the list of proper nouns into the segmentation model by assigning weight coefficients to the list of proper nouns. The query segmentation module 420 may then segment the target query term based on the merged segmentation model. In some embodiments, when the target query term is segmented by the merged segmentation model, the query segmentation module 420 does not segment at least one proper noun included in the target query term. As used herein, the weighting factors may be default settings for the proper noun recognition system 100 or may be adjusted in different situations.
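One minimal way to realize "merging the proper noun list into the segmentation model by assigning weight coefficients" is a dynamic-programming segmenter that chooses the segmentation maximizing the total weight of dictionary words, with proper nouns given a larger weight so the best-scoring segmentation keeps them whole. This is only a toy sketch under that assumption; the dictionary, weights, and scoring scheme are invented:

```python
# Toy weighted segmenter: lexicon maps word -> weight; proper nouns get
# a larger weight, so the max-total-weight segmentation keeps them intact.

def segment(text, lexicon):
    """Return the segmentation of text with the maximum total weight."""
    n = len(text)
    best = [(-1.0, [])] * (n + 1)   # best (score, words) per prefix length
    best[0] = (0.0, [])
    for i in range(n):
        if best[i][0] < 0:          # prefix of length i unreachable
            continue
        for word, w in lexicon.items():
            if text.startswith(word, i):
                j = i + len(word)
                cand = (best[i][0] + w, best[i][1] + [word])
                if cand[0] > best[j][0]:
                    best[j] = cand
    return best[n][1]

lexicon = {"bei": 1, "jing": 1, "beijing": 5}  # "beijing" boosted as a proper noun
print(segment("beijing", lexicon))  # -> ['beijing'] rather than ['bei', 'jing']
```

Without the boost (e.g., with only "bei" and "jing" in the lexicon), the same routine would split the word, which illustrates how the weight coefficient keeps a proper noun unsegmented.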

The segmentation model may include at least two word lists associated with different application scenarios, e.g., a transportation service, an online shopping service, a map navigation service, a meal ordering service. As used herein, a list of participles may refer to a list including at least two participles associated with an application scenario. For example only, the at least two word lists may include a word list associated with a transportation service or a map navigation service (e.g., words including location), a word list associated with online shopping (e.g., words including merchandise), a word list associated with an ordering service (e.g., words including food), and the like, or any combination thereof.

The list determination module 430 may be configured to determine a proper noun list including at least two proper nouns offline. As used herein, a proper noun may refer to a person's name, a name of a particular place, a name of a particular organization, and the like. For example, Zhougelon may be a proper noun. Also for example, Beijing university may be a proper noun. As another example, the International Committee for Red Cross (ICRC) may be a proper term.

In some embodiments, the list determination module 430 may determine the proper noun list based on the at least two first historical retrieval records and the trained recognition model. Each of the at least two first historical retrieval records may include a first query term from a first user and a first TOI selected by the first user as the first search term of the first historical retrieval record. At least a portion of the first TOIs of the at least two first historical retrieval records may include at least one proper noun. The first users and/or the first query terms corresponding to different first historical retrieval records may be the same or different. The trained recognition model may be configured to provide a probability that a term (e.g., a first TOI) of each of the at least two first historical retrieval records includes at least one proper noun. In some embodiments, the list determination module 430 may obtain the at least two first historical retrieval records or the trained recognition model from a storage device (e.g., memory 140), a third party (e.g., an external database), etc., disclosed elsewhere in this application.

The model training module 440 may be configured to determine a trained recognition model offline based on the at least two second historical retrieval records. Similar to the first historical search record, each of the at least two second historical search records may include a second query term from the second user and a second TOI selected by the second user as a second search term of the second historical search record. Wherein the second user and/or the second query term corresponding to different second history records may be the same or different. At least a portion of the second TOI may include at least one proper noun. In some embodiments, the at least two second historical retrieval records may be partially or completely different from the at least two first historical retrieval records. A more detailed description of the trained recognition model may be found elsewhere in this application (e.g., fig. 7 and its description).

The proper noun recognition module 450 may be configured to determine online, based on the trained recognition model, whether the target query term includes at least one proper noun. The trained recognition model may be configured to provide a probability that the target query term includes at least one proper noun. In some embodiments, the proper noun recognition module 450 may compare the probability to a preset probability threshold. In response to determining that the probability is greater than the preset probability threshold, the proper noun recognition module 450 may determine that the target query term includes at least one proper noun. As used herein, the preset probability threshold may be a default setting of the proper noun recognition system 100, or may be adjusted in different situations.

The TOI determination module 460 may be configured to determine one or more terms of interest (TOIs) associated with a target query term. As used herein, a TOI may refer to a term associated with a target query term that may be of interest to a user. In some embodiments, the TOI determination module 460 may first determine at least one of a prefix, keyword, or phrase in the target query term and determine one or more TOIs based on the prefix, keyword, or phrase.
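As an illustration only, determining TOIs from a prefix of the target query term could look like the following sketch; the candidate pool, function name, and matching rule are hypothetical, not prescribed by this disclosure:

```python
# Hypothetical prefix-based TOI candidate generation: return every
# candidate whose text starts with the (case-insensitive) query prefix.

def tois_by_prefix(query, candidate_pool):
    q = query.lower()
    return [c for c in candidate_pool if c.lower().startswith(q)]

pool = ["Beijing University", "Beijing station", "Shanghai tower"]
print(tois_by_prefix("beijing", pool))
# -> ['Beijing University', 'Beijing station']
```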

In some embodiments, the TOI determination module 460 may determine correlation coefficients for one or more TOIs. As used herein, a correlation coefficient may represent a similarity between a target query term and a TOI. The greater the correlation coefficient, the greater the similarity between the TOI and the target query term, and the greater the probability that the user selects the TOI as a search term associated with the target query term.

In some embodiments, the TOI determination module 460 may rank at least a portion of the one or more TOIs based on a proper noun list including at least two proper nouns. In some embodiments, the TOI determination module 460 may compare the target query term to the proper nouns on the proper noun list. In response to determining that the target query term includes at least one proper noun from the proper noun list, the TOI determination module 460 may rank the TOIs that include the at least one proper noun higher. In some embodiments, the TOI determination module 460 may assign at least one weighting coefficient to the at least one proper noun, and rank at least a portion of the one or more TOIs based on the at least one weighting coefficient and the correlation coefficients of the TOIs. In response to determining that the target query term includes no proper noun from the proper noun list, the TOI determination module 460 may rank at least a portion of the one or more TOIs based on the correlation coefficients of the TOIs (e.g., from largest to smallest).
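The ranking behavior described above might be sketched as follows, assuming each TOI carries a precomputed correlation coefficient. The boost value and all names are illustrative assumptions:

```python
# Illustrative ranking sketch: TOIs containing a proper noun that also
# appears in the query are boosted by a weighting coefficient; otherwise
# TOIs are ordered by correlation coefficient alone.

def rank_tois(query, tois, proper_nouns, boost=2.0):
    """tois: list of (toi_text, correlation_coefficient) pairs."""
    query_nouns = [p for p in proper_nouns if p in query]
    def score(item):
        text, corr = item
        w = boost if any(p in text for p in query_nouns) else 1.0
        return w * corr
    return [t for t, _ in sorted(tois, key=score, reverse=True)]

tois = [("Beijing University east gate", 0.6),
        ("Beijing station", 0.8),
        ("Beijing University library", 0.5)]
print(rank_tois("Beijing University", tois, {"Beijing University"}))
# proper-noun TOIs ranked ahead of the higher-correlation "Beijing station"
```

When the query contains no listed proper noun, `query_nouns` is empty and the sort degrades to plain correlation order, matching the fallback described in the paragraph above.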

The modules in the processing engine 112 may be connected to or in communication with each other via a wired connection or a wireless connection. The wired connection may include a metal cable, an optical cable, a hybrid cable, etc., or any combination thereof. The wireless connection may include a local area network (LAN), a wide area network (WAN), Bluetooth, a ZigBee network, near field communication (NFC), etc., or any combination thereof. Two or more modules may be combined into a single module, and any one of the modules may be divided into two or more units. For example, the processing engine 112 may include a storage module (not shown) that may be used to store data generated by the above-described modules, such as the proper noun list and the trained recognition model. As another example, the model training module 440 may be unnecessary, and the trained recognition model may be obtained from a storage device (e.g., memory 140) disclosed elsewhere in this application or an external device in communication with the proper noun recognition system 100.

FIG. 5 is a flow diagram illustrating an exemplary process for segmenting target query terms based at least on a proper noun list including at least two proper nouns, according to some embodiments of the present application. In some embodiments, process 500 may be implemented by a set of instructions (e.g., an application program) stored in read only memory 230 or random access memory 240. The processor 220 and/or the modules in fig. 4 may execute the set of instructions, and when executing the set of instructions, the processor 220 and/or the modules may be configured to perform the process 500. The operation of the process shown below is for illustration purposes only. In some embodiments, process 500 may be accomplished with one or more additional operations not described and/or without one or more of the operations discussed herein. Additionally, the order of the operations of the process as shown in fig. 5 and described below is not intended to be limiting.

In 510, the processing engine 112 (e.g., the request acquisition module 410) (e.g., the processing circuitry of the processor 220) may obtain a retrieval request including a target query term associated with a user. In some embodiments, the retrieval request may refer to a request for searching for the target query term. By way of example only, the retrieval request may include a transportation retrieval request, an online shopping retrieval request, a map (e.g., Google map, Baidu map, Tencent map) navigation retrieval request, an order retrieval request, and the like. In some embodiments, the target query term may refer to a term that the user wants to retrieve. By way of example only, the types of target query terms may include terms (e.g., an address) associated with a transportation retrieval request or a map navigation retrieval request, terms (e.g., merchandise) associated with an online shopping retrieval request, terms (e.g., food) associated with an order retrieval request, and so forth.

In some embodiments, the target query term may be input by the user via the user terminal 130. For example, the user may input the target query term in an application installed on the user terminal 130. In some embodiments, the user may enter the target query term by typing, handwriting, voice, picture, etc. The input may be done by an application or device. The input device may be a keyboard, touch screen, microphone, tablet, scanner, camera, or any combination thereof.

At 520, the processing engine 112 (e.g., the query segmentation module 420) (e.g., the processing circuitry of the processor 220) may segment the target query term based at least on a proper noun list including at least two proper nouns. As used herein, a proper noun may refer to a name of a person, a name of a particular place, a name of a particular organization, and the like. For example, Zhougelon may be a proper noun. As another example, Beijing University may be a proper noun. As yet another example, the International Committee of the Red Cross (ICRC) may be a proper noun.

In some embodiments, the proper noun list may be provided by the proper noun recognition system 100 (e.g., the processing engine 112) or a third party (e.g., an external database). The proper noun recognition system 100 may predetermine the proper noun list offline and store the proper noun list in a storage device (e.g., memory 140) disclosed elsewhere in this application.

In some embodiments, the proper noun recognition system 100 may predetermine the proper noun list based on the at least two first historical retrieval records and the trained recognition model. Each of the at least two first historical retrieval records may include a first query term from a first user and a first TOI selected by the first user as the first search term of the first historical retrieval record. At least a portion of the first TOIs of the at least two first historical retrieval records may include at least one proper noun. The first users and/or the first query terms corresponding to different first historical retrieval records may be the same or different. The trained recognition model may be configured to provide a probability that a term (e.g., a first TOI) of each of the at least two first historical retrieval records includes at least one proper noun. In some embodiments, if the probability is greater than a preset probability threshold, the proper noun recognition system 100 may add the at least one proper noun to the proper noun list. A more detailed description of determining the proper noun list may be found elsewhere in this application, such as in FIG. 6 and its description.

In some embodiments, the at least two first historical retrieval records and/or the trained recognition model may be obtained from, for example, a storage device (e.g., memory 140), a third party, or the like. The proper noun recognition system 100 may predetermine the trained recognition model offline and store the trained recognition model in a storage device (e.g., the memory 140) disclosed elsewhere in this application. A more detailed description of predetermining a trained recognition model may be found elsewhere in this application, such as in fig. 7 and its description.

In some embodiments, the processing engine 112 may segment the target query term based on the proper noun list and a segmentation technique. For example, the segmentation technique may include an N-gram segmentation technique, a forward maximum matching technique, an inverse maximum matching technique, a bidirectional maximum matching technique, a minimum matching technique, a best matching technique, a hidden Markov model, a maximum entropy model, a conditional random field model, a neural network model, an association-backtracking technique, and the like, or any combination thereof.

In some embodiments, the processing engine 112 may segment the target query term into at least two participles. The processing engine 112 may then compare the at least two participles to the proper noun list. In response to determining that at least two adjacent participles constitute a proper noun on the proper noun list, the processing engine 112 may combine the at least two participles. For example, if Zhougelon is on the proper noun list but a segmentation technique splits Zhougelon into two participles, the processing engine 112 may combine the two participles into a single word, i.e., Zhougelon.

In some embodiments, processing engine 112 may segment the target query term based on the proper noun list and the segmentation model. The segmentation model may be used to segment terms (e.g., target query terms). In some embodiments, processing engine 112 may incorporate the list of proper nouns into the segmentation model by assigning weight coefficients to the list of proper nouns. Processing engine 112 may then segment the target query term based on the merged segmentation model. In some embodiments, when the target query term is segmented by the merged segmentation model, the processing engine 112 does not segment at least one proper noun included in the target query term. As used herein, the weighting factors may be default settings for the proper noun recognition system 100 or may be adjusted in different situations.

The segmentation model may include at least two word lists associated with different application scenarios, e.g., a transportation service, an online shopping service, a map navigation service, a meal ordering service. As used herein, a list of participles may refer to a list including at least two participles associated with an application scenario. For example only, the at least two word lists may include a word list associated with a transportation service or a map navigation service (e.g., words including location), a word list associated with online shopping (e.g., words including merchandise), a word list associated with an ordering service (e.g., words including food), and the like, or any combination thereof.

The present embodiment has at least one of the following technical effects: 1. the proper noun list is determined offline in advance, and query terms are segmented online in real time based on the proper noun list, which saves the time for responding to a client retrieval request; 2. for a query term containing a proper noun, the proper noun is not split, so the TOIs determined by the system in relation to the query term are more accurate, and the user can effectively select the search term corresponding to the query term from the ranked results, thereby satisfying the user's search need more effectively.

It should be understood that the foregoing description is for purposes of illustration only and is not intended to limit the scope of the present disclosure. Many modifications and variations will be apparent to those of ordinary skill in the art in light of the description of the present application. However, such modifications and variations do not depart from the scope of the present application.

FIG. 6 is a flow diagram illustrating an exemplary process for determining a proper noun list including at least two proper nouns according to some embodiments of the present application. In some embodiments, process 600 may be implemented by a set of instructions (e.g., an application program) stored in read only memory 230 or random access memory 240. The processor 220 and/or the modules in fig. 4 may execute the set of instructions, and when executing the set of instructions, the processor 220 and/or the modules may be configured to perform the process 600. The operation of the process shown below is for illustration purposes only. In some embodiments, process 600 may be accomplished with one or more additional operations not described and/or without one or more of the operations discussed herein. Additionally, the order of the operations of the process as shown in fig. 6 and described below is not intended to be limiting.

In 610, the processing engine 112 (e.g., the list determination module 430) (e.g., the processing circuitry of the processor 220) may obtain at least two first historical retrieval records over a preset time period (e.g., the last month, the last three months, the last year). In some embodiments, the processing engine 112 may retrieve the at least two first historical retrieval records from a storage device (e.g., memory 140), an external database, or the like disclosed elsewhere in this application.

Each of the at least two first historical retrieval records may include a first query term from a first user and a first TOI selected by the first user as the first search term of the first historical retrieval record. The first users and/or the first query terms corresponding to different first historical retrieval records may be the same or different. At least a portion of the first TOIs of the at least two first historical retrieval records may include at least one proper noun.

In 620, the processing engine 112 (e.g., the list determination module 430) (e.g., the processing circuitry of the processor 220) may obtain a trained recognition model (e.g., a support vector machine model). The trained recognition model may be configured to provide a probability that a word (e.g., the first TOI) includes at least one proper noun. In some embodiments, the processing engine 112 may obtain the trained recognition model from a storage device (e.g., storage device 140), a third party, etc., disclosed elsewhere in this application.

At 630, the processing engine 112 (e.g., the list determination module 430) (e.g., the processing circuitry of the processor 220) may determine a list of proper nouns based at least on the trained recognition model and the at least two first historical retrieval records. In some embodiments, the processing engine 112 may determine the list of proper nouns based on a first probability of each of the at least two first historical search records, the first probability being determined based on the trained recognition model. In some embodiments, for each of the at least two first historical retrieval records, the processing engine 112 may determine whether the respective first probability is greater than a first preset probability threshold. In response to determining that the respective first probability is greater than the first preset probability threshold, the processing engine 112 may add at least one proper noun included in the first TOI to the proper noun list. As used herein, the first predetermined probability threshold may be a default setting for the proper noun recognition system 100, or may be adjusted under different circumstances.
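Operation 630's thresholding can be sketched as below, assuming the trained model is any callable that maps a TOI string to a probability. The record format, threshold value, and stand-in "model" are assumptions, and for simplicity the sketch adds the whole first TOI to the list rather than extracting only the proper noun inside it:

```python
# Minimal sketch of building the proper noun list from historical records.

def build_proper_noun_list(records, model, threshold=0.9):
    """records: iterable of (first_query_term, first_toi); model: toi -> probability."""
    proper_nouns = set()
    for _query, toi in records:
        first_probability = model(toi)       # probability the TOI holds a proper noun
        if first_probability > threshold:    # operation 630's comparison
            proper_nouns.add(toi)            # simplification: keep the whole TOI
    return proper_nouns

def toy_model(toi):
    # Stand-in for the trained recognition model: pretend title-cased
    # TOIs are proper nouns. Purely illustrative, not a real model.
    return 0.95 if toi.istitle() else 0.1

records = [("beijing u", "Beijing University"), ("food nearby", "noodle soup")]
print(build_proper_noun_list(records, toy_model))
# -> {'Beijing University'}
```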

It should be understood that the foregoing description is for purposes of illustration only and is not intended to limit the scope of the present disclosure. Many modifications and variations will be apparent to those of ordinary skill in the art in light of the description of the present application. However, such modifications and variations do not depart from the scope of the present application. For example, one or more other optional steps (e.g., a storing step) may be added elsewhere in the process 600. In the storing step, the processing engine 112 may store the proper noun list in a storage device (e.g., memory 140) disclosed elsewhere in this application.

FIG. 7 is a flow diagram illustrating an exemplary process for determining a trained recognition model according to some embodiments of the present application. In some embodiments, process 700 may be implemented by a set of instructions (e.g., an application program) stored in read only memory 230 or random access memory 240. The processor 220 and/or the modules in fig. 4 may execute the set of instructions, and when executing the set of instructions, the processor 220 and/or the modules may be configured to perform the process 700. The operation of the process shown below is for illustration purposes only. In some embodiments, process 700 may be accomplished with one or more additional operations not described and/or without one or more of the operations discussed herein. Additionally, the order of the operations of the process as shown in fig. 7 and described below is not intended to be limiting.

At 710, processing engine 112 (e.g., model training module 440) (e.g., processing circuitry of processor 220) may obtain at least two second historical retrieval records. In some embodiments, the processing engine 112 may retrieve at least two second historical retrieval records from a storage device (e.g., memory 140), an external database, or the like, as disclosed elsewhere in this application.

Similar to the first historical retrieval records described in operation 610, each of the at least two second historical retrieval records may include a second query term from a second user and a second TOI selected by the second user as the second search term of that second historical retrieval record. The second users and/or the second query terms corresponding to different second historical retrieval records may be the same or different, and at least a portion of the second TOIs may include at least one proper noun. In some embodiments, the at least two second historical retrieval records may be partially or completely different from the at least two first historical retrieval records.

At 720, the processing engine 112 (e.g., the model training module 440) (e.g., the processing circuitry of the processor 220) may obtain at least two training samples based on the at least two second historical retrieval records. In some embodiments, the processing engine 112 may extract feature information associated with each of the at least two second historical retrieval records. The processing engine 112 may determine the at least two training samples based on the feature information and the at least two second historical retrieval records.

In some embodiments, for each of the at least two second historical retrieval records, the processing engine 112 may segment the second TOI in the second historical retrieval record into at least two second participles. As illustrated in operation 520, the processing engine 112 may segment the second TOI based on a segmentation technique or a segmentation model. A more detailed description may be found in operation 520 and is not repeated here.

The processing engine 112 may then determine the feature information associated with each of the at least two second historical retrieval records based on the at least two second participles and the second query terms. By way of example only, for one second participle of the at least two second participles, the feature information may include a first consistency parameter associated with the second query term and the second participle, a second consistency parameter associated with the second TOI and the second participle, a cohesion parameter associated with the second participle, a left entropy associated with the second participle, a right entropy associated with the second participle, a degree of freedom associated with the second participle, a probability that the second participle is located at the beginning of the second query term or at the end of the second TOI, a frequency associated with the second participle, and so on.

As used herein, the first consistency parameter may refer to a probability that at least two consecutive participles form a single term in the second query terms. Taking two participles as an example, the probability may be the ratio between a first number of second historical queries in which participle 1 and participle 2 are consecutive and form a single term, and a second number of second historical queries that contain both participle 1 and participle 2. If the first number is n and the second number is m, the first consistency parameter of the at least two participles may be n/m × 100%. Similar to the first consistency parameter, the second consistency parameter may refer to a probability that at least two consecutive participles form a single word in the second TOIs.
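A worked example of the first consistency parameter, under the simplifying assumption that queries are whitespace-separated English strings (the function name and toy queries are illustrative):

```python
# Sketch of the first consistency parameter n/m: n = queries where the two
# participles occur consecutively as one term, m = queries containing both.

def consistency(queries, seg1, seg2):
    both = [q for q in queries if seg1 in q and seg2 in q]          # count m
    consecutive = [q for q in both if seg1 + " " + seg2 in q]       # count n
    return len(consecutive) / len(both) if both else 0.0

queries = ["beijing university map",
           "university of beijing ranking",
           "beijing university gate"]
print(consistency(queries, "beijing", "university"))  # n/m = 2/3
```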

The cohesion parameter may refer to the correlation or cohesion between at least two participles, i.e., the probability that the at least two participles constitute a single word. For example, the cohesion parameter of "building owner" may be greater than the cohesion parameter of "building door". In some embodiments, the processing engine 112 may determine the cohesion parameter according to equation (1) below:

C = min(P1/(Pleft1 × Pright2), P1/(Pleft3 × Pright4)) (1)

where C refers to the cohesion parameter of a word 1 that comprises two participles, and word 1 can be split into participle 1 and participle 2, or into participle 3 and participle 4. P1 refers to the probability of occurrence of word 1 in a corpus database, Pleft1 refers to the probability of occurrence of participle 1 in the corpus database, Pright2 refers to the probability of occurrence of participle 2 in the corpus database, Pleft3 refers to the probability of occurrence of participle 3 in the corpus database, and Pright4 refers to the probability of occurrence of participle 4 in the corpus database. The corpus database includes a set of written or spoken texts for language study. The corpus database may be obtained by the processing engine 112 from a storage device (e.g., memory 140), a third party (e.g., an external database), etc., disclosed elsewhere in this application.
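A sketch of the cohesion computation in equation (1): for each binary split of a candidate word, divide the word's corpus probability by the product of the two parts' probabilities, and take the minimum over splits. The toy probabilities below are invented for illustration:

```python
# Cohesion of a candidate word over its binary splits: a high minimum
# ratio means the word holds together better than any way of splitting it.

def cohesion(word_prob, part_prob, splits):
    """splits: list of (left_part, right_part) binary splits of the word."""
    return min(word_prob / (part_prob[l] * part_prob[r]) for l, r in splits)

part_prob = {"build": 0.02, "ing": 0.05, "b": 0.01, "uilding": 0.001}
c = cohesion(0.004, part_prob, [("build", "ing"), ("b", "uilding")])
print(round(c, 3))  # min(0.004/0.001, 0.004/0.00001) = 4.0
```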

The left entropy may refer to the diversity of combinations between the reference participle and the participles appearing immediately before it. The right entropy may refer to the diversity of combinations between the reference participle and the participles appearing immediately after it. In some embodiments, the processing engine 112 may determine the left entropy and the right entropy according to equation (2) and equation (3), respectively:

L = -∑p(mi)×log(p(mi))    (2)

R = -∑p(nj)×log(p(nj))    (3)

where L denotes the left entropy, mi refers to the ith second segment in the set of second segments that appear to the left of the first segment in the corpus database, p(mi) refers to the probability that the ith second segment appears to the left of the first segment in the corpus database, R denotes the right entropy, nj refers to the jth third segment in the set of third segments that appear to the right of the first segment, and p(nj) refers to the probability that the jth third segment appears to the right of the first segment in the corpus database.

The degree of freedom may refer to the diversity of combinations between a reference segment and other segments. The degree of freedom may be associated with the left entropy and the right entropy. In some embodiments, processing engine 112 may determine the degree of freedom according to equation (4):

F=min(L,R) (4)

where F is the degree of freedom, L is the left entropy, and R is the right entropy.
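Equations (1) through (4) can be sketched together in a few lines. The functions below are a minimal illustration of the cohesion, entropy, and degree-of-freedom computations described above; the probability inputs and neighbour counts would in practice come from the corpus database, and the example values are ours.

```python
import math
from collections import Counter

def cohesion(p_word, split_probs):
    """Equation (1): min over candidate splits of P(word) / (P(left) * P(right))."""
    return min(p_word / (p_left * p_right) for p_left, p_right in split_probs)

def entropy(neighbor_counts):
    """Equations (2)/(3): entropy of the left (or right) neighbour distribution."""
    total = sum(neighbor_counts.values())
    return -sum((c / total) * math.log(c / total)
                for c in neighbor_counts.values())

def degree_of_freedom(left_neighbors, right_neighbors):
    """Equation (4): F = min(L, R)."""
    return min(entropy(left_neighbors), entropy(right_neighbors))

# Two candidate splits of the same word; the tighter ratio wins:
c = cohesion(0.01, [(0.1, 0.2), (0.05, 0.5)])   # min(0.5, 0.4), up to rounding

# A word whose right neighbours are all identical has zero right entropy,
# which caps its degree of freedom at zero:
f = degree_of_freedom(Counter(a=1, b=1), Counter(c=3))
```

A term that appears in many different contexts (high entropy on both sides) and is much more frequent than chance co-occurrence of its parts (high cohesion) is a strong new-word candidate.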

In some embodiments, the processing engine 112 may determine the at least two training samples based on candidate second historical search records and the feature information corresponding to the candidate second historical search records. The processing engine 112 may select the candidate second historical search records based on the feature information associated with the at least two second historical search records. For each candidate second historical search record, the associated feature information satisfies a first preset condition. In some embodiments, the first preset condition may include that one or more items of the feature information each exceed a corresponding preset threshold. For example, the first preset condition may include that the first consistency parameter is greater than a preset threshold. As another example, the first preset condition may include that the degree of freedom is greater than a preset threshold. As still another example, the first preset condition may include that the first consistency parameter is greater than a first preset threshold and the degree of freedom is greater than a second preset threshold.
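The first preset condition can be sketched as a simple filter over the feature information. The field names and threshold values below are illustrative assumptions, not values fixed by the disclosure.

```python
def select_candidates(records, min_consistency=80.0, min_freedom=0.5):
    """Keep only records whose feature information satisfies the first
    preset condition (here: both features above their thresholds)."""
    return [r for r in records
            if r["consistency"] > min_consistency and r["freedom"] > min_freedom]

records = [
    {"toi": "central park west", "consistency": 92.0, "freedom": 1.3},
    {"toi": "cheap hotel near",  "consistency": 41.0, "freedom": 2.1},
]
print([r["toi"] for r in select_candidates(records)])  # ['central park west']
```

Lowering the thresholds relaxes the condition and admits more candidates, trading precision for recall.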

In some embodiments, the processing engine 112 may determine the at least two training samples based on screened (candidate) second historical search records. The (candidate) second historical search records may be screened manually. For example, the provider of the proper noun recognition system 100 may look up relevant data, such as via the Internet, and determine whether the corresponding (candidate) second TOI actually includes at least one proper noun.

In some embodiments, the processing engine 112 may determine at least two positive training samples and at least two negative training samples based on the second historical search records, the candidate second historical search records, or the screened (candidate) second historical search records. As used herein, each of the at least two positive training samples may include a second historical query and a second TOI that includes at least one proper noun. Each of the at least two negative training samples may include a second historical query and a second TOI that does not include any proper noun.
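The split into positive and negative samples can be sketched as a labelling pass over (query, TOI) pairs. The seed set of known proper nouns below is a hypothetical stand-in for the manual screening described above.

```python
def build_training_samples(records, known_proper_nouns):
    """Label (second_historical_query, second_TOI) pairs: positive when the
    TOI contains at least one known proper noun, negative otherwise."""
    positives, negatives = [], []
    for query, toi in records:
        (positives if any(pn in toi for pn in known_proper_nouns)
         else negatives).append((query, toi))
    return positives, negatives

records = [("eiffel", "Eiffel Tower tickets"), ("cheap", "cheap tickets")]
pos, neg = build_training_samples(records, {"Eiffel Tower"})
print(len(pos), len(neg))  # 1 1
```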

In 730, the processing engine 112 (e.g., the model training module 440) (e.g., the processing circuitry of the processor 220) may determine an initial recognition model. The initial recognition model may include an initial decision tree model (e.g., an initial binary classification tree model), an initial naive Bayes model, an initial boosted tree model, an initial nearest neighbor model, an initial support vector machine model, or the like.

In 740, the processing engine 112 (e.g., the model training module 440) (e.g., the processing circuitry of the processor 220) may determine at least two sample probabilities for the at least two training samples based on the initial recognition model and the feature information for each of the at least two training samples. As described elsewhere in this application, similar to the first probability, the sample probability may refer to a probability that a word (e.g., the second TOI) includes at least one proper noun.

In some embodiments, the processing engine 112 may input the feature information of the at least two training samples into the initial recognition model, and the initial recognition model may output the sample probabilities. In other embodiments, the processing engine 112 may input the at least two training samples themselves into the initial recognition model, and the initial recognition model may output the sample probabilities.

At 750, the processing engine 112 (e.g., the model training module 440) (e.g., the processing circuitry of the processor 220) may determine whether at least two sample probabilities satisfy a preset condition. For example, the processing engine 112 may determine a loss function for the initial recognition model and determine a value for the loss function based on at least two sample probabilities. Further, the processing engine 112 may determine whether the value of the loss function is less than a threshold. The threshold may be a default setting for the proper noun recognition system 100 or may be adjusted in different circumstances.

In 760, in response to determining that the at least two sample probabilities satisfy the preset condition, the processing engine 112 (e.g., the model training module 440) (e.g., the processing circuitry of the processor 220) may designate the initial recognition model as the trained recognition model. On the other hand, in response to determining that the at least two sample probabilities do not satisfy the preset condition, the processing engine 112 may return to operation 730 of process 700 to update the initial recognition model. For example, the processing engine 112 may update one or more preliminary parameters (e.g., weight matrices, bias vectors) of the initial recognition model to produce an updated recognition model.

Further, the processing engine 112 may determine whether at least two updated sample probabilities under the updated recognition model satisfy the preset condition. In response to determining that the at least two updated sample probabilities satisfy the preset condition, the processing engine 112 may designate the updated recognition model as the trained recognition model in 760. Otherwise, the processing engine 112 may again return to 730 to update the updated recognition model until the at least two updated sample probabilities satisfy the preset condition.
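Operations 730 through 760 form a standard evaluate-check-update loop. The sketch below assumes a toy model object with hypothetical `predict_proba` and `update` methods and a mean-squared-error loss; none of these specifics come from the disclosure, which leaves the model type and loss function open.

```python
def train_recognition_model(model, samples, labels,
                            loss_threshold=0.05, max_iters=100):
    """Iterate 730-760: compute sample probabilities (740), check the loss
    against the preset condition (750), and either stop (760) or update."""
    for _ in range(max_iters):
        probs = model.predict_proba(samples)                           # 740
        loss = sum((p - y) ** 2 for p, y in zip(probs, labels)) / len(labels)
        if loss < loss_threshold:                                      # 750
            return model                                               # 760
        model = model.update(samples, labels)                          # back to 730
    return model

class ToyModel:
    """Predicts one constant probability; updating sets it to the label mean."""
    def __init__(self, p=0.5):
        self.p = p
    def predict_proba(self, samples):
        return [self.p] * len(samples)
    def update(self, samples, labels):
        return ToyModel(sum(labels) / len(labels))

trained = train_recognition_model(ToyModel(), samples=[0] * 4, labels=[1, 1, 1, 1])
print(trained.p)  # 1.0
```

With all-positive labels the toy model converges in one update; a real recognition model would instead adjust weight matrices and bias vectors by gradient steps.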

It should be understood that the foregoing description is for purposes of illustration only and is not intended to limit the scope of the present disclosure. Many modifications and variations will be apparent to those of ordinary skill in the art in light of the description of the present application. However, such modifications and variations do not depart from the scope of the present application.

FIG. 8 is a flow diagram illustrating an exemplary process for identifying at least one proper noun included in a target query term in accordance with some embodiments of the present application. In some embodiments, process 800 may be implemented by a set of instructions (e.g., an application program) stored in read only memory 230 or random access memory 240. The processor 220 and/or the modules in fig. 4 may execute the set of instructions, and when executing the set of instructions, the processor 220 and/or the modules may be configured to perform the process 800. The operation of the process shown below is for illustration purposes only. In some embodiments, process 800 may be accomplished with one or more additional operations not described and/or without one or more of the operations discussed herein. Additionally, the order of the operations of the process as shown in fig. 8 and described below is not intended to be limiting.

At 810, processing engine 112 (e.g., proper noun recognition module 450) (e.g., processing circuitry of processor 220) may obtain a search request that includes the target query term. As described at 510, a search request may refer to a request for searching for a target query term. The target query term may refer to a term that the user wants to retrieve. By way of example only, the types of target query terms may include terms (e.g., an address) associated with a transportation search request or a map (e.g., Google Maps, Baidu Map, Tencent Map) navigation search request, terms (e.g., merchandise) associated with an online shopping search request, terms (e.g., food) associated with an order search request, and so forth.

At 820, processing engine 112 (e.g., proper noun recognition module 450) (e.g., processing circuitry of processor 220) may determine whether the target query term includes at least one proper noun based on the trained recognition model. As illustrated in fig. 7, the trained recognition model may be configured to provide a probability that a term (e.g., a target query term) includes at least one proper noun. The trained recognition model may be trained based on the at least two second historical retrieval records and the initial recognition model. A more detailed description can be found in fig. 7 and will not be repeated here.

At 830, in response to determining that the probability is greater than a preset probability threshold, the processing engine 112 (e.g., proper noun recognition module 450) (e.g., processing circuitry of the processor 220) may determine that the target query term includes at least one proper noun. In some embodiments, the processing engine 112 may compare the probability with the preset probability threshold and, in response to determining that the probability is greater than the preset probability threshold, determine that the target query term includes at least one proper noun. As used herein, the preset probability threshold may be a default setting of the proper noun recognition system 100 or may be adjusted in different situations.

In some embodiments, the segmentation of the target query term may be guided by the identified proper nouns. For example, the proper noun recognition system 100 does not split a proper noun into two or more segments, so that the TOIs determined based on the segmentation result are more accurate with respect to the target query term. In some embodiments, if an identified proper noun is a new word, the new word may be added to the segmentation dictionary, enhancing the effectiveness of the dictionary. The present embodiment has one of the following technical effects: 1. based on the trained recognition model, whether a query term in a user search request contains a proper noun can be judged in real time; 2. based on the judgment that the query term contains a proper noun, and by leaving the proper noun unsegmented, the TOIs determined by the system in relation to the query term are more accurate, and the user can effectively select a search term corresponding to the query term, so that the user's search need is met more effectively.
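The intervention described above, never splitting a recognized proper noun, can be sketched as a greedy left-to-right pass. The whitespace fallback and the example nouns are illustrative assumptions; a production segmenter would be more elaborate.

```python
def segment_with_proper_nouns(query, proper_nouns):
    """Greedy left-to-right segmentation that keeps each recognized proper
    noun as a single segment; everything else falls back to whitespace split."""
    nouns = sorted(proper_nouns, key=len, reverse=True)  # prefer longest match
    segments, rest = [], query
    while rest:
        match = next((pn for pn in nouns if rest.startswith(pn)), None)
        if match:
            segments.append(match)
            rest = rest[len(match):].lstrip()
        else:
            head, _, rest = rest.partition(" ")
            segments.append(head)
    return segments

print(segment_with_proper_nouns("cheap Eiffel Tower tickets", {"Eiffel Tower"}))
# ['cheap', 'Eiffel Tower', 'tickets']
```

Because "Eiffel Tower" survives as one segment, downstream TOI matching sees the proper noun intact rather than two unrelated tokens.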

It should be understood that the foregoing description is for purposes of illustration only and is not intended to limit the scope of the present disclosure. Many modifications and variations will be apparent to those of ordinary skill in the art in light of the description of the present application. However, such modifications and variations do not depart from the scope of the present application.

FIG. 9 is a flow diagram illustrating an exemplary process for ordering TOIs based on a proper noun list including at least two proper nouns, according to some embodiments of the present application. In some embodiments, process 900 may be implemented by a set of instructions (e.g., an application program) stored in read only memory 230 or random access memory 240. The processor 220 and/or the modules in fig. 4 may execute the set of instructions, and when executing the set of instructions, the processor 220 and/or the modules may be configured to perform the process 900. The operation of the process shown below is for illustration purposes only. In some embodiments, process 900 may be accomplished with one or more additional operations not described and/or without one or more of the operations discussed herein. Additionally, the order of the operations of the process as shown in fig. 9 and described below is not intended to be limiting.

At 910, the processing engine 112 (e.g., the TOI determination module 460) (e.g., the processing circuitry of the processor 220) may obtain a retrieval request that includes the target query term. As described at 510 or 810, a search request may refer to a request for searching for a target query term. The target query term may refer to a term that the user wants to retrieve. By way of example only, the types of target query terms may include terms (e.g., an address) associated with a transportation search request or a map (e.g., Google Maps, Baidu Map, Tencent Map) navigation search request, terms (e.g., merchandise) associated with an online shopping search request, terms (e.g., food) associated with an order search request, and so forth.

In 920, the processing engine 112 (e.g., the TOI determination module 460) (e.g., the processing circuitry of the processor 220) may determine one or more terms of interest (TOIs) associated with the target query term. As used herein, a TOI may refer to a term associated with a target query term that may be of interest to a user.

In some embodiments, the processing engine 112 may first determine at least one of a prefix, keyword, or phrase in the target query term and determine one or more TOIs based on the prefix, keyword, or phrase.

In some embodiments, the processing engine 112 may determine correlation coefficients for one or more TOIs. As used herein, a correlation coefficient may represent a similarity between a target query term and a TOI. The greater the correlation coefficient, the greater the similarity between the TOI and the target query term, and the greater the probability that the user selects the TOI as a search term associated with the target query term.

In 930, the processing engine 112 (e.g., the TOI determination module 460) (e.g., processing circuitry of the processor 220) may rank at least a portion of the one or more TOIs based on a proper noun list including at least two proper nouns. In some embodiments, processing engine 112 may compare the target query term with the proper nouns on the proper noun list. In response to determining that the target query term includes at least one proper noun from the list of proper nouns, processing engine 112 may rank the TOIs that include the at least one proper noun higher. In some embodiments, processing engine 112 may assign at least one weighting coefficient to the at least one proper noun and rank at least a portion of the one or more TOIs based on the at least one weighting coefficient and the correlation coefficients of the TOIs. In response to determining that the target query term does not include any proper noun from the list, processing engine 112 may rank at least a portion of the one or more TOIs based on their correlation coefficients (e.g., from largest to smallest).
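Operation 930 can be sketched as score-and-sort: when the query contains a listed proper noun, TOIs containing that noun have their correlation coefficient multiplied by a weighting coefficient; otherwise ranking falls back to the correlation coefficient alone. The weight value and data shapes below are illustrative assumptions.

```python
def rank_tois(query, tois, proper_noun_list, weight=2.0):
    """tois: list of (text, correlation_coefficient) pairs.
    TOIs containing a proper noun found in the query are boosted by `weight`."""
    hits = [pn for pn in proper_noun_list if pn in query]
    def score(toi):
        text, corr = toi
        return corr * (weight if any(pn in text for pn in hits) else 1.0)
    return sorted(tois, key=score, reverse=True)

tois = [("tower crane rental", 0.8), ("Eiffel Tower tickets", 0.5)]
print(rank_tois("Eiffel Tower", tois, ["Eiffel Tower"]))
# [('Eiffel Tower tickets', 0.5), ('tower crane rental', 0.8)]
```

Without the boost, "tower crane rental" would win on raw correlation; the proper noun match moves the more relevant TOI to the front.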

The present embodiment has one of the following technical effects: 1. for a user query term containing a proper noun, the TOIs related to the query term are ranked at least based on the proper noun list, and the TOIs more relevant to the query term are placed first, so that the ranking is more accurate; 2. the ranking result is displayed to the user, and the user can effectively select a search term corresponding to the query term from the ranking result, so that the user's search need is met more effectively.

It should be understood that the foregoing description is for purposes of illustration only and is not intended to limit the scope of the present disclosure. Many modifications and variations will be apparent to those of ordinary skill in the art in light of the description of the present application. However, such modifications and variations do not depart from the scope of the present application. For example, the processing engine 112 may update the trained recognition model based on at least two newly acquired second historical search records at certain time intervals (e.g., every month, every two months).

While the basic concepts have been described above, it will be apparent to those of ordinary skill in the art in view of this disclosure that this disclosure is intended to be exemplary only, and is not intended to limit the present application. Various modifications, improvements and adaptations to the present application may occur to those skilled in the art, although not explicitly described herein. Such modifications, improvements and adaptations are proposed in the present application and thus fall within the spirit and scope of the exemplary embodiments of the present application.

Also, this application uses specific language to describe embodiments of the application. For example, "one embodiment," "an embodiment," and/or "some embodiments" means that a particular feature, structure, or characteristic described in connection with the embodiment is included in at least one embodiment of the application. Therefore, it is emphasized and should be appreciated that two or more references to "an embodiment" or "one embodiment" or "an alternative embodiment" in various places throughout this specification are not necessarily all referring to the same embodiment. Furthermore, some features, structures, or characteristics of one or more embodiments of the present application may be combined as appropriate.

Moreover, those of ordinary skill in the art will understand that aspects of the present application may be illustrated and described in terms of several patentable species or situations, including any new and useful combination of processes, machines, articles, or materials, or any new and useful modification thereof. Accordingly, various aspects of the present application may be embodied entirely in hardware, entirely in software (including firmware, resident software, micro-code, etc.) or in a combination of hardware and software. The above hardware or software may be referred to as a "unit", "module", or "system". Furthermore, aspects disclosed herein may take the form of a computer program product embodied in one or more computer-readable media, with computer-readable program code embodied therein.

A computer readable signal medium may comprise a propagated data signal with computer program code embodied therein, for example, on a baseband or as part of a carrier wave. Such a propagated signal may take any of a variety of forms, including electro-magnetic, optical, and the like, or any suitable combination. A computer readable signal medium may be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code on a computer readable signal medium may be propagated over any suitable medium, including radio, electrical cable, fiber optic cable, RF, or the like, or any combination thereof.

Computer program code required for the operation of various portions of the present application may be written in any one or more programming languages, including an object-oriented programming language such as Java, Scala, Smalltalk, Eiffel, JADE, Emerald, C++, C#, VB.NET, or Python, a conventional procedural programming language such as C, Visual Basic, Fortran 2003, Perl, COBOL 2002, PHP, or ABAP, a dynamic programming language such as Python, Ruby, or Groovy, or other programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the latter scenario, the remote computer may be connected to the user's computer through any form of network, such as a local area network (LAN) or a wide area network (WAN), or the connection may be made to an external computer (for example, through the Internet), or in a cloud computing environment, or as a service, such as software as a service (SaaS).

Additionally, the order in which elements and sequences of the processes described herein are processed, the use of alphanumeric characters, or the use of other designations, is not intended to limit the order of the processes and methods described herein, unless explicitly claimed. While various presently contemplated embodiments of the invention have been discussed in the foregoing disclosure by way of example, it is to be understood that such detail is solely for that purpose and that the appended claims are not limited to the disclosed embodiments, but, on the contrary, are intended to cover all modifications and equivalent arrangements that are within the spirit and scope of the embodiments herein. For example, although the system components described above may be implemented by hardware devices, they may also be implemented by software-only solutions, such as installing the described system on an existing server or mobile device.

Similarly, it should be noted that in the preceding description of embodiments of the present application, various features are sometimes grouped together in a single embodiment, figure, or description thereof for the purpose of streamlining the disclosure and aiding in the understanding of one or more of the embodiments.
