Retrieval method and device of traditional Chinese medicine medicinal materials, electronic equipment and storage medium

Document No.: 1889408 | Publication date: 2021-11-26

Reading note: this technology, "Retrieval method and device of traditional Chinese medicine medicinal materials, electronic equipment and storage medium", was designed by 李正 on 2021-08-31. Abstract: The invention discloses a retrieval method and device for traditional Chinese medicinal materials, an electronic device and a storage medium. Where the same traditional Chinese medicine goes by different names, the user only needs to search with a name he or she knows or is familiar with to obtain the expected retrieval result; in addition, the search field is converted into Chinese pinyin before retrieval, which effectively covers the "same pronunciation, different characters" case and, as far as possible, prevents a mistyped character from hiding the expected commodity.

1. A retrieval method of traditional Chinese medicinal materials is characterized by comprising the following steps:

establishing a medicinal material name retrieval library, wherein any medicinal material name in the medicinal material name retrieval library comprises a Chinese character name field and a pinyin field corresponding to the Chinese character name field, and the Chinese character name field comprises all medicinal names for representing the same medicinal material;

acquiring a search field input by a user;

converting the search field into pinyin to obtain a search pinyin field;

according to the search field and the search pinyin field, searching a medicine name matched with the search pinyin field and a medicine name matched with the search field in the medicine name retrieval library to serve as retrieval results;

and returning the retrieval result to the front end for displaying.

2. The method of claim 1, wherein establishing the medicinal material name retrieval library comprises:

for each traditional Chinese medicinal material among all traditional Chinese medicinal materials, acquiring all names used to represent that medicinal material as the Chinese character name field of that medicinal material;

converting the Chinese character name field of any Chinese medicinal material into pinyin to obtain a pinyin field corresponding to the Chinese character name field of any Chinese medicinal material;

establishing a retrieval index for the Chinese character name field and the pinyin field of any Chinese medicinal material;

and establishing the medicinal material name retrieval library according to the Chinese character name field, the pinyin field and the retrieval index of any Chinese medicinal material.

3. The method of claim 2, wherein the retrieval index comprises an index hash value and a storage address corresponding to the index hash value, wherein the index hash value comprises a hash value corresponding to the Chinese character name field and a hash value corresponding to the pinyin field;

and wherein searching the medicinal material name retrieval library, according to the search field and the search pinyin field, for a medicinal material name matching the search pinyin field and a medicinal material name matching the search field as the retrieval result comprises:

calculating hash values of the search field and the search pinyin field to serve as search hash values;

searching an index hash value matched with the search hash value in the medicinal material name retrieval library to serve as a retrieval hash value;

and taking the medicine material name in the storage address corresponding to the retrieval hash value as the retrieval result.

4. The method of claim 1, wherein before returning the search results to a front end for presentation, the method further comprises:

calculating the similarity between the medicinal material name corresponding to the retrieval result and the search field and the similarity between the medicinal material name corresponding to the retrieval result and the search pinyin field;

and sorting the retrieval results according to the sequence of the similarity from high to low.

5. The method of claim 4, wherein calculating the similarity between the medicine material name corresponding to the search result and the search field comprises:

carrying out concept division on Chinese character name fields in the medicinal material names corresponding to the retrieval results to obtain a first concept set;

performing concept division on the search field to obtain a second concept set, wherein the first concept set and the second concept set each comprise a first basic sememe description, other basic sememe descriptions, a relation sememe description and a relation symbol description;

calculating the similarity between the first basic sememe description in the first concept set and the first basic sememe description in the second concept set to obtain a first basic sememe description similarity;

calculating the similarity between the other basic sememe descriptions in the first concept set and the other basic sememe descriptions in the second concept set to obtain an other basic sememe description similarity;

calculating the similarity between the relation sememe description in the first concept set and the relation sememe description in the second concept set to obtain a relation sememe description similarity;

calculating the similarity between the relation symbol description in the first concept set and the relation symbol description in the second concept set to obtain a relation symbol description similarity;

and obtaining the similarity between the medicinal material name corresponding to the retrieval result and the search field according to the first basic sememe description similarity, the other basic sememe description similarity, the relation sememe description similarity and the relation symbol description similarity.

6. The method of claim 4, wherein calculating the similarity between the medicinal material name corresponding to the retrieval result and the search pinyin field comprises:

acquiring a first letter total, which is the number of letters present both in the pinyin field of the medicinal material name corresponding to the retrieval result and in the search pinyin field;

acquiring a second letter total, which is the number of letters present in the pinyin field but absent from the search pinyin field;

acquiring a third letter total, which is the number of letters present in the search pinyin field but absent from the pinyin field;

and obtaining the similarity between the medicinal material name corresponding to the retrieval result and the search pinyin field according to the first letter total, the second letter total and the third letter total.

7. The method of claim 1, wherein prior to converting the search field to pinyin to obtain a search pinyin field, the method further comprises:

judging whether each Chinese character in the search field is a traditional Chinese character or not;

if yes, converting each Chinese character in the search field into a simplified form.

8. A retrieval device for traditional Chinese medicinal materials, characterized by comprising: a database establishing unit, an acquiring unit, a converting unit, a searching unit and a transmitting unit;

the database establishing unit is used for establishing a medicinal material name retrieval library, wherein any medicinal material name in the medicinal material name retrieval library comprises a Chinese character name field and a pinyin field corresponding to the Chinese character name field, and the Chinese character name field comprises all medicinal material names for representing the same medicinal material;

the acquisition unit is used for acquiring a search field input by a user;

the conversion unit is used for converting the search field into pinyin to obtain a search pinyin field;

the searching unit is used for searching the medicinal material name retrieval library, according to the search field and the search pinyin field, for a medicinal material name matching the search pinyin field and a medicinal material name matching the search field, to obtain a retrieval result;

and the transmission unit is used for returning the retrieval result to the front end for displaying.

9. An electronic device, comprising: a memory, a processor and a transceiver which are connected in sequence, wherein the memory is used for storing a computer program, the transceiver is used for receiving and transmitting messages, and the processor is used for reading the computer program and executing the retrieval method of traditional Chinese medicinal materials as claimed in any one of claims 1 to 7.

10. A storage medium, wherein the storage medium stores instructions, and when the instructions are run on a computer, the method for retrieving traditional Chinese medicine materials according to any one of claims 1 to 7 is performed.

Technical Field

The invention belongs to the technical field of medicinal material retrieval, and particularly relates to a retrieval method and device of traditional Chinese medicinal materials, electronic equipment and a storage medium.

Background

With the rapid development of the internet, the traditional Chinese medicine industry has become increasingly tied to the internet, and the online mall, as an interactive platform between customers and operators, has become one of its important operating modes. It works as follows: the user enters keywords in the search bar, the mall matches the keywords to obtain search results (namely commodity names), and the user browses the results and/or places an order.

At present, keyword search mostly relies on the fuzzy matching of commodity names provided by the backend database; for example, searching for "folium artemisiae argyi" returns commodities whose names contain that keyword, such as grade-three folium artemisiae argyi and grade-two folium artemisiae argyi (wholesale). However, because medicinal material names in the traditional Chinese medicine industry are specialized and peculiar, this keyword retrieval method has the following defects:

(1) medicinal material names contain many rarely used characters, so users often cannot write them and substitute other characters, and the expected result cannot be retrieved; (2) for historical and regional reasons, the same medicinal material commodity has multiple names; for example, paris polyphylla is also known by several other names, so commodity data easily lacks some of these aliases and the user cannot find the expected commodity. An accurate and comprehensive retrieval method is therefore urgently needed.

Disclosure of Invention

The invention aims to provide a method and a device for searching traditional Chinese medicine materials, electronic equipment and a storage medium, and aims to solve the problem that expected commodities cannot be searched due to the fact that users input different characters for substitution and a plurality of names exist in the traditional Chinese medicine material searching.

In order to achieve the purpose, the invention adopts the following technical scheme:

the invention provides a retrieval method of traditional Chinese medicine materials, which comprises the following steps:

establishing a medicinal material name retrieval library, wherein any medicinal material name in the medicinal material name retrieval library comprises a Chinese character name field and a pinyin field corresponding to the Chinese character name field, and the Chinese character name field comprises all medicinal names for representing the same medicinal material;

acquiring a search field input by a user;

converting the search field into pinyin to obtain a search pinyin field;

according to the search field and the search pinyin field, searching a medicine name matched with the search pinyin field and a medicine name matched with the search field in the medicine name retrieval library to serve as retrieval results;

and returning the retrieval result to the front end for displaying.

Based on the above disclosure, the invention first establishes a comprehensive retrieval database, namely the medicinal material name retrieval library, in which every medicinal material name entry comprises all names representing that medicinal material (including the aliases that differ by region) and the pinyin field corresponding to each alias; this provides a comprehensive database for searching medicinal materials and ensures accurate retrieval whichever alias is entered. Second, the search field input by the user is converted into a search pinyin field, and both the search pinyin field and the search field are used to search the database; this effectively covers the "same pronunciation, different characters" case and, as far as possible, prevents a mistyped character from hiding the expected commodity.

With this design, where the same traditional Chinese medicine goes by different names, the user obtains the expected retrieval result simply by searching with a name he or she knows or is familiar with; and because the search field is converted into pinyin before the search, the "same pronunciation, different characters" case is covered and mistyped characters are, as far as possible, prevented from hiding the expected commodity.
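The overall flow described above can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions: the character-to-pinyin table `TOY_PINYIN`, the identifier `paris-polyphylla`, the helper names, and the Chinese characters chosen for the pinyin forms "chonglou" and "zaoxiu" are all hypothetical and not taken from the source; a real system would use a complete pinyin conversion facility.

```python
# Toy character-to-pinyin table; an assumption for illustration only.
TOY_PINYIN = {
    "重": "chong", "楼": "lou", "虫": "chong",
    "蚤": "zao", "休": "xiu", "早": "zao",
}


def to_pinyin(text: str) -> str:
    """Convert each known character to pinyin; pass other characters through."""
    return "".join(TOY_PINYIN.get(ch, ch) for ch in text)


# Step 1: one entry per medicinal material, holding every alias and its pinyin.
ALIASES = {"paris-polyphylla": ["重楼", "蚤休"]}  # hypothetical identifier
LIBRARY = [
    {"id": mid, "names": names, "pinyin": [to_pinyin(n) for n in names]}
    for mid, names in ALIASES.items()
]


def search(query: str) -> list[str]:
    """Match on the raw search field and on its pinyin form."""
    query_pinyin = to_pinyin(query)
    hits = []
    for entry in LIBRARY:
        if query in entry["names"] or query_pinyin in entry["pinyin"]:
            hits.append(entry["id"])
    return hits


print(search("蚤休"))  # an alias typed correctly
print(search("虫楼"))  # a homophone of 重楼, recovered through pinyin
```

The second call shows the point of the pinyin field: the characters do not match any stored alias, but their pinyin does.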

In one possible design, establishing the medicinal material name retrieval library includes:

for each traditional Chinese medicinal material among all traditional Chinese medicinal materials, acquiring all names used to represent that medicinal material as the Chinese character name field of that medicinal material;

converting the Chinese character name field of any Chinese medicinal material into pinyin to obtain a pinyin field corresponding to the Chinese character name field of any Chinese medicinal material;

establishing a retrieval index for the Chinese character name field and the pinyin field of any Chinese medicinal material;

and establishing the medicinal material name retrieval library according to the Chinese character name field, the pinyin field and the retrieval index of any Chinese medicinal material.

Based on the above disclosure, the invention sets out the steps for establishing the medicinal material name retrieval library: first, all aliases of the same medicinal material are imported to obtain its Chinese character name field; next, every alias is converted into pinyin to obtain the pinyin field; finally, a retrieval index (which enables fast lookup during retrieval) is established for the Chinese character name field and the pinyin field of every medicinal material, which completes the database of medicinal material names.

In one possible design, the retrieval index includes an index hash value and a storage address corresponding to the index hash value, where the index hash value includes a hash value corresponding to the Chinese character name field and a hash value corresponding to the pinyin field;

and searching the medicinal material name retrieval library, according to the search field and the search pinyin field, for a medicinal material name matching the search pinyin field and a medicinal material name matching the search field as the retrieval result includes:

calculating hash values of the search field and the search pinyin field to serve as search hash values;

searching an index hash value matched with the search hash value in the medicinal material name retrieval library to serve as a retrieval hash value;

and taking the medicine material name in the storage address corresponding to the retrieval hash value as the retrieval result.

Based on the above disclosure, the hash value of the Chinese character name field and the hash value of the pinyin field are used as index hash values, so that they serve as index identifiers linked to the addresses of the corresponding medicinal material names; during retrieval, the hash values of the search field and the search pinyin field are calculated first and then compared with the index hash values; if an identical hash value exists, the storage address is looked up, and the medicinal material name stored at that address is taken as the retrieval result.
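A minimal sketch of such a hash-based retrieval index follows. The hash function (SHA-256), the integer storage addresses, and the names `field_hash`, `INDEX`, `STORAGE` and `lookup` are assumptions made for illustration; the source does not specify which hash function or address format is used.

```python
import hashlib


def field_hash(text: str) -> str:
    """Stable hash of a field; SHA-256 is an illustrative choice only."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


# Hypothetical storage: address -> stored medicinal material name.
STORAGE = {0: "paris polyphylla"}

# Index hash values: one per Chinese character name and one per pinyin form.
INDEX = {
    field_hash("重楼"): 0,
    field_hash("chonglou"): 0,
    field_hash("蚤休"): 0,
    field_hash("zaoxiu"): 0,
}


def lookup(search_field: str, search_pinyin: str) -> list[str]:
    """Hash both query forms and follow every matching index entry."""
    addresses = []
    for h in (field_hash(search_field), field_hash(search_pinyin)):
        addr = INDEX.get(h)
        if addr is not None and addr not in addresses:
            addresses.append(addr)
    return [STORAGE[a] for a in addresses]


print(lookup("虫楼", "chonglou"))  # characters miss, pinyin hash matches
```

Note that comparing hash values only finds fields that are exactly equal to the query; partial matches would need a different mechanism.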

In one possible design, before returning the search result to the front end for presentation, the method further includes:

calculating the similarity between the medicinal material name corresponding to the retrieval result and the search field and the similarity between the medicinal material name corresponding to the retrieval result and the search pinyin field;

and sorting the retrieval results according to the sequence of the similarity from high to low.

Based on the above disclosure, sorting the retrieval results in descending order of similarity places the best matches first, so the user sees the most relevant medicinal material names first.

In one possible design, calculating the similarity between the drug name corresponding to the search result and the search field includes:

carrying out concept division on Chinese character name fields in the medicinal material names corresponding to the retrieval results to obtain a first concept set;

performing concept division on the search field to obtain a second concept set, wherein the first concept set and the second concept set each comprise a first basic sememe description, other basic sememe descriptions, a relation sememe description and a relation symbol description;

calculating the similarity between the first basic sememe description in the first concept set and the first basic sememe description in the second concept set to obtain a first basic sememe description similarity;

calculating the similarity between the other basic sememe descriptions in the first concept set and the other basic sememe descriptions in the second concept set to obtain an other basic sememe description similarity;

calculating the similarity between the relation sememe description in the first concept set and the relation sememe description in the second concept set to obtain a relation sememe description similarity;

calculating the similarity between the relation symbol description in the first concept set and the relation symbol description in the second concept set to obtain a relation symbol description similarity;

and obtaining the similarity between the medicinal material name corresponding to the retrieval result and the search field according to the first basic sememe description similarity, the other basic sememe description similarity, the relation sememe description similarity and the relation symbol description similarity.
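The text above does not state how the four partial similarities are combined into one score. One possible combination, shown here purely as an assumption, is a weighted sum in which each term is scaled by the similarities that precede it, so that a secondary part cannot outweigh the primary one. The weights and the function name `combine` are illustrative and not taken from the source.

```python
from math import prod

# Illustrative weights (assumed, not from the source); they decrease and sum to 1.
WEIGHTS = (0.5, 0.2, 0.17, 0.13)


def combine(sims, weights=WEIGHTS):
    """Combine four partial similarities, each in [0, 1], into one score.

    sims is ordered: first basic, other basic, relation, relation symbol.
    Term i contributes weights[i] times the product of sims[0..i].
    """
    if len(sims) != len(weights):
        raise ValueError("expected one similarity per weight")
    return sum(w * prod(sims[: i + 1]) for i, w in enumerate(weights))


print(combine((1.0, 0.5, 1.0, 1.0)))
```

With this form, identical concept sets score 1, and a first basic similarity of 0 forces the overall score to 0 regardless of the other parts.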

In one possible design, calculating the similarity between the medicinal material name corresponding to the retrieval result and the search pinyin field includes:

acquiring a first letter total, which is the number of letters present both in the pinyin field of the medicinal material name corresponding to the retrieval result and in the search pinyin field;

acquiring a second letter total, which is the number of letters present in the pinyin field but absent from the search pinyin field;

acquiring a third letter total, which is the number of letters present in the search pinyin field but absent from the pinyin field;

and obtaining the similarity between the medicinal material name corresponding to the retrieval result and the search pinyin field according to the first letter total, the second letter total and the third letter total.
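The three letter totals can be computed with multiset arithmetic. The sketch below makes two assumptions that the source leaves open: repeated letters are counted individually, and the final score is the ratio of shared letters to all letters, first / (first + second + third). The function names are hypothetical.

```python
from collections import Counter


def letter_counts(candidate_pinyin: str, query_pinyin: str):
    """Return the (first, second, third) letter totals."""
    cand, query = Counter(candidate_pinyin), Counter(query_pinyin)
    first = sum((cand & query).values())   # letters present in both
    second = sum((cand - query).values())  # only in the candidate pinyin field
    third = sum((query - cand).values())   # only in the search pinyin field
    return first, second, third


def pinyin_similarity(candidate_pinyin: str, query_pinyin: str) -> float:
    """Assumed scoring rule: shared letters divided by all letters."""
    first, second, third = letter_counts(candidate_pinyin, query_pinyin)
    total = first + second + third
    return first / total if total else 0.0


# Ranking candidates for the query "chonglou", best match first.
candidates = ["zaoxiu", "chonglou", "chongtaigen"]
ranked = sorted(
    candidates,
    key=lambda p: pinyin_similarity(p, "chonglou"),
    reverse=True,
)
print(ranked)
```

The same descending sort can be applied to whatever combined score the retrieval results carry before they are returned to the front end.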

In one possible design, before converting the search field into pinyin to obtain a search pinyin field, the method further includes:

judging whether each Chinese character in the search field is a traditional Chinese character or not;

if yes, converting each Chinese character in the search field into a simplified form.
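The normalization step can be sketched as a per-character table lookup. The three-entry table `TRAD_TO_SIMP` below is a toy assumption for illustration; a real system would need a complete traditional-to-simplified mapping, and the function names are hypothetical.

```python
# Toy traditional-to-simplified table; an assumption for illustration only.
TRAD_TO_SIMP = {"樓": "楼", "參": "参", "藥": "药"}


def is_traditional(ch: str) -> bool:
    """Judge whether a character is a traditional form known to the table."""
    return ch in TRAD_TO_SIMP


def normalize(search_field: str) -> str:
    """Replace each traditional character with its simplified form."""
    return "".join(
        TRAD_TO_SIMP[ch] if is_traditional(ch) else ch for ch in search_field
    )


print(normalize("重樓"))
```

Running this step before the pinyin conversion means the library only has to store simplified names.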

In a second aspect, the present invention provides a device for searching traditional Chinese medicine materials, comprising: the system comprises a database establishing unit, an acquiring unit, a converting unit, a searching unit and a transmitting unit;

the database establishing unit is used for establishing a medicinal material name retrieval library, wherein any medicinal material name in the medicinal material name retrieval library comprises a Chinese character name field and a pinyin field corresponding to the Chinese character name field, and the Chinese character name field comprises all medicinal material names for representing the same medicinal material;

the acquisition unit is used for acquiring a search field input by a user;

the conversion unit is used for converting the search field into pinyin to obtain a search pinyin field;

the searching unit is used for searching the medicinal material name retrieval library, according to the search field and the search pinyin field, for a medicinal material name matching the search pinyin field and a medicinal material name matching the search field, to obtain a retrieval result;

and the transmission unit is used for returning the retrieval result to the front end for displaying.

In a third aspect, the present invention provides an electronic device, including a memory, a processor and a transceiver, which are communicatively connected in sequence, wherein the memory is used for storing a computer program, the transceiver is used for sending and receiving messages, and the processor is used for reading the computer program and executing the method for retrieving traditional Chinese medicine materials according to the first aspect or any one of the possible designs of the first aspect.

In a fourth aspect, the present invention provides a computer-readable storage medium, having stored thereon instructions, which, when run on a computer, perform the method for retrieving the traditional Chinese medicine materials as in the first aspect or any one of the possible designs in the first aspect.

In a fifth aspect, the present invention provides a computer program product comprising instructions which, when run on a computer, cause the computer to perform the method for retrieving the traditional Chinese medicine materials as described in the first aspect or any one of the possible designs of the first aspect.

Drawings

FIG. 1 is a schematic structural diagram of a retrieval system for traditional Chinese medicine materials provided by the present invention;

FIG. 2 is a schematic flow chart illustrating steps of a method for searching Chinese medicinal materials according to the present invention;

FIG. 3 is a schematic structural diagram of a retrieval apparatus for Chinese herbal medicines according to the present invention;

FIG. 4 is a schematic structural diagram of an electronic device provided in the present invention.

Detailed Description

The invention is further described with reference to the following figures and specific embodiments. It should be noted that the description of the embodiments is provided to help understanding of the present invention, but the present invention is not limited thereto. Specific structural and functional details disclosed herein are merely illustrative of example embodiments of the invention. This invention may, however, be embodied in many alternate forms and should not be construed as limited to the embodiments set forth herein.

It will be understood that, although the terms first, second, etc. may be used herein to describe various elements, these elements should not be limited by these terms. These terms are only used to distinguish one element from another. For example, a first element could be termed a second element, and, similarly, a second element could be termed a first element, without departing from the scope of example embodiments of the present invention.

It should be understood that the term "and/or", where it appears herein, merely describes an association between objects and indicates that three relationships may exist; for example, A and/or B may mean: A exists alone, B exists alone, or A and B exist at the same time. The term "/and", where it appears herein, describes another association and indicates that two relationships may exist; for example, A/and B may mean: A exists alone, or A and B exist at the same time. In addition, the character "/", where it appears herein, generally indicates that the associated objects before and after it are in an "or" relationship.

Examples

As shown in FIG. 1, the present application provides a system architecture comprising a user terminal and a search engine, wherein the user terminal is configured to obtain the search field input by the user, and the search engine stores the medicinal material name retrieval library.

In this embodiment, the search engine may be, but is not limited to, Elasticsearch, a distributed, highly scalable search and data analysis engine with near-real-time search; it allows many types of search (for example structured, unstructured, geographic location and/or metrics) to be executed and combined, supports multiple search modes, and offers advantages such as stability, reliability, speed, and ease of installation and use.
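As a rough illustration of how the two fields might be laid out in such an engine, the sketch below builds an index mapping and a query body as plain Python dictionaries in the Elasticsearch JSON format. The field names `names` and `pinyin` and the helper `build_query` are assumptions, not taken from the source, and no client call is made here; the bodies would be sent with whatever client the deployment uses.

```python
import json

# Assumed index mapping: both fields are stored as exact-match keyword fields,
# each holding a list with one value per alias.
MAPPING = {
    "mappings": {
        "properties": {
            "names": {"type": "keyword"},
            "pinyin": {"type": "keyword"},
        }
    }
}


def build_query(search_field: str, search_pinyin: str) -> dict:
    """Match a document if either the characters or the pinyin match exactly."""
    return {
        "query": {
            "bool": {
                "should": [
                    {"term": {"names": search_field}},
                    {"term": {"pinyin": search_pinyin}},
                ],
                "minimum_should_match": 1,
            }
        }
    }


print(json.dumps(build_query("虫楼", "chonglou"), ensure_ascii=False, indent=2))
```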

In the method for searching traditional Chinese medicine materials provided by the first aspect of the embodiment, where the same traditional Chinese medicine goes by different names, the user only needs to search with a name he or she knows or is familiar with to obtain the expected retrieval result; in addition, the search field is converted into Chinese pinyin before the search, which effectively covers the "same pronunciation, different characters" case and, as far as possible, prevents a mistyped character from hiding the expected commodity.

The method for retrieving traditional Chinese medicine materials provided in the first aspect of this embodiment may include, but is not limited to, the following steps S1 to S5; these steps may be, but are not limited to being, performed on the search engine side.

S1, establishing a medicinal material name retrieval library, wherein any medicinal material name in the medicinal material name retrieval library comprises a Chinese character name field and a pinyin field corresponding to the Chinese character name field, and the Chinese character name field comprises all medicinal names for representing the same medicinal material.

Step S1 is a process of establishing a search database, and each name of the drug in the database established in this embodiment includes all the aliases of the drug and the pinyin field corresponding to each alias; therefore, when the medicinal materials are searched, the medicinal materials can be accurately searched under the condition that the user inputs different aliases of the medicinal materials.

The medicinal material name retrieval library may be established by, but is not limited to, the following steps S101 to S104.

S101, for any one of all traditional Chinese medicine materials, acquiring all medicine names for representing the any one traditional Chinese medicine material as a Chinese character name field of the any one traditional Chinese medicine material.

S102, converting the Chinese character name field of any Chinese medicinal material into pinyin to obtain a pinyin field corresponding to the Chinese character name field of any Chinese medicinal material.

Steps S101 and S102 import all aliases of the medicinal materials together with the corresponding pinyin: all aliases of a given medicinal material are imported, and each alias is then converted into pinyin to obtain its pinyin field, so that every medicinal material has a Chinese character name field and a pinyin field.

For example, the medicinal material paris polyphylla is also named: rhizoma paridis, Chi's rest, rhizoma paridis, radix Euphorbiae Humifusae, radix kansui, rhizoma paridis, fructus Trichosanthis, rhizoma paridis, Ardisia, Mandarin duck, calyx seu fructus Physalis, rhizoma Schizocapsae Plantagineae, radix Callicarpae Formosanae, rhizoma Gynurae Divaricatae, radix Notoginseng, and radix Schefflerae Arboricolae; all of these names together form the Chinese character name field of paris polyphylla; the pinyin field corresponding to this Chinese character name field is: zaoxiu, chixiu, chongtaigen, zhengxiu, caoheche, baigansui, jinxianchonglou, chonglou, jiudaogu, yuanyangchong, zhihuatou, luosiqi, hailuoqi, dengtaiqi, baiheche, luotuosanqi, tusanqi and qiyelian.

Through the steps S101 and S102, all names and pinyins corresponding to the names of the Chinese medicinal materials can be counted, so as to establish a relationship between the names and the pinyins for subsequent retrieval of the Chinese medicinal materials.

In this embodiment, the medicinal material names may be imported from, for example but not limited to, a traditional Chinese medicine compendium; such a compendium is organized mainly around nineteen kinds of Chinese herbal medicines, stores Chinese herbal medicines, folk prescriptions, pictures of Chinese herbal medicines, and the effects and functions of traditional Chinese medicines, and collects the names and aliases of the various traditional Chinese medicines.

After the Chinese character name field and the pinyin field of the Chinese medicinal materials are obtained, the Chinese character name field and the pinyin field can be stored in an index manner so as to form a database with index identification, which is convenient for subsequent quick search according to the index identification to improve the retrieval efficiency, as shown in the following steps S103 and S104.

S103, establishing a retrieval index for the Chinese character name field and the pinyin field of any Chinese medicinal material.

S104, establishing the medicinal material name retrieval library according to the Chinese character name field, the pinyin field and the retrieval index of any one of the traditional Chinese medicinal materials.

In this embodiment, the search index is a search identifier, which is associated with the storage addresses of the Chinese character name field and the pinyin field of the Chinese medicinal material; therefore, during retrieval, the Chinese character name field and the pinyin field matched with the retrieval identifier can be quickly positioned according to the retrieval identifier, so that quick searching and reading are realized, and the retrieval efficiency is improved.

Therefore, all names and corresponding pinyin fields of all the traditional Chinese medicine materials are imported and stored in the database, then the retrieval index of each medicine material is established, and finally the database with the retrieval index can be used as the medicine material name retrieval library.

In this embodiment, the hash value of the Chinese character name field and the hash value of the pinyin field are associated with the storage address corresponding to the medicinal material name, so as to establish the corresponding index identifier; that is, the retrieval index comprises index hash values, and the index hash values comprise the hash values of the Chinese character name field and the hash values of the pinyin field.

For example, assume that the retrieval identifier for Paris polyphylla is the hash value "4" and that its associated storage address is: node A1 in the database, storage unit A11, row A111, column B111. During retrieval, storage unit A11 in node A1 of the database can then be located quickly from the hash value "4", and the content stored at row A111, column B111 is the retrieval result.
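Steps S101 to S104 can be sketched as follows. This is a minimal illustration under stated assumptions, not the embodiment's actual implementation: the function names `field_hash` and `build_library`, the use of SHA-1, the in-memory dictionaries standing in for database storage addresses, and the two sample alias pairs (written here with assumed Chinese characters) are all hypothetical.

```python
import hashlib


def field_hash(text: str) -> str:
    """Stable hash of a name or pinyin field (SHA-1 is an arbitrary choice here)."""
    return hashlib.sha1(text.encode("utf-8")).hexdigest()


def build_library(herbs):
    """herbs: {herb_id: [(chinese_name, pinyin), ...]}.

    Returns (records, index):
      records keeps every alias and its pinyin per herb (S101/S102);
      index maps each field hash to the herb ids stored under it (S103/S104).
    """
    records = {}
    index = {}
    for herb_id, aliases in herbs.items():
        records[herb_id] = {
            "names": [name for name, _ in aliases],
            "pinyin": [py for _, py in aliases],
        }
        for name, py in aliases:
            for field in (name, py):
                index.setdefault(field_hash(field), set()).add(herb_id)
    return records, index


# Hypothetical sample data: two aliases of one medicinal material.
herbs = {"paris_polyphylla": [("重楼", "chonglou"), ("蚤休", "zaoxiu")]}
records, index = build_library(herbs)
print(index[field_hash("zaoxiu")])
```

Because both the Chinese character name fields and the pinyin fields are hashed into the same index, a lookup by either kind of field reaches the same record.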

After the medicinal material name retrieval library containing all names of the medicinal materials and the pinyin fields corresponding to those names has been established, medicinal materials can be retrieved as shown in steps S2 to S5.

S2, acquiring a search field input by a user.

In step S2, the user inputs a search field in the search bar of the online mall through a user terminal, so that the mall's search engine can be used to retrieve medicinal material names.

In this embodiment, the search field may include, but is not limited to, a keyword and/or the pinyin corresponding to the keyword.

In this embodiment, if the Chinese characters in the search field input by the user are traditional characters, each of them is converted into the corresponding simplified character for subsequent recognition.

In this embodiment, to improve retrieval accuracy, the search field is converted into pinyin before retrieval, and retrieval is performed on both the pinyin and the search field. This effectively covers the "same pronunciation, different characters" scenario and minimizes cases in which a user cannot retrieve the expected goods because a wrong character was input; the implementation is shown in steps S3 and S4.

S3, converting the search field into pinyin to obtain a search pinyin field.

In this embodiment, pinyin conversion may be performed, for example but not limited to, with the JPinyin tool. JPinyin is an open-source class library for converting Chinese characters to pinyin, and its main features are:

(1) an accurate and complete character library: of the 20903 Chinese characters in the Unicode range 4E00-9FA5 plus 3007 (〇), JPinyin can convert all characters except 46 variant characters (variant characters have no standard pinyin);

(2) fast pinyin conversion: in tests, converting the 20902 Chinese characters in the Unicode range 4E00-9FA5 takes JPinyin about 100 milliseconds;

(3) support for multiple pinyin output formats: with tone marks, without tone marks, with tones represented by numbers, and pinyin initial letters;

(4) recognition of common polyphonic characters, including those in phrases, idioms and/or place names;

(5) conversion between simplified and traditional Chinese.

Of course, in this embodiment, if the search field input by the user is the pinyin field, the step S3 is not required, and the step S4 may be directly executed to perform the search.
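The decision in step S3 can be sketched as follows. JPinyin is a Java library, so this Python sketch does not use it; instead it assumes a tiny hypothetical per-character table (`PINYIN_TABLE`) and a hypothetical rule that a field made only of ASCII letters is already pinyin. The function names are illustrative only.

```python
# Hypothetical per-character table; a real system would rely on a full pinyin library.
PINYIN_TABLE = {"重": "chong", "虫": "chong", "楼": "lou", "蚤": "zao", "休": "xiu"}


def is_pinyin(field: str) -> bool:
    """Assumption: a field consisting only of ASCII letters is already pinyin."""
    return field.isascii() and field.isalpha()


def to_search_pinyin(field: str) -> str:
    """Return the search pinyin field for a search field (step S3)."""
    if is_pinyin(field):
        return field.lower()  # S3 is skipped; the input is used directly in S4
    # Characters missing from the table are passed through unchanged.
    return "".join(PINYIN_TABLE.get(ch, ch) for ch in field)


print(to_search_pinyin("重楼"))  # chonglou
print(to_search_pinyin("虫楼"))  # chonglou (a homophone typo yields the same pinyin)
```

The second call shows why searching by pinyin tolerates wrong characters: two different characters with the same pronunciation map to the same search pinyin field.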

S4, according to the search field and the search pinyin field, searching the medicinal material name retrieval library for the medicinal material name matching the search pinyin field and the medicinal material name matching the search field, and taking them as the retrieval result.

In the embodiment, during retrieval, the retrieval is carried out in a medicinal material name retrieval library according to a search field and a search pinyin field, namely, a medicinal material name matched with the search pinyin field and a medicinal material name matched with the search field are respectively retrieved; therefore, the Chinese character retrieval result and the pinyin retrieval result can be covered, so that the comprehensiveness of retrieval is improved, and the name of the medicinal material expected by a user can be retrieved.

One specific search method for obtaining search results by using the search field and the search pinyin field is provided below, and may include, but is not limited to, the following steps S401 to S403.

S401, calculating hash values of the search field and the search pinyin field to serve as search hash values.

S402, searching the medicinal material name retrieval library for an index hash value that matches the search hash value, to serve as the retrieval hash value.

S403, taking the medicinal material name at the storage address corresponding to the retrieval hash value as the retrieval result.

As described above, all names of the medicinal materials and the pinyin fields corresponding to the names are used for establishing the retrieval index based on the hash values, and therefore, the retrieval step is based on the hash values of the search fields and the search pinyin fields to realize rapid retrieval.

In this embodiment, for example, when the search hash value is found among the index hash values, the medicinal material name at the storage address corresponding to that hash value may be used as the retrieval result.

The foregoing retrieval steps are illustrated below as an example:

Assume that the hash values of cassia twig are: hash values 3, 5 and 8 for the Chinese character name field and hash values 7, 2 and 9 for the pinyin field, which indicates that cassia twig has 3 names and 3 pinyin fields.

Assume that the hash values of paris polyphylla are: hash values 1 and 4 for the Chinese character name field and hash values 6 and 10 for the pinyin field, which indicates that paris polyphylla has 2 names and 2 pinyin fields.

For example, suppose the hash value corresponding to the search field input by the user is 6 and the hash value corresponding to the search pinyin field is 2. Following the above steps, the hash value of the search pinyin field is found among the hash values of the pinyin field of cassia twig, so the medicinal material name at the storage address corresponding to hash value 2 can be used as a retrieval result; that is, the 3 names and 3 pinyin fields of cassia twig are returned. Similarly, the hash value of the search field, 6, is found among the hash values stored for paris polyphylla (here, those of its pinyin field), so the 2 names and 2 pinyin fields of paris polyphylla can also be output as a retrieval result.
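The worked example above can be reproduced with a small lookup table. This is only a sketch: the dictionary layout and the function name `retrieve` are assumptions, while the hash values are the ones given in the example.

```python
# Hash values taken from the worked example; the data layout is hypothetical.
HERBS = {
    "cassia twig": {"name_hashes": (3, 5, 8), "pinyin_hashes": (7, 2, 9)},
    "paris polyphylla": {"name_hashes": (1, 4), "pinyin_hashes": (6, 10)},
}

# One index over both kinds of hash value: index hash value -> medicinal material.
INDEX = {}
for herb, hashes in HERBS.items():
    for value in hashes["name_hashes"] + hashes["pinyin_hashes"]:
        INDEX[value] = herb


def retrieve(search_hashes):
    """S402/S403: return every medicinal material hit by one of the search hashes."""
    hits = []
    for value in search_hashes:
        herb = INDEX.get(value)
        if herb is not None and herb not in hits:
            hits.append(herb)
    return hits


# Search field hash 6 and search pinyin field hash 2.
print(retrieve([6, 2]))  # ['paris polyphylla', 'cassia twig']
```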

Therefore, with this retrieval method, when the same medicinal material has several different aliases, the user only needs to input a name that he or she knows or is familiar with to obtain the desired retrieval result. In addition, because the search field is converted into pinyin before searching, the "same pronunciation, different characters" scenario is effectively covered, which minimizes cases in which the expected goods cannot be found because the user input a wrong character.

In this embodiment, the keyword input by the user may be a single character or a word within a name. Therefore, when the retrieval index (that is, the index hash values) is set up, the Chinese character name field and the pinyin field can be segmented into words, and the hash value of each resulting word is calculated as an index hash value; similarly, the hash value of each individual character can also be calculated. The index hash values thus cover the whole Chinese character name field, each character and each word in it, and each pinyin field, so that a user who inputs only one word or character can still retrieve the expected medicinal material.
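A minimal sketch of this partial-match indexing follows. It assumes, purely for illustration, that "words" can be approximated by every contiguous run of characters up to a fixed length; a real system would more likely use a proper word segmenter. The function name `index_terms`, the `max_len` parameter and the sample name are hypothetical.

```python
def index_terms(field: str, max_len: int = 4):
    """Return the whole field plus every contiguous piece of 1..max_len characters.

    Each returned term would then be hashed and stored as an index hash value.
    """
    terms = {field}
    for size in range(1, max_len + 1):
        for start in range(len(field) - size + 1):
            terms.add(field[start:start + size])
    return terms


# Hypothetical three-character alias.
terms = index_terms("草河车")
print(sorted(terms, key=len))
```

With these terms indexed, a user who types only one character or a two-character fragment of the alias still produces a search hash value that exists in the index.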

In this embodiment, after the retrieval result is obtained, in order to facilitate the user to view the medicine material name most closely associated with the search field preferentially, the following steps S404 and S405 are further provided to sort the retrieval results in order of the relevance from high to low.

S404, calculating the similarity between the medicinal material name corresponding to the retrieval result and the search field and the similarity between the medicinal material name corresponding to the retrieval result and the search pinyin field.

S405, sorting the retrieval results according to the sequence of the similarity from high to low.

The principle of steps S404 and S405 is as follows: the similarity between the medicinal material name corresponding to each retrieval result and the search field, and the similarity between that name and the search pinyin field, are calculated separately, and the retrieval results are sorted in descending order of similarity. The greater the similarity value, the more relevant the medicinal material name is to the search field or the search pinyin field, and the higher the matching degree.

In this embodiment, two similarities are calculated: one between the retrieval result and the search field, and the other between the retrieval result and the search pinyin field. Accordingly, there are two ways of sorting by similarity:

The first is to sort according to the values of the two similarities.

The second is to calculate a total similarity from the values of the two similarities and sort according to the total similarity.

For the second, assuming that the similarity between the retrieval result and the search field is the first similarity (W1) and the similarity between the retrieval result and the search pinyin field is the second similarity (W2), the total similarity is kW1 + cW2, where k and c are weighting coefficients that may be, but are not limited to being, preset by the user.
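The second sorting method can be sketched as follows. The function name `rank_results`, the tuple layout and the sample similarity values are hypothetical; only the expression kW1 + cW2 comes from the description above.

```python
def rank_results(results, k=0.5, c=0.5):
    """Sort (name, w1, w2) tuples by the total similarity k*w1 + c*w2, highest first.

    w1: similarity to the search field; w2: similarity to the search pinyin field.
    """
    return sorted(results, key=lambda item: k * item[1] + c * item[2], reverse=True)


# Hypothetical similarity values for two retrieval results.
results = [("A", 0.9, 0.2), ("B", 0.5, 0.8)]
print([name for name, _, _ in rank_results(results)])                # ['B', 'A']
print([name for name, _, _ in rank_results(results, k=0.8, c=0.2)])  # ['A', 'B']
```

The two calls show how the weighting coefficients change the order: with equal weights the pinyin match dominates for B, while weighting the Chinese character match more heavily puts A first.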

One method for calculating the similarity between the search field and the medicinal material name corresponding to the retrieval result is provided below, as shown in steps S404a to S404g.

S404a, carrying out concept division on Chinese character name fields in the medicinal material names corresponding to the retrieval results to obtain a first concept set.

S404b, performing concept division on the search field to obtain a second concept set, wherein the first concept set and the second concept set each comprise a first basic sememe description, other basic sememe descriptions, a relation sememe description and a relation symbol description.

S404c, calculating the similarity between the first basic sememe description in the first concept set and the first basic sememe description in the second concept set to obtain the first basic sememe description similarity.

S404d, calculating the similarity between the other basic sememe descriptions in the first concept set and the other basic sememe descriptions in the second concept set to obtain the other basic sememe description similarity.

S404e, calculating the similarity between the relation sememe description in the first concept set and the relation sememe description in the second concept set to obtain the relation sememe description similarity.

S404f, calculating the similarity between the relation symbol description in the first concept set and the relation symbol description in the second concept set to obtain the relation symbol description similarity.

S404g, obtaining the similarity between the medicinal material name corresponding to the retrieval result and the search field according to the first basic sememe description similarity, the other basic sememe description similarity, the relation sememe description similarity and the relation symbol description similarity.

The principle of steps S404a to S404g is to calculate similarity from the semantics of words. Word semantics are described by concepts: one word may have several concepts, and the sememe is the smallest unit of meaning used to describe a concept. In general, a word is expressed by a feature structure containing the following four features: a first basic sememe description (whose value is one basic sememe); other basic sememe descriptions (corresponding to all basic sememe descriptions in the semantic expression except the first, whose value is a set of basic sememes); a relation sememe description (corresponding to all relation sememe descriptions in the semantic expression, whose value is a feature structure in which the attribute of each feature is a relation sememe and the value is a basic sememe or a specific word); and a relation symbol description (corresponding to all relation symbol descriptions in the semantic expression, whose value is also a feature structure in which the attribute of each feature is a relation symbol and the value is a set whose elements are basic sememes or specific words). Therefore, for two words, the overall similarity is obtained by calculating the similarity of each of these four features.

When word semantics are described by concepts, one concept can be described by a series of sememes, and the sememes can be organized by their hypernym-hyponym relations into a tree-like sememe hierarchy. The similarity of two words, here the similarity between the search field and the Chinese character name field in the medicinal material name corresponding to the retrieval result, can therefore be calculated with the following formula.

Assume that the first basic sememe description similarity is denoted Sim1(Y1, Y2), the other basic sememe description similarity is denoted Sim2(Y1, Y2), the relation sememe description similarity is denoted Sim3(Y1, Y2), and the relation symbol description similarity is denoted Sim4(Y1, Y2), where Y1 represents the search field and Y2 represents the Chinese character name field in the medicinal material name corresponding to the retrieval result.

Then, the similarity calculation formula for Y1 and Y2 is:

In the above formula, $\mathrm{Sim}(Y1, Y2)$ represents the similarity between Y1 and Y2, $\mathrm{Sim}_i(Y1, Y2)$ represents the similarity of the i-th feature of Y1 and Y2, and $\beta_i$ is the weight coefficient of the i-th feature, with $\beta_1 + \beta_2 + \beta_3 + \beta_4 = 1$ and $\beta_1 \ge \beta_2 \ge \beta_3 \ge \beta_4$.

In this embodiment, $\beta_i$ can be preset by the user.
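One simple way to combine the four feature similarities under the stated constraints on the weights is a plain weighted sum. The sketch below assumes that form; it is an illustration consistent with the symbol definitions, not necessarily the exact formula of this embodiment. The function name `overall_similarity` and the default weight values are hypothetical.

```python
def overall_similarity(sims, betas=(0.5, 0.2, 0.17, 0.13)):
    """Combine four feature similarities as sum(beta_i * sim_i).

    Assumed form (weighted sum). The weights must sum to 1 and be non-increasing,
    matching the constraints stated for the weight coefficients.
    """
    if len(sims) != 4 or len(betas) != 4:
        raise ValueError("exactly four similarities and four weights are expected")
    if abs(sum(betas) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    if any(a < b for a, b in zip(betas, betas[1:])):
        raise ValueError("weights must satisfy beta1 >= beta2 >= beta3 >= beta4")
    return sum(b * s for b, s in zip(betas, sims))


print(overall_similarity((1.0, 0.0, 0.0, 0.0)))  # 0.5
```

Because the first weight is the largest, agreement on the first basic sememe description contributes most to the overall similarity.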

In this embodiment, the calculation of the first basic sememe description similarity is described as an example:

Assuming that the search field Y1 has n concepts (Y11, Y12, ..., Y1n) and the Chinese character name field Y2 in the medicinal material name corresponding to the retrieval result has m concepts (Y21, Y22, ..., Y2m), the first basic sememe description similarity is calculated as:

In the above formula, n is the total number of concepts of Y1, m is the total number of concepts of Y2, $Y_{1r}$ represents the r-th concept in the search field Y1, and $Y_{2j}$ represents the j-th concept in the Chinese character name field Y2.

As explained above, a concept is described by sememes, so the calculation of the first basic sememe description similarity can be converted into calculating the similarity between the sememes of two concepts and then taking the maximum similarity over the concepts as the final result; that is, the calculation formula becomes:

In the above formula, $P_1$ and $P_2$ respectively denote a sememe in concept $Y_{1r}$ and a sememe in concept $Y_{2j}$, d denotes the path length between sememe $P_1$ and sememe $P_2$ in the sememe hierarchy, and $\alpha$ is a constant.
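A sketch of this calculation is given below under two explicit assumptions: that the similarity of two sememes decreases with their path length d as α/(d + α), a form commonly used in sememe-hierarchy similarity measures, and that the path lengths are already available as a table. The function names and the default value of `alpha` are hypothetical.

```python
def sememe_similarity(d: float, alpha: float = 1.6) -> float:
    """Assumed form: similarity of two sememes at path length d is alpha / (d + alpha)."""
    return alpha / (d + alpha)


def first_sememe_similarity(distances, alpha: float = 1.6) -> float:
    """distances[r][j]: path length between the first basic sememes of the
    r-th concept of Y1 and the j-th concept of Y2.

    The result is the maximum sememe similarity over all concept pairs.
    """
    return max(sememe_similarity(d, alpha) for row in distances for d in row)


# Hypothetical path lengths for a search field with 2 concepts and a name with 2 concepts.
print(first_sememe_similarity([[4, 2], [6, 3]], alpha=2.0))  # 0.5
```

Identical sememes (d = 0) give a similarity of 1, and the similarity approaches 0 as the two sememes move further apart in the hierarchy.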

For the other basic sememe description similarity, the value of the feature is a set, so the problem can be converted into calculating the similarity of two sets of basic sememes by establishing a one-to-one correspondence between the elements of the two sets. The specific method is as follows:

The first step: calculate the similarity between every pair of elements from the two sets. In essence, the sememes other than the first basic sememe in the semantic expressions of the first concept set and the second concept set are paired with each other, and the similarity of each pair is calculated.

The second step: select the pair of other sememes with the largest similarity value from the first step, record that value as a maximum similarity, and establish a correspondence between the two sememes.

The third step: delete, from the similarities calculated in the first step, all similarities that involve a sememe for which a correspondence has been established.

The fourth step: repeat the second and third steps until all similarities have been deleted, so that one maximum similarity is obtained in each cycle.

The fifth step: sum the maximum similarities obtained in the cycles and take the average to obtain the other basic sememe description similarity.
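The five steps above amount to a greedy one-to-one matching, which can be sketched as follows. The pairwise similarities are taken as an input matrix; the function name `set_similarity` is hypothetical, and this sketch simply ignores elements left without a partner when the two sets differ in size, which is an assumption.

```python
def set_similarity(sim):
    """sim[i][j]: similarity between element i of the first set and element j
    of the second set (first step, computed beforehand).

    Greedily pairs elements by descending similarity and averages the chosen values.
    """
    pairs = {(i, j): v for i, row in enumerate(sim) for j, v in enumerate(row)}
    maxima = []
    while pairs:
        # Second step: take the pair with the largest remaining similarity.
        (i, j), best = max(pairs.items(), key=lambda kv: kv[1])
        maxima.append(best)
        # Third step: drop every similarity involving an already matched element.
        pairs = {(a, b): v for (a, b), v in pairs.items() if a != i and b != j}
    # Fifth step: average of the maxima collected in each cycle.
    return sum(maxima) / len(maxima) if maxima else 0.0


print(set_similarity([[0.9, 0.4], [0.8, 0.6]]))  # 0.75
```

In the example, 0.9 is chosen first, which removes 0.4 and 0.8 from consideration; 0.6 is chosen next, and the average of 0.9 and 0.6 is 0.75.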

For the relation sememe description similarity: the value of this feature is a feature structure, so the problem can be converted into calculating the similarity of two feature structures, in which features with the same attribute correspond one to one, and a feature with no counterpart of the same attribute corresponds to an empty object. The similarity of the feature structures thus becomes a weighted average of the similarities of the individual features.

Similarly, the value of the relation symbol description is also a feature structure, so its similarity can likewise be converted into a weighted average over the features of the two feature structures; for the calculation principle, reference may be made to that of the first basic sememe description similarity.

Finally, after the first basic sememe description similarity, the other basic sememe description similarity, the relation sememe description similarity and the relation symbol description similarity have been obtained, the similarity between the medicinal material name corresponding to the retrieval result and the search field is obtained from the similarity calculation formula for Y1 and Y2.

A specific method for calculating the similarity between the medicinal material name corresponding to the retrieval result and the search pinyin field is provided below, and may include, but is not limited to, the following steps S401h to S401k.

S401h, obtaining the total number of first letters, wherein the total number of first letters is the number of letters that exist both in the pinyin field of the medicinal material name corresponding to the retrieval result and in the search pinyin field.

S401i, obtaining the total number of second letters, wherein the total number of second letters is the number of letters that exist in the pinyin field and do not exist in the search pinyin field.

S401j, obtaining the total number of third letters, wherein the total number of third letters is the number of letters that exist in the search pinyin field and do not exist in the pinyin field.

S401k, obtaining the similarity between the medicinal material name corresponding to the retrieval result and the search pinyin field according to the total number of first letters, the total number of second letters and the total number of third letters.

Steps S401h to S401k are expressed by the following formula:

U(S1,S2)=b1q/(b2q+b3r+b4s)

In the formula, S1 represents the search pinyin field, S2 represents the pinyin field in the medicinal material name corresponding to the retrieval result, q represents the total number of first letters, r represents the total number of third letters, and s represents the total number of second letters; U(S1, S2) represents the similarity of S1 and S2; b1 and b2 are the weight coefficients applied to the total number of first letters (in the numerator and the denominator respectively), b3 is the weight coefficient of the total number of third letters, and b4 is the weight coefficient of the total number of second letters (the coefficients are preset in the search engine by the user).
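The formula can be sketched as follows. The sketch assumes that "letters" are counted as distinct letters (sets), which the text does not specify, and that all weight coefficients default to 1; the function name `pinyin_similarity` is hypothetical.

```python
def pinyin_similarity(s1, s2, b1=1.0, b2=1.0, b3=1.0, b4=1.0):
    """U(S1, S2) = b1*q / (b2*q + b3*r + b4*s).

    s1: search pinyin field; s2: pinyin field of a retrieval result.
    Assumption: q, r and s count distinct letters.
    """
    letters1, letters2 = set(s1), set(s2)
    q = len(letters1 & letters2)  # first letters: present in both fields
    s = len(letters2 - letters1)  # second letters: only in the pinyin field
    r = len(letters1 - letters2)  # third letters: only in the search pinyin field
    denominator = b2 * q + b3 * r + b4 * s
    return b1 * q / denominator if denominator else 0.0


print(pinyin_similarity("chonglou", "chonglou"))  # 1.0
print(pinyin_similarity("chonlou", "chonglou"))   # 6/7: one letter of the stored field is missing
```

With all four coefficients equal to 1, the expression reduces to the Jaccard coefficient of the two letter sets; larger b3 or b4 penalizes extra letters on the search side or the stored side more heavily.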

After the two similarity values are obtained, the search results may be sorted according to the first sorting method or the second sorting method, and returned to the front end for displaying, as shown in step S5.

S5, returning the retrieval result to the front end for display.

In this embodiment, during display, the three top-ranked retrieval results may be highlighted, for example shown in a red font, to make them easier to recognize.

Therefore, with the method for retrieving traditional Chinese medicinal materials described in detail in steps S1 to S5, when the same traditional Chinese medicine goes by different names, the user only needs to input a name that he or she knows or is familiar with to obtain the expected retrieval result. In addition, because the search field is converted into pinyin before searching, the "same pronunciation, different characters" scenario is effectively covered, which minimizes cases in which the expected goods cannot be found because the user input a wrong character.

As shown in fig. 3, a second aspect of this embodiment provides a hardware device for implementing the method for retrieving traditional Chinese medicinal materials according to the first aspect of this embodiment, the device comprising: a database establishing unit, an obtaining unit, a converting unit, a searching unit and a transmitting unit.

The database establishing unit is used for establishing a medicinal material name retrieval library, wherein any medicinal material name in the medicinal material name retrieval library comprises a Chinese character name field and a pinyin field corresponding to the Chinese character name field, and the Chinese character name field comprises all names used to denote the same medicinal material.

The obtaining unit is used for acquiring the search field input by the user.

The converting unit is used for converting the search field into pinyin to obtain a search pinyin field.

The searching unit is used for searching the medicinal material name retrieval library, according to the search field and the search pinyin field, for the medicinal material name matching the search pinyin field and the medicinal material name matching the search field, and taking them as the retrieval result.

The transmitting unit is used for returning the retrieval result to the front end for display.

For the working process, the working details, and the technical effects of the hardware apparatus provided in this embodiment, reference may be made to the first aspect of the embodiment, which is not described herein again.

As shown in fig. 4, a third aspect of this embodiment provides an electronic device, comprising a memory, a processor and a transceiver that are communicatively connected in sequence, wherein the memory is used for storing a computer program, the transceiver is used for receiving and sending messages, and the processor is used for reading the computer program and executing the method for retrieving traditional Chinese medicinal materials according to the first aspect of this embodiment.

For example, the memory may include, but is not limited to, a random access memory (RAM), a read-only memory (ROM), a flash memory, a first-in first-out (FIFO) memory, and/or a first-in last-out (FILO) memory; the processor may be, but is not limited to, a microprocessor of the STM32F105 series, a reduced instruction set computer (RISC) microprocessor, a processor of an architecture such as x86, or a processor integrated with a neural-network processing unit (NPU); the transceiver may be, but is not limited to, a wireless fidelity (Wi-Fi) transceiver, a Bluetooth transceiver, a General Packet Radio Service (GPRS) transceiver, a ZigBee transceiver (a low-power local area network protocol based on the IEEE 802.15.4 standard), a 3G transceiver, a 4G transceiver, and/or a 5G transceiver. In addition, the device may also include, but is not limited to, a power module, a display screen, and other necessary components.

For the working process, working details and technical effects of the electronic device provided in this embodiment, reference may be made to the first aspect of the embodiment, which is not described herein again.

A fourth aspect of the present embodiment provides a computer-readable storage medium storing instructions for implementing the method for retrieving traditional Chinese medicine materials according to the first aspect of the present embodiment, that is, the computer-readable storage medium stores instructions that, when executed on a computer, perform the method for retrieving traditional Chinese medicine materials according to the first aspect.

The computer-readable storage medium refers to a carrier for storing data, and may include, but is not limited to, floppy disks, optical disks, hard disks, flash memories, USB flash drives and/or memory sticks; the computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable device.

For the working process, the working details, and the technical effects of the computer-readable storage medium provided in this embodiment, reference may be made to the first aspect of the embodiment, which is not described herein again.

A fifth aspect of the present embodiment provides a computer program product containing instructions, which when run on a computer, causes the computer to execute the method for retrieving traditional Chinese medicine materials according to the first aspect of the present embodiment, wherein the computer may be a general-purpose computer, a special-purpose computer, a computer network, or other programmable device.

Finally, it should be noted that: the above description is only a preferred embodiment of the present invention, and is not intended to limit the scope of the present invention. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.
