Compound-based search engine and search method

Document No.: 488978 · Published: 2022-01-04

Summary: This technology, "Compound-based search engine and search method", was designed by 常闻宇 on 2021-10-15. The application relates to a compound-based search engine and search method. The compound search engine comprises: a compound search module that receives a search description and converts it into a search identifier; a compound local cache module and a compound local storage module that search the private cloud according to the search identifier; and a public cloud compound cache module and a public cloud compound storage module that search the public cloud according to the search identifier. All search results are updated to the compound local cache module. In the search method, the search engine first checks whether the local modules hold the information requested in the user's search description and, if there is no match, matches through the public cloud modules, so that commonly searched compounds are returned efficiently. By adopting a hybrid cloud caching mechanism, the application improves search efficiency and relieves storage pressure, and can be widely applied to the efficient use of hybrid clouds.

1. A compound search engine, comprising:

a compound search module, which receives an input search description and converts the search description into a search identifier;

a compound local cache module, which searches the compound local memory according to the search identifier to obtain a local cache search result and caches the local cache search result;

a compound local storage module, which searches the compound local storage according to the search identifier to obtain a local storage search result and returns the local storage search result to the compound local cache module;

a public cloud compound cache module, which searches the public cloud compound memory according to the search identifier to obtain a public cloud compound cache search result and returns the public cloud compound cache search result to the compound local cache module and the compound local storage module;

and a public cloud compound storage module, which searches the public cloud compound storage according to the search identifier to obtain a public cloud compound storage search result and returns the public cloud compound storage search result to the compound local cache module, the compound local storage module and the public cloud compound cache module.

2. The search engine of claim 1, further comprising a search frequency detection server connected to the compound local cache module and to the corresponding search information, for measuring the search frequency of that information over a period of time.

3. The search engine of claim 2, wherein the compound local cache module automatically clears the information of compounds whose search frequency is low or zero, according to the result detected by the search frequency detection server.

4. The search engine of claim 1, wherein the search description comprises a compound ID, a compound name, a compound alias, a compound CAS number, a compound molecular formula, a compound molecular weight, a compound InChI string, a compound SMILES string, a compound label, a class to which the compound belongs, and a compound attribute group.

5. The search engine of claim 1, wherein the compound local cache module, the compound local storage module, the public cloud compound cache module, and the public cloud compound storage module comprise a plurality of compounds and attributes thereof, wherein the attributes of the compounds comprise compound molecular weight, compound molecular formula, compound structural formula, compound name, compound CAS number, compound density, compound spectrum/mass spectrum information, compound toxicology information, chemical attributes, biological test data, and synthetic routes.

6. The search engine of claim 1, wherein the search identifier comprises an exact text search identifier, a fuzzy text search identifier, and a corresponding search weight identifier calculated from a compound structural formula.

7. The search engine of claim 1, wherein converting the search description into the search identifier comprises: filtering the search description to remove content that cannot be matched with the system; and performing input cleaning on the search description, that is, feeding back a search result if no matching search identifier exists.

8. The search engine of claim 1, wherein, after the user side has received the returned search result of a first search and then searches for a similar compound a second time, the system quickly retrieves and feeds back all information in the first search result from the compound local cache module.

9. The search engine of claim 1, wherein the public cloud compound storage module is provided with a public cloud data interface and a unified API for commercial cloud data.

10. A compound-based search method using the search engine of any one of claims 1-8, comprising the following steps:

1) the compound search module receives the search description and converts the search description into the search identifier;

2) according to the search identifier, the system first searches the compound local memory through the compound local cache module, and the obtained compound local cache search result is returned to the user side and cached;

3) when no result is fed back in step 2, the compound local storage module searches the compound local storage according to the search identifier, and the obtained compound local storage search result is returned to the user side and to the compound local cache module;

4) when no search result is fed back in step 3, the system automatically searches the public cloud compound cache module according to the search identifier, and the obtained public cloud compound cache search result is returned to the user side and uploaded to the compound local cache module and the compound local storage module;

5) when no search result is fed back in step 4, the public cloud compound cache module searches the public cloud compound storage according to the search identifier, and the obtained public cloud compound storage search result is returned to the user side and uploaded to the public cloud compound cache module, the compound local cache module and the compound local storage module.

Technical Field

The application relates to the technical field of data search, in particular to a compound-based search engine and a search method.

Background

Medical health has always been a focus of public attention and is central to national development and to people's well-being, and chemical experiments are an important means of achieving breakthroughs in medical health. With the continuous development of the economy and of network technology, more and more industries are shifting from the traditional model to "Internet+": the Internet is fused with traditional industry through network information platforms, and its advantages are used to create new development opportunities and to provide enterprises with faster and more convenient services. Various informatization solutions have appeared on the market to help enterprises in the medical health field carry out research, development and management more efficiently.

Because of the nature of the industry, compound search is a feature common to all of these systems. Many compound libraries are currently available on the market, and an enterprise also keeps its own confidential compound library, so how to manage the two kinds of library effectively is a problem shared by all such systems.

The current common practice is relatively inefficient: a user or system must search the external and internal libraries separately, so search efficiency is low. In addition, because the searches are separate, the system cannot be optimized as a whole, and compounds that have been searched cannot be cached for the next search.

Therefore, there is a need for a search method for the chemical industry that implements compound search while effectively increasing search speed and relieving storage pressure.

Disclosure of Invention

The technical problem to be solved by this application is that, in the prior art, a user or system must search the external and internal libraries separately, which makes searching inefficient; in addition, because the searches are separate, the system cannot be optimized to make the next search easier.

To solve the above technical problem, according to an aspect of the present application, there is provided a compound-based search engine including: a compound search module, which receives the input search description and converts the search description into a search identifier; a compound local cache module, which searches the compound local memory according to the search identifier to obtain a local cache search result and caches the local cache search result; a compound local storage module, which searches the compound local storage according to the search identifier to obtain a local storage search result and returns the local storage search result to the compound local cache module; a public cloud compound cache module, which searches the public cloud compound memory according to the search identifier to obtain a public cloud compound cache search result and returns the public cloud compound cache search result to the compound local cache module and the compound local storage module; and a public cloud compound storage module, which searches the public cloud compound storage according to the search identifier to obtain a public cloud compound storage search result and returns the public cloud compound storage search result to the compound local cache module, the compound local storage module and the public cloud compound cache module.

According to an embodiment of the application, the search engine further comprises a search frequency detection server, connected to the compound local cache module and to the corresponding search information, for measuring the search frequency of that information over a period of time. According to the result detected by the search frequency detection server, the compound local cache module automatically clears the information of compounds whose search frequency is low or zero.

According to embodiments of the present application, a search description for a compound may include a compound ID, a compound name, a compound alias, a compound CAS number, a molecular formula of the compound, a molecular weight of the compound, an InChI string of the compound, a SMILES string of the compound, a label of the compound, a class to which the compound belongs, and an attribute group of the compound.

According to an embodiment of the application, the compound local cache module, the compound local storage module, the public cloud compound cache module, and the public cloud compound storage module include a plurality of compounds and attributes thereof, wherein the attributes of the compounds include compound molecular weight, compound molecular formula, compound structural formula, compound name, compound CAS number, compound density, compound spectrum/mass spectrum information, compound toxicology information, chemical attributes, biological test data, and synthetic routes.

According to an embodiment of the application, the search identifier may comprise an exact text search identifier, a fuzzy text search identifier, and a corresponding search weight identifier calculated from a compound structural formula.

According to embodiments of the present application, converting a compound search description into a search identifier may include filtering the search description to remove content that cannot be matched with the system, and performing input cleaning on the search description, that is, feeding back a search result if no matching search identifier exists.

According to an embodiment of the application, after the user side has received the returned search result of a first search and then searches for a similar compound a second time, the system quickly retrieves and feeds back all information in the first search result from the compound local cache module.

According to an embodiment of the application, the public cloud compound storage module may further be provided with a public cloud data interface and a unified API for commercial cloud data.

According to another aspect of the present application, there is provided a compound-based search method, using the above search engine, the search method comprising the following steps:

1) the user side inputs a search description, and the compound search module receives the search description and converts it into a search identifier;

2) according to the search identifier, the system first searches the compound local memory through the compound local cache module, and the obtained compound local cache search result is returned to the user side and cached;

3) when no search result is fed back in step 2, the compound local storage module searches the compound local storage according to the search identifier, and the obtained compound local storage search result is returned to the user side and uploaded to the compound local cache module;

4) when no search result is fed back in step 3, the system automatically searches the public cloud compound cache module according to the search identifier, and the obtained public cloud compound cache search result is returned to the user side and uploaded to the compound local cache module and the compound local storage module;

5) when no search result is fed back in step 4, the public cloud compound cache module searches the public cloud compound storage according to the search identifier, and the obtained public cloud compound storage search result is returned to the user side and uploaded to the public cloud compound cache module, the compound local cache module and the compound local storage module.

Compared with the prior art, the invention has the following beneficial effects:

1. The capacity of an internal high-speed storage medium serves as the buffer of the hybrid cloud storage, which enlarges the cache capacity, unifies the content searched in the internal and external databases, returns commonly searched compounds efficiently, and markedly improves the efficiency of user data access.

2. A dynamic cache mechanism is adopted and a search frequency detection server is provided, so that the system can automatically clear the information of infrequently used compounds and reduce resource consumption. The cache data is updated dynamically in real time as search frequencies change, which speeds up intranet users' access to commonly used data that originally resides on the public cloud, reduces the risk of losing that data, and lowers the cost of bandwidth and traffic.

3. An application API and a unified interface are adopted, allowing multiple data sources to be connected and supporting connections to various commercial databases. The data of the multiple databases is uniformly classified and organized and is returned in a unified structure, which facilitates unified comparison and management at the upper layer of the system.

Drawings

In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings of the embodiments will be briefly introduced below, and it is apparent that the drawings in the following description only relate to some embodiments of the present application and are not limiting on the present application.

FIG. 1 is a schematic diagram of a compound-based search engine provided by an embodiment of the present invention;

FIG. 2 is a flow chart of a compound-based search method provided by an embodiment of the present invention.

Detailed Description

In order to make the objects, technical solutions and advantages of the embodiments of the present application clearer, the technical solutions of the embodiments of the present application will be clearly and completely described below with reference to the drawings of the embodiments of the present application. It should be apparent that the described embodiments are only some of the embodiments of the present application, and not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the described embodiments of the application without any inventive step, are within the scope of protection of the application.

Unless defined otherwise, technical or scientific terms used herein shall have the ordinary meaning as understood by one of ordinary skill in the art to which this application belongs. The terms "first," "second," and similar terms in the description and claims of this patent application do not denote any order, quantity, or importance, but are used to distinguish one element from another. Likewise, the terms "a" or "an" and the like do not denote a limitation of quantity, but denote the presence of at least one.

FIG. 1 is a block diagram of the components of a compound-based search engine according to one embodiment of the present invention. As shown in FIG. 1, the compound-based hybrid cloud search engine includes:

A compound search module, which receives a search request entered at the user side, matches the request, and converts it into a search identifier that the system can recognize; it also receives the search result from the search engine and returns it to the user side.

Further, the user terminal may be a PC (Personal Computer), a mobile terminal, and other application program interfaces for displaying the search results returned by the search engine. The mobile terminal can be a mobile phone, a tablet computer and other hardware devices with various operating systems.

Furthermore, when the search request entered at the user side is matched, the request data can be filtered and cleaned to ensure that the data is complete and meets the format requirements of the subsequent modules. Filtering removes the parts that cannot be matched with the search system; input cleaning feeds back the search result when no part remains that cannot be matched with the search system.

Further, search identifiers can be divided into exact text search identifiers, fuzzy text search identifiers and structure search weight identifiers. For example, if the search request entered by the user is a compound CAS number or a compound name, the compound search module converts it into the corresponding exact text search identifier of the system. When the search request is a fuzzy search on a compound name, the compound search module converts it into the corresponding fuzzy search identifier of the system. When the search request is a compound structure search, the system first converts the structure information into a structural formula that it can recognize and then calculates the corresponding search weight identifier from the structure.
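The conversion described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the implementation of the application: the names `IdentifierKind`, `SearchIdentifier`, `to_search_identifier` and `structure_weight` are hypothetical, and because the text does not say how the search weight is calculated from a structure, `structure_weight` is only a placeholder that sorts the letters of a SMILES-like string.

```python
import re
from dataclasses import dataclass
from enum import Enum


class IdentifierKind(Enum):
    EXACT_TEXT = "exact_text"
    FUZZY_TEXT = "fuzzy_text"
    STRUCTURE_WEIGHT = "structure_weight"


@dataclass(frozen=True)
class SearchIdentifier:
    kind: IdentifierKind
    value: str


# A CAS Registry Number has three hyphen-separated parts:
# 2 to 7 digits, then 2 digits, then 1 check digit.
CAS_PATTERN = re.compile(r"^\d{2,7}-\d{2}-\d$")


def structure_weight(structure: str) -> str:
    """Placeholder only: the source does not define the weight calculation.

    Assumption: the sorted letters of the input stand in for a real
    structure-derived key.
    """
    return "".join(sorted(c for c in structure if c.isalpha()))


def to_search_identifier(description: str, fuzzy: bool = False,
                         is_structure: bool = False) -> SearchIdentifier:
    """Convert a raw search description into one of the three identifier kinds."""
    text = description.strip()
    if not text:
        raise ValueError("empty search description")
    if is_structure:
        return SearchIdentifier(IdentifierKind.STRUCTURE_WEIGHT,
                                structure_weight(text))
    if CAS_PATTERN.match(text):
        # A CAS number is always searched exactly.
        return SearchIdentifier(IdentifierKind.EXACT_TEXT, text)
    kind = IdentifierKind.FUZZY_TEXT if fuzzy else IdentifierKind.EXACT_TEXT
    return SearchIdentifier(kind, text.lower())
```

In this sketch a CAS number always yields an exact identifier, a name yields an exact or fuzzy identifier depending on the user's choice, and a structure query is routed to the placeholder weight function.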

A compound local cache module, which searches the local memory according to the search identifier to obtain a local cache search result; the local cache module can also store the local cache search result and send it to the user side.

Furthermore, the compound local cache module supports local storage, but most of its data resides in local memory. The data in local memory consists mainly of data the user has searched before and of results the system has searched for automatically in advance, based on their relevance to the target compound. The local cache module supports caching both the local database and the public database, and therefore has a preset capacity of high-speed storage media on the private cloud. In this embodiment, the compound local cache module also has an automatic cleaning function, which saves the limited high-speed storage medium resources and keeps their utilization high: information on compounds that are searched infrequently or have not been searched for a long time is cleaned to reduce the consumption of local storage capacity. For example, the cache information in the local cache module can be cleaned intelligently according to how often users search for a compound. Specifically, a search frequency detection server is provided that counts and receives the search frequency of each compound at a preset time interval and automatically cleans the information of compounds whose search frequency is low or that have not been searched within the set interval.
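The frequency-based cleaning described above might look like the following Python sketch. It is an illustration under assumptions rather than the design of the application: the class name `FrequencyCleanedCache`, the `window_seconds` and `min_hits` parameters, and the choice to count a store as one search are all hypothetical, and the search frequency detection server is reduced to an in-process `clean()` call.

```python
import time
from collections import defaultdict


class FrequencyCleanedCache:
    """In-memory cache that drops entries searched too rarely in a time window."""

    def __init__(self, window_seconds: float, min_hits: int = 1,
                 clock=time.monotonic):
        self.window_seconds = window_seconds
        self.min_hits = min_hits
        self._clock = clock          # injectable so the window can be tested
        self._entries = {}           # search identifier -> compound record
        self._hits = defaultdict(list)  # search identifier -> hit timestamps

    def put(self, key, record):
        # Assumption: a record is stored because it was just searched,
        # so storing counts as one search.
        self._entries[key] = record
        self._hits[key].append(self._clock())

    def get(self, key):
        if key not in self._entries:
            return None
        self._hits[key].append(self._clock())
        return self._entries[key]

    def clean(self):
        """Remove entries with fewer than min_hits searches in the window."""
        cutoff = self._clock() - self.window_seconds
        removed = []
        for key in list(self._entries):
            recent = [t for t in self._hits[key] if t >= cutoff]
            if len(recent) < self.min_hits:
                del self._entries[key]
                del self._hits[key]
                removed.append(key)
            else:
                self._hits[key] = recent  # forget hits outside the window
        return removed
```

A scheduler would call `clean()` once per preset interval; entries that were searched often enough inside the window survive, and the rest are released to free high-speed storage.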

Further, the specific choice of the storage medium of the compound local cache module may be set by the designer, which is not limited in this embodiment.

A compound local storage module, which comprises a local repository. Mainstream databases are currently classified into relational and non-relational databases, and the local repository may be of either kind; it stores local information. The compound local storage module searches the local repository according to the search identifier and obtains a local storage search result.

Further, the compound local storage module supports both fuzzy search and exact search for multiple targets. For example, if the search identifier is an exact text search identifier, the compound local storage module performs an exact search in the local storage; if the search identifier is the corresponding fuzzy search identifier of the system, the compound local storage module performs a fuzzy search in the local storage. The search result of the compound local storage module is returned to and stored in the local cache module.
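As an illustration of exact versus fuzzy search in a relational local repository, the sketch below uses Python's standard `sqlite3` module with an in-memory database. The table layout, the sample rows, and the function names `open_local_store` and `search_local_store` are assumptions made for this example; the text leaves the choice of database and schema open.

```python
import sqlite3


def open_local_store() -> sqlite3.Connection:
    """Create a small in-memory compound table (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE compounds (cas TEXT PRIMARY KEY, name TEXT NOT NULL, formula TEXT)"
    )
    conn.executemany(
        "INSERT INTO compounds VALUES (?, ?, ?)",
        [
            ("64-17-5", "ethanol", "C2H6O"),
            ("67-56-1", "methanol", "CH4O"),
            ("71-43-2", "benzene", "C6H6"),
        ],
    )
    return conn


def search_local_store(conn: sqlite3.Connection, kind: str, value: str) -> list:
    """Run an exact or fuzzy search depending on the identifier kind."""
    if kind == "exact":
        sql = ("SELECT cas, name, formula FROM compounds "
               "WHERE cas = ? OR name = ? ORDER BY cas")
        params = (value, value)
    elif kind == "fuzzy":
        # Escape LIKE wildcards so user input is matched literally.
        escaped = (value.replace("\\", "\\\\")
                        .replace("%", "\\%")
                        .replace("_", "\\_"))
        sql = ("SELECT cas, name, formula FROM compounds "
               "WHERE name LIKE ? ESCAPE '\\' ORDER BY cas")
        params = (f"%{escaped}%",)
    else:
        raise ValueError(f"unsupported identifier kind: {kind}")
    return conn.execute(sql, params).fetchall()
```

Parameter binding keeps the query safe against injection, and the fuzzy branch is a plain substring match; a production system could substitute a full-text or trigram index without changing the calling code.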

A public cloud compound cache module, which searches the public cloud memory according to the search identifier and obtains a public cloud cache search result; the public cloud cache search result is returned to and stored in the local cache module and the local storage module.

Furthermore, the public cloud compound cache module can recognize exact text search identifiers, fuzzy text search identifiers and structure search weight identifiers, and the data it finds can be returned directly to the user side.

Further, the public cloud compound cache module supports remote calls to commercial cloud data and formats the returned data.

A public cloud compound storage module, which searches the public cloud storage according to the search identifier and obtains a public cloud storage search result; the public cloud storage search result is returned to and stored in the local cache module and the local storage module.

Further, the public cloud compound storage module is provided with a public cloud data interface and a corresponding universal interface for commercial cloud data. Commercial databases 1, 2, 3 and so on can be connected to the public cloud compound storage module through a unified API, so that the search engine supports remote calls to commercial cloud data and formats the returned data. The data from the multiple databases is uniformly classified and organized and is returned in a unified structure, which facilitates unified comparison and management at the upper layer of the system.
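One way to picture the unified interface is a set of per-source adapters that map each commercial database's payload onto a single record type. The Python sketch below is hypothetical throughout: the sources `vendor_a` and `vendor_b`, their field names, and the `CompoundRecord` type are invented for illustration and do not describe any real commercial database or the interface of the application.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass(frozen=True)
class CompoundRecord:
    """Unified structure returned to the upper layer of the system."""
    cas: str
    name: str
    molecular_weight: Optional[float]
    source: str


def from_vendor_a(raw: dict) -> CompoundRecord:
    # Hypothetical payload shape: {"cas_no": ..., "title": ..., "mw": ...}
    return CompoundRecord(raw["cas_no"], raw["title"].lower(),
                          float(raw["mw"]), "vendor_a")


def from_vendor_b(raw: dict) -> CompoundRecord:
    # Hypothetical payload shape: {"CAS": ..., "Name": ..., "MolWeight": ...}
    weight = raw.get("MolWeight")
    return CompoundRecord(raw["CAS"], raw["Name"].lower(),
                          None if weight is None else float(weight),
                          "vendor_b")


ADAPTERS: Dict[str, Callable[[dict], CompoundRecord]] = {
    "vendor_a": from_vendor_a,
    "vendor_b": from_vendor_b,
}


def normalize(results_by_source: Dict[str, List[dict]]) -> List[CompoundRecord]:
    """Convert raw rows from several sources into one sorted, uniform list."""
    records = [
        ADAPTERS[source](raw)
        for source, rows in results_by_source.items()
        for raw in rows
    ]
    return sorted(records, key=lambda r: (r.cas, r.source))
```

Adding another commercial database then only requires registering one more adapter; the code that compares and manages records keeps working against `CompoundRecord`.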

Referring to FIG. 2, which shows a compound-based search method according to an embodiment of the present invention, the method uses the search engine of the above embodiment and may include the following specific steps:

1) the user side inputs a search description, and the compound search module receives the search description and converts it into a search identifier;

2) according to the search identifier, the system first searches the compound local memory through the compound local cache module, returns the obtained compound local cache search result to the user side, and caches it;

3) when no search result is fed back in step 2, the compound local storage module searches the compound local storage according to the search identifier, and the obtained compound local storage search result is returned to the user side and uploaded to the compound local cache module;

4) when no search result is fed back in step 3, the system automatically searches the public cloud compound cache module according to the search identifier, and the obtained public cloud compound cache search result is returned to the user side and uploaded to the compound local cache module and the compound local storage module;

5) when no search result is fed back in step 4, the public cloud compound cache module searches the public cloud compound storage according to the search identifier, and the obtained public cloud compound storage search result is returned to the user side and uploaded to the public cloud compound cache module, the compound local cache module and the compound local storage module.
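The five steps above amount to a tiered lookup with write-back to every tier that was checked before the hit. The following Python sketch shows that control flow under simplifying assumptions: each tier is modelled as a plain dictionary wrapped in a hypothetical `Tier` class, and `tiered_search` is an illustrative name rather than part of the application.

```python
from typing import List, Optional, Tuple


class Tier:
    """One storage level, modelled as a dictionary for illustration."""

    def __init__(self, name: str, data: Optional[dict] = None):
        self.name = name
        self.data = dict(data or {})

    def search(self, identifier: str) -> Optional[dict]:
        return self.data.get(identifier)

    def store(self, identifier: str, record: dict) -> None:
        self.data[identifier] = record


def tiered_search(identifier: str,
                  tiers: List[Tier]) -> Tuple[Optional[dict], Optional[str]]:
    """Search tiers in order and copy a hit back to all earlier tiers.

    Expected order: local cache, local storage, public cloud cache,
    public cloud storage.
    """
    for index, tier in enumerate(tiers):
        record = tier.search(identifier)
        if record is not None:
            # Steps 3 to 5: upload the result to every tier checked before.
            for earlier in tiers[:index]:
                earlier.store(identifier, record)
            return record, tier.name
    return None, None


tiers = [
    Tier("local_cache"),
    Tier("local_storage"),
    Tier("public_cloud_cache"),
    Tier("public_cloud_storage", {"64-17-5": {"name": "ethanol"}}),
]
first = tiered_search("64-17-5", tiers)   # served by public cloud storage
second = tiered_search("64-17-5", tiers)  # now served by the local cache
```

After the first call the record has been written to the public cloud cache, the local storage and the local cache, so the second call is answered by the local cache without leaving the private cloud.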

In summary, with the search method of the application a single search accumulates and updates the search results in the compound local cache module, so that the local cache can return results at high speed the next time a similar compound is searched.

Further, the hybrid cloud search method based on a cloud cache mechanism set forth in the embodiments of the present invention also includes an automatic clearing step for the local cache module, which periodically clears information whose search frequency is low or zero in order to preserve the storage capacity of the local cache.

In this embodiment, a search method that performs compound search with a hybrid cloud search engine based on a dynamic cache mechanism is specifically described as follows:

1. The user inputs the CAS number of the compound, the Chinese or English standard name of the compound, the structural formula of the compound, a Chinese or English alias of the compound, or the like, and can select exact search or fuzzy search. The compound search module converts the user's search conditions into search identifiers that the system can recognize. For a CAS number search, the result is a simple exact text search identifier. For a compound name search, exact or fuzzy search can be selected, and the system accordingly converts the request into a simple exact text search identifier or the corresponding fuzzy search identifier. For a compound structure search, the compound search module first converts the structure information into a structural formula that the system can recognize and then calculates the corresponding search weight identifier from the structure.

2. The local cache module searches the local cache system according to the obtained search identifier. If the relevant data is found in the local cache, that is, the local memory, the search result is received by the compound search module and returned directly to the front-end user side, and the search result is stored to update the cache. If the relevant data is not found in the local cache system, the system searches the local storage; the local storage search result is received by the compound search module and returned directly to the front-end user side, and it is also passed to the local cache module, adding to the content of the local cache, that is, the local memory.

3. If no data matching the compound is found in either the local cache module or the local storage module, the system automatically searches the public cloud connected to the hybrid cloud. According to the search identifier from the compound search module, the system first searches the public cloud cache, that is, the public cloud memory. The public cloud cache search result is received by the compound search module and returned directly to the front-end user side, and the search result is stored to update the public cloud cache. At the same time, the public cloud cache search result is passed to the local cache module and the local storage module, so that the public cloud cache content is stored locally.

4. If no relevant data is found through the operations of step 2 and step 3, the system automatically searches the public cloud storage. The public cloud storage search result is received by the compound search module and returned directly to the front-end user side; it is sent to the public cloud cache module and, at the same time, to the local cache module and the local storage module, at which point the search process ends. The compound local cache module accumulates and updates this search result in preparation for the next search for a similar compound.

After the system performs an exact or fuzzy search according to the compound search request entered at the user side, the search engine returns the relevant search results. The results may include basic information on the compound, such as molecular weight, molecular formula, structural formula, name, CAS number, and density; further information on the compound may also be obtained, such as spectrum/mass spectrum information, toxicological information, chemical attributes, biological test data, and synthetic routes.


Those skilled in the art will further appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, computer software, or combinations of both, and that the various illustrative components and steps have been described above generally in terms of their functionality in order to clearly illustrate this interchangeability of hardware and software. Whether such functionality is implemented as hardware or software depends upon the particular application and the design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.

The above description is only exemplary of the present application and is not intended to limit the scope of the present application, which is defined by the appended claims.
