Method, device, equipment and medium for comprehensively retrieving information of data warehouse

Document No.: 190529 Publication date: 2021-11-02

Reading note: This technology, "Method, device, equipment and medium for comprehensively retrieving information of data warehouse", was designed by 邹丹, 王喆, 张晓宇, 徐贵红, 王沛然 and 孙思齐 on 2021-07-01. Main content: The invention provides a method, a device, equipment and a medium for comprehensively retrieving information of a data warehouse. The method comprises: matching input retrieval information with a main data attribute text to obtain target information matched with the retrieval information, wherein the main data attribute text is a data link established according to the history change record of the main data attribute information; and inputting the target information as retrieval information into the data warehouse for retrieval to obtain a retrieval result. The data link established according to the history change record of the main data attribute information associates the data collected into the data service platform with the main data, so that the multiple versions of data in the life cycle of the main data can be used effectively. The target information is obtained from the data link and used as retrieval information, which makes the retrieval comprehensive and efficient.

1. A method for comprehensively retrieving information of a data warehouse is characterized by comprising the following steps:

matching input retrieval information with a main data attribute text to obtain target information matched with the retrieval information; the main data attribute text is a data link established according to the history change record of the main data attribute information;

and inputting the target information as retrieval information into a data warehouse for retrieval to obtain a retrieval result.

2. The method for comprehensively retrieving information of a data warehouse according to claim 1, wherein each data link in the main data attribute text corresponds to one main data, and each node in the data link comprises an attribute value of the main data, a start time of the attribute value and an end time of the attribute value.

3. The method for comprehensively retrieving information of a data warehouse according to claim 2, wherein the matching the input retrieval information with the main data attribute text to obtain the target information matched with the retrieval information comprises:

matching the keywords in the retrieval information with the attribute values of the data links in the main data attribute text to obtain the data links matched with the keywords;

and matching the time range in the retrieval information with the determined starting time and/or ending time of the data link, and determining an attribute value matched with the time range to serve as the target information.

4. The method for comprehensively retrieving information of a data warehouse according to claim 1, wherein the matching the input retrieval information with the main data attribute text to obtain the target information matched with the retrieval information comprises:

matching the retrieval information with the main data attribute text based on word vector similarity matching to obtain main data attribute information matched with the retrieval information as the target information; or,

matching the retrieval information with the main data attribute text based on Elasticsearch to obtain main data attribute information matched with the retrieval information as the target information.

5. The method for comprehensively retrieving information of a data warehouse according to claim 4, wherein the matching of the inputted retrieval information with the main data attribute text to obtain the target information matched with the retrieval information further comprises:

judging whether the retrieval information is matched with the main data attribute text or not to obtain main data attribute information matched with the retrieval information;

and if the main data attribute information matched with the retrieval information is not obtained, matching the retrieval information with a pre-established synonym table to obtain a synonym matched with the retrieval information, and taking the synonym as the target information.

6. An apparatus for comprehensively retrieving information from a data warehouse, comprising:

the matching module is used for matching the input retrieval information with the main data attribute text to obtain target information matched with the retrieval information; the main data attribute text is a data link established according to the history change record of the main data attribute information;

and the retrieval module is used for inputting the target information as retrieval information into a data warehouse for retrieval to obtain a retrieval result.

7. The apparatus for comprehensively retrieving information of a data warehouse according to claim 6, wherein the matching module comprises:

the first matching unit is used for matching the keywords in the retrieval information with the attribute values of the data links in the main data attribute text to obtain the data links matched with the keywords;

and the second matching unit is used for matching the time range in the retrieval information with the determined starting time and/or the determined ending time of the data link, and determining the attribute value matched with the time range as the target information.

8. The apparatus for fully retrieving information from a data warehouse as claimed in claim 6, wherein the matching module further comprises:

a third matching unit, configured to match the retrieval information with the main data attribute text based on word vector similarity matching, to obtain main data attribute information matched with the retrieval information, and use the main data attribute information as the target information; or,

a fourth matching unit, configured to match the retrieval information with the main data attribute text based on Elasticsearch, to obtain main data attribute information matched with the retrieval information, and use the main data attribute information as the target information;

the judging unit is used for judging whether the retrieval information is matched with the main data attribute text or not to obtain main data attribute information matched with the retrieval information;

and the fifth matching unit is used for matching the retrieval information with a pre-established synonym table to obtain a synonym matched with the retrieval information as the target information if the main data attribute information matched with the retrieval information is not obtained.

9. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor, when executing the program, performs the steps of the method for comprehensively retrieving information of a data warehouse according to any one of claims 1 to 5.

10. A non-transitory computer readable storage medium having stored thereon a computer program, wherein the computer program, when executed by a processor, implements the steps of the method for comprehensively retrieving information of a data warehouse according to any one of claims 1 to 5.

Technical Field

The invention relates to the field of retrieval, in particular to a method, a device, equipment and a medium for comprehensively retrieving information of a data warehouse.

Background

A data warehouse (DW or DWH for short) is a strategic collection of data that supports decision-making at all levels of an enterprise with all types of data. Data in a data warehouse is organized by subject area; a subject is a focus of concern for users who make decisions with the data warehouse, and one subject is usually related to several operational information systems. The railway data service platform currently serves as the sole data warehouse platform of the railway and collects data from a number of important railway business systems.

Because system construction and data are insufficiently standardized, the data collected into the data service platform cannot be associated with the main data, and the multiple versions of data in the life cycle of the main data cannot be used effectively. As a result, data retrieval is both ineffective and incomplete.

In order to solve the problem of non-standard railway service data, data cleaning needs to be carried out before data collection work. However, data cleansing based on standards is time consuming and labor intensive, and cannot respond to rapidly changing business needs.

Disclosure of Invention

The invention provides a method, a device, equipment and a medium for comprehensively retrieving information of a data warehouse. They address the defects of the prior art, in which data retrieval is ineffective and incomplete because the data collected into a data service platform cannot be associated with the main data, and they enable comprehensive and effective retrieval of the data.

In a first aspect, the present invention provides a method for comprehensively retrieving information from a data warehouse, including: matching input retrieval information with a main data attribute text to obtain target information matched with the retrieval information; the main data attribute text is a data link established according to the history change record of the main data attribute information; and inputting the target information as retrieval information into a data warehouse for retrieval to obtain a retrieval result.

According to the method for comprehensively retrieving information of the data warehouse provided by the invention, each data link in the main data attribute text corresponds to one main data, and each node in the data link comprises the attribute value of the main data, the start time of the attribute value and the end time of the attribute value.

According to the method for comprehensively retrieving information of the data warehouse, provided by the invention, the input retrieval information is matched with the main data attribute text to obtain the target information matched with the retrieval information, and the method comprises the following steps: matching the keywords in the retrieval information with the attribute values of the data links in the main data attribute text to obtain the data links matched with the keywords; and matching the time range in the retrieval information with the determined starting time and/or ending time of the data link, and determining an attribute value matched with the time range to serve as the target information.

According to the method for comprehensively retrieving information of the data warehouse, provided by the invention, the input retrieval information is matched with the main data attribute text to obtain the target information matched with the retrieval information, and the method comprises the following steps: matching the retrieval information with the main data attribute text based on word vector similarity matching to obtain main data attribute information matched with the retrieval information as the target information; or, based on the elastic search, matching the retrieval information with the main data attribute text to obtain main data attribute information matched with the retrieval information, and using the main data attribute information as the target information.

According to the method for comprehensively retrieving information of the data warehouse, provided by the invention, the input retrieval information is matched with the main data attribute text to obtain the target information matched with the retrieval information, and the method further comprises the following steps: judging whether the retrieval information is matched with the main data attribute text or not to obtain main data attribute information matched with the retrieval information; and if the main data attribute information matched with the retrieval information is not obtained, matching the retrieval information with a pre-established synonym table to obtain a synonym matched with the retrieval information, and taking the synonym as the target information.

In a second aspect, the present invention provides an apparatus for comprehensively retrieving information from a data warehouse, including: the matching module is used for matching the input retrieval information with the main data attribute text to obtain target information matched with the retrieval information; the main data attribute text is a data link established according to the history change record of the main data attribute information; and the retrieval module is used for inputting the target information as retrieval information into a data warehouse for retrieval to obtain a retrieval result.

According to the apparatus for comprehensively retrieving information from a data warehouse provided by the present invention, the matching module comprises: the first matching unit is used for matching the keywords in the retrieval information with the attribute values of the data links in the main data attribute text to obtain the data links matched with the keywords; and the second matching unit is used for matching the time range in the retrieval information with the determined starting time and/or the determined ending time of the data link, and determining the attribute value matched with the time range as the target information.

According to the apparatus for comprehensively retrieving information from a data warehouse provided by the present invention, the matching module further comprises: a third matching unit, configured to match the search information with the main data attribute text based on word vector similarity matching, to obtain main data attribute information matched with the search information, and use the main data attribute information as the target information; or, the fourth matching unit is configured to match the retrieval information with the main data attribute text based on elastic search, to obtain main data attribute information matched with the retrieval information, and to use the main data attribute information as the target information; the judging unit is used for judging whether the retrieval information is matched with the main data attribute text or not to obtain main data attribute information matched with the retrieval information; and the fifth matching unit is used for matching the retrieval information with a pre-established synonym table to obtain a synonym matched with the retrieval information as the target information if the main data attribute information matched with the retrieval information is not obtained.

In a third aspect, the present invention provides an electronic device, including a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor implements the steps of the method for comprehensively retrieving information of the data warehouse when executing the program.

In a fourth aspect, the present invention provides a non-transitory computer readable storage medium having stored thereon a computer program which, when executed by a processor, performs the steps of the method for comprehensively retrieving information of the data warehouse.

The invention provides a method, a device, equipment and a medium for comprehensively retrieving information of a data warehouse, which are characterized in that target information matched with the retrieved information is obtained by matching the input retrieved information with a main data attribute text; the main data attribute text is a data link established according to the history change record of the main data attribute information; and inputting the target information as retrieval information into a data warehouse for retrieval to obtain a retrieval result. The data link established according to the historical change record of the attribute information of the main data associates the data collected to the data service platform with the main data, so that a plurality of version data in the life cycle of the main data can be effectively utilized, the target information is obtained according to the data link, and the target information is used as retrieval information for retrieval, so that the retrieval is comprehensive and efficient.

Drawings

In order to illustrate the technical solutions of the present invention or of the prior art more clearly, the drawings used in the description of the embodiments or the prior art are briefly introduced below. The drawings described below show some embodiments of the present invention, and those skilled in the art can obtain other drawings from them without creative effort.

Fig. 1 is a schematic flowchart of a method for a data warehouse to comprehensively retrieve information according to an embodiment of the present invention;

Fig. 2 is a schematic diagram of a data link according to an embodiment of the present invention;

Fig. 3 is a schematic flowchart of a method for obtaining target information according to an embodiment of the present invention;

Fig. 4 is a schematic diagram of an application scenario of the method for comprehensively retrieving information from a data warehouse according to an embodiment of the present invention;

Fig. 5 is a schematic structural diagram of an apparatus for comprehensively retrieving information from a data warehouse according to an embodiment of the present invention;

Fig. 6 is a schematic physical structure diagram of an electronic device according to an embodiment of the present invention.

Detailed Description

In order to make the objects, technical solutions and advantages of the present invention clearer, the technical solutions of the present invention will be clearly and completely described below with reference to the accompanying drawings, and it is obvious that the described embodiments are some, but not all embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

Fig. 1 is a schematic flowchart of a method for a data warehouse to comprehensively retrieve information according to an embodiment of the present invention. As shown in fig. 1, the method for comprehensively retrieving information by the data warehouse comprises the following steps:

S101: matching input retrieval information with a main data attribute text to obtain target information matched with the retrieval information; the main data attribute text is a data link established according to the history change record of the main data attribute information.

In step S101, the main data (also called master data) refers to data that is shared among the systems throughout the enterprise. For example, in a railway system, the main data may be a station name, a route name, a train type number, or the like. The embodiment of the present invention does not limit the type of the main data.

Each data link in the main data attribute text corresponds to one main data, and each node in the data link comprises an attribute value of the main data, the start time of the attribute value and the end time of the attribute value (the end time is also referred to in this description as the cutoff time or deadline). The end time of the attribute value currently in use is marked as 0. Fig. 2 is a schematic diagram of a data link according to an embodiment of the present invention. As shown in Fig. 2, the main data of the data link has 3 different attribute values, and each attribute value has a corresponding start time and end time. In the embodiment of the invention, the number of attribute values of the main data corresponding to a data link is determined by the actual history change record.
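The node structure described above can be sketched as a small data model. This is a minimal illustration, not the patent's implementation: the names `LinkNode`, `DataLink` and `build_link` are hypothetical, `None` stands in for the "0" marker of a value still in use, and the sketch assumes that each attribute value ends on the date the next one starts.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class LinkNode:
    value: str            # attribute value of the main data
    start: date           # start time of the attribute value
    end: Optional[date]   # end time; None plays the role of the "0" marker


@dataclass
class DataLink:
    master_id: str        # hypothetical identifier of the main data item
    nodes: List[LinkNode]  # ordered from oldest to newest

    def current(self) -> LinkNode:
        """Return the node whose attribute value is still in use."""
        return next(n for n in self.nodes if n.end is None)


def build_link(master_id: str, changes: List[Tuple[str, date]]) -> DataLink:
    """Build a data link from (value, effective date) history change records."""
    ordered = sorted(changes, key=lambda c: c[1])
    nodes = []
    for i, (value, start) in enumerate(ordered):
        # Assumption: a value ends when the next value takes effect.
        end = ordered[i + 1][1] if i + 1 < len(ordered) else None
        nodes.append(LinkNode(value, start, end))
    return DataLink(master_id, nodes)


# Placeholder values and dates, invented for illustration only.
link = build_link("m1", [("B", date(2015, 1, 1)),
                         ("A", date(2010, 1, 1)),
                         ("C", date(2020, 1, 1))])
```

With three change records the link has three nodes, matching the shape of the example in Fig. 2; `link.current()` returns the node for the value in use.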

The retrieval information is matched against the attribute values of the main data in each data link of the main data attribute text to obtain a matched data link. Within that data link, the attribute values of the main data that match the retrieval information are the target information.

S102: inputting the target information as retrieval information into a data warehouse for retrieval to obtain a retrieval result.

In step S102, the data warehouse is a strategic collection of data that supports decision-making at all levels of the enterprise with all types of data. It is a single data store that, for enterprises requiring business intelligence, guides business process improvement and supports the monitoring of time, cost, quality and control.
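Steps S101 and S102 can be read as a two-stage pipeline: expand the query into target information, then search the warehouse once per target. The sketch below is an assumption-laden illustration; `comprehensive_retrieve`, `match_exact`, `search_warehouse`, the toy `links` list and the `warehouse` dictionary are all hypothetical stand-ins, and exact string equality replaces the real matching methods.

```python
from typing import Callable, Iterable, List


def comprehensive_retrieve(
    query: str,
    match: Callable[[str], List[str]],
    search: Callable[[str], Iterable[str]],
) -> List[str]:
    """Run S101 (matching) and then S102 (retrieval) for one query."""
    targets = match(query)             # S101: target information from the data links
    results: List[str] = []
    for target in targets:             # S102: each target is used as retrieval information
        results.extend(search(target))
    return results


# Toy stand-ins, invented for illustration.
links = [["Old Name", "New Name"], ["Other"]]
warehouse = {"Old Name": ["record 1"], "New Name": ["record 2"], "Other": ["record 3"]}


def match_exact(query: str) -> List[str]:
    """Return every attribute value of the first link that contains the query."""
    for values in links:
        if query in values:
            return list(values)
    return []


def search_warehouse(target: str) -> List[str]:
    """Look up records stored under exactly this attribute value."""
    return warehouse.get(target, [])


found = comprehensive_retrieve("New Name", match_exact, search_warehouse)
```

Searching for the new name also returns the record stored under the old name, which is the effect the data link is meant to provide.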

According to the method for comprehensively retrieving the information of the data warehouse, provided by the embodiment of the invention, the input retrieval information is matched with the main data attribute text to obtain the target information matched with the retrieval information; the main data attribute text is a data link established according to the history change record of the main data attribute information; and inputting the target information as retrieval information into a data warehouse for retrieval to obtain a retrieval result. The data link established according to the historical change record of the attribute information of the main data associates the data collected to the data service platform with the main data, so that a plurality of version data in the life cycle of the main data can be effectively utilized, the target information is obtained according to the data link, and the target information is used as retrieval information for retrieval, so that the retrieval is comprehensive and efficient.

Fig. 3 is a flowchart illustrating a method for obtaining target information according to an embodiment of the present invention. As shown in fig. 3, in this embodiment, the search information includes both the searched keyword and the time range, and the method for acquiring the target information includes:

S301: matching the keywords in the retrieval information with the attribute values of the data links in the main data attribute text to obtain the data links matched with the keywords.

In step S301, the keywords and the attribute values may be matched by word vector similarity matching based on Natural Language Processing (NLP), in which the retrieval information is matched against the main data attribute text to obtain the main data attribute information matched with the retrieval information as the target information. Alternatively, the retrieval information may be matched against the main data attribute text using Elasticsearch to obtain the matching main data attribute information as the target information. The embodiment of the invention does not limit the matching method. In general, word vector similarity matching is used for shorter texts such as words and phrases, and Elasticsearch-based matching is used for longer texts such as sentences. There is, however, no strict rule that ties text length to matching method: for example, word vector similarity matching may be tried against the main data attribute text first, and if it does not yield the required result, matching may continue with Elasticsearch.
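To make the similarity-matching idea concrete, the sketch below scores a keyword against attribute values with cosine similarity. It is only an illustration under stated assumptions: character bigram counts are used as a stand-in for real word vectors (which would come from a trained NLP model), the 0.5 threshold is arbitrary, and `char_ngrams`, `cosine` and `best_match` are hypothetical names. The Elasticsearch alternative is not shown.

```python
import math
from collections import Counter
from typing import List, Optional


def char_ngrams(text: str, n: int = 2) -> Counter:
    """Stand-in for a word vector: counts of character n-grams."""
    text = text.lower()
    return Counter(text[i:i + n] for i in range(max(len(text) - n + 1, 1)))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def best_match(keyword: str, values: List[str], threshold: float = 0.5) -> Optional[str]:
    """Return the most similar attribute value, or None if nothing is similar enough."""
    if not values:
        return None
    query_vec = char_ngrams(keyword)
    score, value = max((cosine(query_vec, char_ngrams(v)), v) for v in values)
    return value if score >= threshold else None


# Station names are romanized here for illustration.
hit = best_match("Jizhou", ["Jizhou South", "Beijing"])
miss = best_match("xyz", ["Jizhou South", "Beijing"])
```

A `None` result is the signal that a caller could use to fall back to another matching method.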

S302: matching the time range in the retrieval information with the start time and/or end time of the nodes in the determined data link, and determining an attribute value matched with the time range as the target information.

In step S302, the time range is matched against the start time and/or end time of each attribute value in the determined data link. When the time range is "after a certain time point", the attribute values whose end time is later than that time point may be used as the target information; an attribute value currently in use, whose end time is marked as 0, is treated as not yet ended. When the time range is "before a certain time point", the attribute values whose start time is earlier than that time point may be used as the target information. When the time range is a time period, the attribute values whose start time is earlier than the end of the period and whose end time is later than the start of the period may be used as the target information, that is, the attribute values whose period of validity overlaps the time range.
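The time-range step amounts to an interval overlap test, which can be sketched as follows. This is an interpretation rather than the patent's stated algorithm: `values_in_range` is a hypothetical name, `None` stands in for an open bound or for the "0" marker of a value still in use, boundaries are treated as inclusive, and the dates are invented.

```python
from datetime import date
from typing import List, Optional, Tuple

# (attribute value, start time, end time); end None means "still in use".
Node = Tuple[str, date, Optional[date]]


def values_in_range(
    nodes: List[Node],
    range_start: Optional[date] = None,
    range_end: Optional[date] = None,
) -> List[str]:
    """Return attribute values whose validity period overlaps the time range."""
    matched = []
    for value, start, end in nodes:
        # The value must have started by the end of the range (if bounded) ...
        starts_in_time = range_end is None or start <= range_end
        # ... and must not have ended before the range starts (if bounded).
        ends_late_enough = end is None or range_start is None or end >= range_start
        if starts_in_time and ends_late_enough:
            matched.append(value)
    return matched


# Placeholder data link with invented dates.
nodes: List[Node] = [
    ("A", date(2000, 1, 1), date(2010, 1, 1)),
    ("B", date(2010, 1, 1), date(2015, 1, 1)),
    ("C", date(2015, 1, 1), None),
]

after_point = values_in_range(nodes, range_start=date(2012, 1, 1))
before_point = values_in_range(nodes, range_end=date(2005, 1, 1))
within_period = values_in_range(nodes, date(2009, 1, 1), date(2011, 1, 1))
```

The three calls correspond to the "after a time point", "before a time point" and "within a period" cases.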

In some optional examples, matching the input search information with the main data attribute text to obtain target information matched with the search information, may further include: judging whether the retrieval information is matched with the main data attribute text to obtain main data attribute information matched with the retrieval information; if the main data attribute information matched with the retrieval information is not obtained, the retrieval information can be matched with a pre-established synonym table to obtain the synonym matched with the retrieval information, and the synonym is used as the target information.

Whether the retrieval information matches the main data attribute text may be determined by comparing whether the two share the same characters. For example, suppose the retrieval information is "Jizhou Station" and the main data attribute text contains Jizhou, Jizhou South and Jixian South. Because the two share the same characters, the retrieval information can be determined to match the main data attribute text. The embodiment of the present invention does not limit the method for determining whether the retrieval information matches the main data attribute text.
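One simple way to test for shared characters is to look for a common run of text between the query and any attribute value. The sketch below is a hypothetical illustration: `longest_shared_run` and `has_shared_text` are invented names, the station names are romanized, and the minimum run length of 4 is an assumption that suits Latin text (for Chinese text a shorter run would likely be appropriate).

```python
from difflib import SequenceMatcher
from typing import Iterable


def longest_shared_run(a: str, b: str) -> int:
    """Length of the longest contiguous run of characters common to a and b."""
    a, b = a.lower(), b.lower()
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    return matcher.find_longest_match(0, len(a), 0, len(b)).size


def has_shared_text(query: str, values: Iterable[str], min_len: int = 4) -> bool:
    """True if the query shares a long enough run with any attribute value."""
    return any(longest_shared_run(query, v) >= min_len for v in values)


attribute_values = ["Jizhou", "Jizhou South", "Jixian South"]
text_query_matches = has_shared_text("Jizhou Station", attribute_values)
code_query_matches = has_shared_text("12345", attribute_values)
```

A numeric code shares no text with the attribute values, so the second check fails and a different lookup is needed for such queries.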

When the retrieval information is not text information but code information composed of a string of numbers, the retrieval information cannot be matched with the main data attribute text. In this case, the retrieval information is matched against a synonym table established in advance. The synonym table may contain text information, code information and other related information. Taking a synonym table of station names as an example, it may include the Chinese text of the station name, the English text of the station name, the code of the station name, the longitude and latitude of the station, the place to which the station belongs, and the like. The code information in the retrieval information is matched against the synonym table; if it is the code of a station name, the corresponding code entry is found in the synonym table of station names, and the text of the corresponding station name is obtained as the target information.
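A synonym table of this kind can be pictured as rows that tie a station's text name to its code and other attributes. The sketch below is illustrative only: the row layout, the field names, the function `text_for_code` and every value in `STATION_SYNONYMS` are invented and do not come from any real station register.

```python
from typing import Dict, List, Optional

# Hypothetical synonym table; all names, codes and regions are made up.
STATION_SYNONYMS: List[Dict[str, str]] = [
    {"name": "Station A", "english": "Station A", "code": "00001", "region": "Region X"},
    {"name": "Station B", "english": "Station B", "code": "00002", "region": "Region Y"},
]


def text_for_code(query: str, table: List[Dict[str, str]]) -> Optional[str]:
    """Return the station name text whose code equals the query, if any."""
    for row in table:
        if row["code"] == query:
            return row["name"]
    return None


target = text_for_code("00002", STATION_SYNONYMS)
unknown = text_for_code("99999", STATION_SYNONYMS)
```

The returned text can then be used as the target information in place of the numeric code.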

In other optional examples, matching the input retrieval information with the main data attribute text to obtain the target information may proceed as follows: word vector similarity matching is first used against the main data attribute text; if the required result is not obtained, matching continues with Elasticsearch; and if the required result is still not obtained, the retrieval information is matched against the synonym table. That is, the matching methods may be tried in a preset order until target information meeting the requirement is obtained.
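Trying the matching methods in a preset order is a simple fallback chain. In the sketch below, `match_in_order` is a hypothetical helper, and the three matcher functions are trivial stand-ins for word vector matching, full-text matching and synonym lookup; their rules and the code "10001" are invented for illustration.

```python
from typing import Callable, List, Sequence

Matcher = Callable[[str], List[str]]


def match_in_order(query: str, matchers: Sequence[Matcher]) -> List[str]:
    """Try each matcher in the preset order and return the first non-empty result."""
    for matcher in matchers:
        targets = matcher(query)
        if targets:
            return targets
    return []


# Stand-in matchers with invented rules.
def vector_matcher(query: str) -> List[str]:
    return ["Jizhou"] if query == "Jizhou Station" else []


def fulltext_matcher(query: str) -> List[str]:
    return ["Jizhou South"] if "south" in query.lower() else []


def synonym_matcher(query: str) -> List[str]:
    return {"10001": ["Jizhou"]}.get(query, [])  # "10001" is a made-up code


chain = [vector_matcher, fulltext_matcher, synonym_matcher]
by_vector = match_in_order("Jizhou Station", chain)
by_fulltext = match_in_order("the south platform", chain)
by_synonym = match_in_order("10001", chain)
```

Keeping the order in a plain list makes it easy to change which method is tried first.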

Fig. 4 is a schematic diagram of an application scenario of the method for comprehensively retrieving information by a data warehouse according to the embodiment of the present invention. As shown in Fig. 4, "Jizhou Station" is input as a keyword into a retrieval system. The retrieval system invokes the data links established according to the history change record of the main data attribute information and performs word vector similarity matching between the keyword and the data links to obtain the matching station-name data link. The information in this data link includes: Jixian South was changed to Jizhou South on July 31, 2017, and Jizhou South was changed to Jizhou on December 1, 2018, the name in use to date. The retrieval time range requests related data after January 2017. According to the matching principle that the end time of an attribute value of the main data is later than the time range, Jixian South, Jizhou South and Jizhou are taken as the target information. Jixian South, Jizhou South and Jizhou are each input into the data warehouse for retrieval, and the combined retrieval results are taken as the final retrieval result.
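The last step of this scenario, searching once per target name and combining the results, can be sketched as below. The warehouse rows, their ids and notes are invented, the station names are romanized for illustration, and `search_rows` and `merged_search` are hypothetical helpers; de-duplicating by row id is an added assumption, not something the text specifies.

```python
from typing import Dict, List

# Hypothetical warehouse rows; ids and contents are invented.
ROWS: List[Dict] = [
    {"id": 1, "station": "Jixian South", "note": "stored under the oldest name"},
    {"id": 2, "station": "Jizhou South", "note": "stored under the middle name"},
    {"id": 3, "station": "Jizhou", "note": "stored under the current name"},
    {"id": 4, "station": "Elsewhere", "note": "unrelated row"},
]


def search_rows(target: str) -> List[Dict]:
    """Return the rows stored under exactly this station name."""
    return [row for row in ROWS if row["station"] == target]


def merged_search(targets: List[str]) -> List[Dict]:
    """Search once per target and combine the results without duplicates."""
    seen = set()
    merged = []
    for target in targets:
        for row in search_rows(target):
            if row["id"] not in seen:
                seen.add(row["id"])
                merged.append(row)
    return merged


final = merged_search(["Jixian South", "Jizhou South", "Jizhou"])
single = search_rows("Jizhou")
```

Searching on the current name alone finds one row, while the merged search over all three names also recovers the rows stored under the earlier names.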

Fig. 5 is a schematic diagram of an apparatus for comprehensively retrieving information from a data warehouse according to an embodiment of the present invention. As shown in fig. 5, the apparatus for comprehensively retrieving information by the data warehouse includes:

a matching module, used for matching the input retrieval information with the main data attribute text to obtain target information matched with the retrieval information; the main data attribute text is a data link established according to the history change record of the main data attribute information; each data link in the main data attribute text corresponds to one main data, and each node in the data link comprises an attribute value of the main data, the start time of the attribute value and the end time of the attribute value;

and a retrieval module, used for inputting the target information as retrieval information into the data warehouse for retrieval to obtain a retrieval result.

Optionally, the matching module comprises:

the first matching unit is used for matching the keywords in the retrieval information with the attribute values of the data links in the main data attribute text to obtain the data links matched with the keywords;

and the second matching unit is used for matching the time range in the retrieval information with the determined start time and/or the determined deadline of the data link, and determining the attribute value matched with the time range as the target information.

Optionally, the matching module further includes:

the third matching unit is used for matching the retrieval information with the main data attribute text based on word vector similarity matching to obtain main data attribute information matched with the retrieval information as the target information; or,

the fourth matching unit is used for matching the retrieval information with the main data attribute text based on Elasticsearch to obtain main data attribute information matched with the retrieval information as the target information;

the judging unit is used for judging whether the retrieval information is matched with the main data attribute text or not to obtain the main data attribute information matched with the retrieval information;

and the fifth matching unit is used for matching the retrieval information with a pre-established synonym table to obtain a synonym matched with the retrieval information as target information if the main data attribute information matched with the retrieval information is not obtained.

Fig. 6 is a schematic physical structure diagram of an electronic device according to an embodiment of the present invention. As shown in fig. 6, the electronic device may include: a processor (processor)601, a communication Interface (Communications Interface)602, a memory (memory)603 and a communication bus 604, wherein the processor 601, the communication Interface 602 and the memory 603 complete communication with each other through the communication bus 604. The processor 601 may invoke logic instructions in the memory 603 to perform a method of a data warehouse to retrieve information comprehensively, the method comprising:

matching input retrieval information with a main data attribute text to obtain target information matched with the retrieval information; the main data attribute text is a data link established according to the history change record of the main data attribute information; and inputting the target information as retrieval information into a data warehouse for retrieval to obtain a retrieval result.
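The two steps of the method, expanding the retrieval information along the data link and then querying the data warehouse with the result, can be sketched as follows. This is a minimal illustration under stated assumptions: an in-memory SQLite table stands in for the data warehouse, and the `records` table, the `DATA_LINKS` structure, and the function names are hypothetical.

```python
import sqlite3
from typing import List

# Hypothetical data links: each list holds every historical value of one attribute.
DATA_LINKS: List[List[str]] = [
    ["Station A", "Station B", "Station C"],
]


def expand_query(term: str) -> List[str]:
    """Return all versions on the data link that contains the term, else the term itself."""
    for link in DATA_LINKS:
        if term in link:
            return list(link)
    return [term]


def search_warehouse(conn: sqlite3.Connection, term: str) -> List[tuple]:
    """Use the expanded target information as the retrieval information."""
    targets = expand_query(term)
    placeholders = ",".join("?" for _ in targets)
    sql = (
        "SELECT station, reading FROM records "
        f"WHERE station IN ({placeholders}) ORDER BY id"
    )
    return conn.execute(sql, targets).fetchall()


def build_demo_warehouse() -> sqlite3.Connection:
    """Create a small in-memory table standing in for the data warehouse."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE records (id INTEGER PRIMARY KEY, station TEXT, reading REAL)"
    )
    conn.executemany(
        "INSERT INTO records (station, reading) VALUES (?, ?)",
        [("Station A", 1.0), ("Station B", 2.0), ("Other", 9.0), ("Station C", 3.0)],
    )
    return conn
```

Searching for any one version of the attribute value returns the records stored under every version on the same data link, which is the sense in which the retrieval is comprehensive.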

In addition, the logic instructions in the memory 603 may be implemented in the form of software functional modules and, when sold or used as independent products, stored in a computer-readable storage medium. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. The aforementioned storage medium includes: a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, an optical disk, or any other medium capable of storing program code.

In another aspect, the present invention also provides a computer program product comprising a computer program stored on a non-transitory computer-readable storage medium, the computer program comprising program instructions which, when executed by a computer, enable the computer to perform a method for comprehensively retrieving information of a data warehouse, the method comprising:

matching input retrieval information with a main data attribute text to obtain target information matched with the retrieval information; the main data attribute text is a data link established according to the history change record of the main data attribute information; and inputting the target information as retrieval information into a data warehouse for retrieval to obtain a retrieval result.

In yet another aspect, the present invention also provides a non-transitory computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the method for comprehensively retrieving information of a data warehouse provided above, the method comprising:

matching input retrieval information with a main data attribute text to obtain target information matched with the retrieval information; the main data attribute text is a data link established according to the history change record of the main data attribute information; and inputting the target information as retrieval information into a data warehouse for retrieval to obtain a retrieval result.

The above-described embodiments of the apparatus are merely illustrative, and the modules described as separate parts may or may not be physically separate, and the parts displayed as modules may or may not be physical modules, may be located in one place, or may be distributed on a plurality of network modules. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment. One of ordinary skill in the art can understand and implement it without inventive effort.

Through the above description of the embodiments, those skilled in the art will clearly understand that each embodiment can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware. With this understanding in mind, the above-described technical solutions may be embodied in the form of a software product, which can be stored in a computer-readable storage medium, such as ROM/RAM, magnetic disk, optical disk, etc., and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the methods of the various embodiments or some parts of the embodiments.

Finally, it should be noted that: the above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.
