Method and system for integrating multi-source heterogeneous data based on uniform access distribution

Document No.: 1831336 Publication date: 2021-11-12

Reading note: This technology, "Method and system for integrating multi-source heterogeneous data based on uniform access distribution" (一种基于统一访问分布式集成多源异构数据的方法及系统), was designed by 岑维聪, 吕广宪, 李运硕, 冯德志, 陆一鸣, 刘鹏 and 王国庆 on 2020-11-23. Abstract: The application discloses a method and a system for integrating multi-source heterogeneous data based on uniform access distribution. In the method, multi-source heterogeneous data is stored in different databases according to data type, the data types comprising structured data, semi-structured data and unstructured data. After the data application module sends a data operation request to the unified access service module, the request is encrypted by a data gateway component based on a pre-established data fusion access model. The request is parsed by a semantic analysis running engine component and stored by a data cache component. The multi-source heterogeneous data corresponding to the request is scheduled by a data routing component, and a data management component integrates that data in a unified-access distributed manner to obtain an integration result.

1. A method for distributed integration of multi-source heterogeneous data based on unified access, characterized by comprising the following steps:

respectively storing multi-source heterogeneous data in different databases according to data types, wherein the data types comprise structured data, semi-structured data and unstructured data;

after a data application module sends a data operation request to a unified access service module, encrypting the data operation request by using a data gateway component based on a pre-established data fusion access model;

analyzing the data operation request by utilizing a semantic analysis running engine component, and storing the data operation request by utilizing a data cache component;

scheduling the multi-source heterogeneous data corresponding to the data operation request by using a data routing component;

and integrating, by using a data management component, the multi-source heterogeneous data corresponding to the data operation request in a unified-access distributed manner to obtain an integration result.

2. The method of claim 1, wherein storing the multi-source heterogeneous data in different databases according to data types respectively comprises:

storing the master data and the metadata in a relational database;

loading the multi-source heterogeneous data in batches and compressing the multi-source heterogeneous data through a distributed file system; and

realizing, through a distributed database, online-transaction-type retrieval queries with second-level response times and online-analytical-processing-type high-speed data analysis of the multi-source heterogeneous data.

3. The method of claim 1, further comprising:

returning the integration result to the data storage component, and sending the integration result to the data application module to fulfil the service requirement.

4. A system for integrating multi-source heterogeneous data based on uniform access distribution, characterized by comprising a data storage module, a unified access service module and a data application module, wherein

the data storage module is used for storing the multi-source heterogeneous data in different databases according to data type, wherein the data types comprise structured data, semi-structured data and unstructured data;

the unified access service module is used for encrypting, parsing and storing a data operation request based on a pre-established data fusion access model after the data application module sends the data operation request to the unified access service module, and for integrating the multi-source heterogeneous data in a unified-access distributed manner to obtain an integration result; and

the data application module is used for calling the unified access service module to realize service requirements.

5. The system of claim 4, wherein

the unified access service module comprises a data gateway component, a semantic analysis running engine component, a data cache component, a data routing component and a data management component.

6. The system of claim 5, wherein

the data gateway component is configured to encrypt the data operation request.

7. The system of claim 5, wherein

the semantic analysis running engine component is configured to parse the data operation request.

8. The system of claim 5, wherein

the data cache component is configured to store the data operation request.

9. The system of claim 5, wherein

the data routing component is configured to schedule the multi-source heterogeneous data corresponding to the data operation request.

10. The system of claim 5, wherein

the data management component is configured to integrate the multi-source heterogeneous data corresponding to the data operation request in a unified-access distributed manner to obtain an integration result.

Technical Field

The application relates to the technical field of big data, in particular to a method and a system for integrating multi-source heterogeneous data based on uniform access distribution.

Background

With the arrival of the big data era of the power grid, the data of power grid enterprises no longer resembles that of the traditional era, when data types were few and growth was slow. Intelligent substation systems, equipment and asset management, field mobile maintenance systems, GIS systems, smart meters and the like bring geometrically growing data volumes and more complex data sources. Big data concepts and technologies need to be applied so that this huge and diverse data can be put to use quickly and deliver greater value.

The mass data processing platform of State Grid is mainly used for mass data storage management, simple queries and simple applications. Its key implementation covers only the "large volume" characteristic of the big data concept; it does not go deeply into variety, value mining or rapid processing, and it does not involve distributed processing technology at all.

The data in the full-service unified data center comes in differing specification forms: there is structured data in relational databases as well as semi-structured and unstructured data stored in files such as XML (Extensible Markup Language) and Excel. This complexity makes sharing data among systems a major problem. A unified access service is therefore needed to solve cross-system data sharing efficiently and to address a major problem that enterprises face in resource integration, sharing and service processes.

The technical problem in the prior art is thus that the differing specification forms of the data in the full-service unified data center, namely structured data in relational databases together with semi-structured and unstructured data stored in files such as XML and Excel, make sharing data among systems difficult.

Disclosure of Invention

The embodiments of the disclosure provide a method and a system for integrating multi-source heterogeneous data based on unified access distribution, so as to at least solve the technical problem in the prior art that sharing data among systems is difficult because the data in the full-service unified data center comes in differing specification forms: structured data in relational databases as well as semi-structured and unstructured data stored in files such as XML (Extensible Markup Language) and Excel.

According to one aspect of the embodiments of the present disclosure, there is provided a method for distributed integration of multi-source heterogeneous data based on unified access, including: storing the multi-source heterogeneous data in different databases according to data type, wherein the data types comprise structured data, semi-structured data and unstructured data; after the data application module sends a data operation request to the unified access service module, encrypting the data operation request by using a data gateway component based on a pre-established data fusion access model; parsing the data operation request by using a semantic analysis running engine component, and storing the data operation request by using a data cache component; scheduling the multi-source heterogeneous data corresponding to the data operation request by using a data routing component; and integrating, by using a data management component, the multi-source heterogeneous data corresponding to the data operation request in a unified-access distributed manner to obtain an integration result.

According to another aspect of the embodiments of the present disclosure, a system for integrating multi-source heterogeneous data based on unified access distribution is further provided. The system includes a data storage module, a unified access service module and a data application module. The data storage module is configured to store the multi-source heterogeneous data in different databases according to data type, the data types including structured data, semi-structured data and unstructured data. The unified access service module is configured to encrypt, parse and store a data operation request based on a pre-established data fusion access model after the data application module sends the request to the unified access service module, and to integrate the multi-source heterogeneous data in a unified-access distributed manner to obtain an integration result. The data application module is configured to call the unified access service module to fulfil the service requirement.

In the invention, the unified access service allows mass data to be processed flexibly and efficiently, and the unified access interface provides transparent access to distributed heterogeneous databases. The middleware abstracts away the complexity and heterogeneity of the underlying distributed environment and decouples the application layer from the data source layer, which reduces the development difficulty of communication and information systems and improves system maintainability and extensibility. The result is a data access system that can process mass data and transparently access heterogeneous databases. This solves the technical problem in the prior art that sharing data among systems is difficult because the data in the full-service unified data center comes in differing specification forms: structured data in relational databases as well as semi-structured and unstructured data stored in files such as XML and Excel.

Drawings

The accompanying drawings, which are included to provide a further understanding of the disclosure and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the disclosure and together with the description serve to explain the disclosure and not to limit the disclosure. In the drawings:

FIG. 1 is a schematic flowchart of a method for distributed integration of multi-source heterogeneous data based on unified access according to a first aspect of embodiment 1 of the present disclosure;

FIG. 2 is a schematic diagram of a data storage structure according to an embodiment of the present disclosure;

FIG. 3 is a schematic diagram of a unified access services framework according to an embodiment of the present disclosure; and

FIG. 4 is a schematic diagram of a system for integrating multi-source heterogeneous data based on uniform access distribution according to an embodiment of the present disclosure.

Detailed Description

The exemplary embodiments of the present invention will now be described with reference to the accompanying drawings, however, the invention may be embodied in many different forms and is not limited to the embodiments described herein, which are provided for full and complete disclosure of the invention and to fully convey the scope of the invention to those skilled in the art. The terminology used in the exemplary embodiments illustrated in the accompanying drawings is not intended to be limiting of the invention. In the drawings, the same units/elements are denoted by the same reference numerals.

Unless otherwise defined, terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. In addition, it will be understood that terms, such as those defined in commonly used dictionaries, should be interpreted as having a meaning that is consistent with their meaning in the context of the relevant art and will not be interpreted in an idealized or overly formal sense.

According to a first aspect of the present embodiment, a method 100 for distributed integration of multi-source heterogeneous data based on unified access is provided. FIG. 1 shows a schematic flowchart of the method. Referring to FIG. 1, the method comprises:

S102: storing the multi-source heterogeneous data in different databases according to data type, wherein the data types comprise structured data, semi-structured data and unstructured data;

S104: after the data application module sends a data operation request to the unified access service module, encrypting the data operation request by using a data gateway component based on a pre-established data fusion access model;

S106: parsing the data operation request by using a semantic analysis running engine component, and storing the data operation request by using a data cache component;

S108: scheduling the multi-source heterogeneous data corresponding to the data operation request by using a data routing component;

S110: integrating, by using a data management component, the multi-source heterogeneous data corresponding to the data operation request in a unified-access distributed manner to obtain an integration result.
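Steps S102 to S110 can be pictured as a small pipeline. The following is only a minimal sketch under stated assumptions: the request format, the store contents, and every function name are hypothetical, and the XOR transform is a toy stand-in for whatever cipher the data gateway component actually uses (it is not secure).

```python
import base64
import json
from itertools import cycle

KEY = b"demo-key"  # hypothetical shared key, for illustration only


def xor_bytes(data: bytes, key: bytes = KEY) -> bytes:
    """Toy reversible transform standing in for real encryption (NOT secure)."""
    return bytes(b ^ k for b, k in zip(data, cycle(key)))


def gateway_encrypt(request: dict) -> str:
    """S104: the data gateway component encrypts the request."""
    raw = json.dumps(request, sort_keys=True).encode("utf-8")
    return base64.b64encode(xor_bytes(raw)).decode("ascii")


def gateway_decrypt(token: str) -> dict:
    """Reverse of gateway_encrypt, applied on the service side."""
    raw = xor_bytes(base64.b64decode(token))
    return json.loads(raw.decode("utf-8"))


def parse_request(request: dict) -> dict:
    """S106 (first half): extract the target data types and the lookup key."""
    return {"types": tuple(request["types"]), "key": request["key"]}


# S102: one hypothetical store per data type.
STORES = {
    "structured": {"meter-1": {"voltage": 220}},
    "semi_structured": {"meter-1": {"xml": "<m id='1'/>"}},
    "unstructured": {"meter-1": {"image": "meter-1.jpg"}},
}

REQUEST_CACHE = []  # S106 (second half): the data cache component stores requests


def route(plan: dict) -> list:
    """S108: fetch the matching fragment from each store named in the plan."""
    return [(t, STORES[t].get(plan["key"], {})) for t in plan["types"]]


def integrate(parts: list) -> dict:
    """S110: merge the per-store fragments into one integration result."""
    merged = {}
    for _type, fragment in parts:
        merged.update(fragment)
    return merged


def handle(request: dict) -> dict:
    token = gateway_encrypt(request)
    plan = parse_request(gateway_decrypt(token))
    REQUEST_CACHE.append(plan)
    return integrate(route(plan))


result = handle({"types": ["structured", "unstructured"], "key": "meter-1"})
```

In this sketch the application only names the data types and a key; which store answers each part is decided inside the service.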

Specifically, the present invention is divided into two aspects: data storage and data unified access service.

Data storage is mainly oriented to the storage and query of all types of data (structured, semi-structured, real-time and unstructured) and is characterized by mass-scale storage and fast query and read performance. On low-cost hardware (x86) and disks, a set of typical industry systems, including a distributed file system, a distributed relational database, a NoSQL database, a real-time database and an in-memory database, is adopted to support higher-level data processing applications.

(1) Relational data store

The relational database serves, on the one hand, as the store for metadata and master data and, on the other hand, as the underlying database for some management and operation-and-maintenance applications, where it exchanges data and runs joint queries with the original business system data. The relational database supplements and reinforces the distributed file system and the distributed database so that the storage requirements of all kinds of data can be met.

(2) Distributed file system

The distributed file system is a cluster built on low-cost x86 hardware with a master-slave structure: the master node is responsible for metadata management and for providing a unified namespace, and a large number of data nodes are responsible for data I/O processing and computation.

The State Grid big data platform adopts a unified underlying distributed file system on which all data is gathered and stored. The file system supports erasure coding and encrypted file storage. A data file is divided into one or more data blocks that are stored across different data nodes, and the blocks are kept with multiple redundant copies to guard against data loss caused by hardware faults.
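The block splitting and redundant placement described above can be illustrated with a short sketch. It assumes fixed-size blocks and a simple round-robin placement policy; the node names and function names are hypothetical, and erasure coding and encryption are not modelled.

```python
def split_into_blocks(data: bytes, block_size: int) -> list:
    """Cut a file's bytes into fixed-size blocks (the last one may be shorter)."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def place_replicas(num_blocks: int, nodes: list, replicas: int) -> dict:
    """Assign each block to `replicas` distinct nodes in round-robin order."""
    if replicas > len(nodes):
        raise ValueError("need at least as many nodes as replicas")
    placement = {}
    for block_id in range(num_blocks):
        placement[block_id] = [
            nodes[(block_id + r) % len(nodes)] for r in range(replicas)
        ]
    return placement


def readable_after_failure(placement: dict, failed_node: str) -> bool:
    """A file stays readable if every block has a replica on a live node."""
    return all(
        any(node != failed_node for node in holders)
        for holders in placement.values()
    )


blocks = split_into_blocks(b"x" * 10, block_size=4)  # three blocks: 4 + 4 + 2 bytes
placement = place_replicas(len(blocks), ["dn1", "dn2", "dn3"], replicas=2)
```

With two replicas per block on three nodes, the loss of any single data node still leaves one copy of every block, which is the property the redundancy is meant to provide.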

(3) Distributed database

Distributed database storage overcomes the theoretical and practical limitations of relational databases in processing mass data, and meets the application requirements of OLTP-type retrieval queries with second-level response times and OLAP-type high-speed analysis of mass data. A real-time distributed database is usually composed of a management server and a plurality of data servers.

The unified access service is based on a data fusion access model and provides functions such as data routing and a data gateway, forming a data service component. It implements standard SQL data operations, security restriction control and data caching over the distributed file system, the distributed data warehouse, non-relational databases and relational databases, and supports business applications in uniformly accessing various data resources.

The data application sends a data operation request to the unified access service. The data gateway encrypts and issues the request, and the semantic analysis running engine parses it. After parsing, the data cache is accessed first; the service then schedules data among the data nodes through data routing, and the execution result is returned to the data cache. After obtaining the operation result, the service returns it to the data application through the data gateway.
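The cache-first part of this flow can be sketched as follows. This is an illustrative assumption about how such a service might behave, not the patented implementation: the `GET <node> <key>` request syntax, the class name and the `node_reads` counter are all hypothetical, and the gateway's encryption step is omitted.

```python
class UnifiedAccessService:
    """Hypothetical cache-first request handler."""

    def __init__(self, data_nodes: dict):
        self.data_nodes = data_nodes   # node name -> {key: value}
        self.cache = {}                # parsed request -> result
        self.node_reads = 0            # counts trips to the data nodes

    def parse(self, request: str) -> tuple:
        """Parse a 'GET <node> <key>' request into a normalized tuple."""
        verb, node, key = request.split()
        if verb.upper() != "GET":
            raise ValueError("only GET is supported in this sketch")
        return (node, key)

    def execute(self, request: str):
        parsed = self.parse(request)
        if parsed in self.cache:          # the cache is consulted first
            return self.cache[parsed]
        node, key = parsed
        self.node_reads += 1              # cache miss: route to the data node
        result = self.data_nodes[node].get(key)
        self.cache[parsed] = result       # the execution result goes back to the cache
        return result


service = UnifiedAccessService({"hdfs": {"file1": "raw log"}, "rdb": {"user1": "Li"}})
first = service.execute("GET rdb user1")
second = service.execute("get rdb user1")
```

Because the cache is keyed on the parsed request rather than the raw text, the two differently cased requests above share one cache entry and the data node is read only once.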

(1) Defining a unified data access interface

A unified access interface is defined so that the application system can interact with the databases through the interface of the remote access middleware, which facilitates system integration and maintenance.

(2) Masking underlying database differences and providing transparent access functionality

The service hides the differences among the underlying databases, so the application system can transparently access databases distributed on different network nodes as if it were accessing a local database.
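One common way to hide backend differences is an adapter layer behind a single interface. The sketch below assumes that approach; the class names, the `get` method and the sample records are hypothetical, with an in-memory SQLite database standing in for a relational store and a dictionary standing in for a non-relational one.

```python
import sqlite3
from abc import ABC, abstractmethod


class DataSource(ABC):
    """Uniform interface the application sees, whatever the backend."""

    @abstractmethod
    def get(self, key: str):
        ...


class RelationalSource(DataSource):
    """Adapter over a relational database (in-memory SQLite here)."""

    def __init__(self):
        self.conn = sqlite3.connect(":memory:")
        self.conn.execute("CREATE TABLE kv (k TEXT PRIMARY KEY, v TEXT)")
        self.conn.execute("INSERT INTO kv VALUES ('device-7', 'transformer')")

    def get(self, key: str):
        row = self.conn.execute("SELECT v FROM kv WHERE k = ?", (key,)).fetchone()
        return row[0] if row else None


class KeyValueSource(DataSource):
    """Adapter over a key-value style store (a plain dict here)."""

    def __init__(self):
        self.items = {"device-7": "inspection-photo.jpg"}

    def get(self, key: str):
        return self.items.get(key)


def lookup(sources: dict, source_name: str, key: str):
    """Application-side call: same signature regardless of backend."""
    return sources[source_name].get(key)


sources = {"relational": RelationalSource(), "nosql": KeyValueSource()}
```

The caller of `lookup` never sees SQL or dictionary access, which is the sense in which the backend differences are masked.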

(3) Fast query of mass data

A user's query request is decomposed into a plurality of sub-queries that are executed as a distributed parallel query, so system resources are used efficiently and the request is answered quickly.
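A minimal sketch of this decomposition, assuming the data is partitioned across nodes and the query is an aggregate that can be merged from per-node partial results. The shard contents, node names and function names are hypothetical; threads stand in for remote data nodes.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical shards: each data node holds part of the meter readings.
SHARDS = {
    "node-a": [{"region": "north", "kwh": 10}, {"region": "south", "kwh": 5}],
    "node-b": [{"region": "north", "kwh": 7}],
    "node-c": [{"region": "south", "kwh": 3}, {"region": "north", "kwh": 1}],
}


def sub_query(node: str, region: str) -> tuple:
    """Run the per-node part of the query: a partial sum and a row count."""
    rows = [r for r in SHARDS[node] if r["region"] == region]
    return (sum(r["kwh"] for r in rows), len(rows))


def total_kwh(region: str) -> dict:
    """Fan the query out to every node in parallel, then merge the partials."""
    with ThreadPoolExecutor(max_workers=len(SHARDS)) as pool:
        partials = list(pool.map(lambda node: sub_query(node, region), SHARDS))
    return {
        "total": sum(p[0] for p in partials),
        "rows": sum(p[1] for p in partials),
    }


north = total_kwh("north")
```

Sums and counts merge trivially; queries such as medians or joins across shards would need a more involved merge step than this sketch shows.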

(4) Supporting data set transmission

The unified access service supports transmitting data result sets over the network, so the application system can operate on the data set locally.
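As an illustration of shipping a result set to the application for local processing, the sketch below serializes column names and rows to JSON bytes and rebuilds row dictionaries on the receiving side. The wire format and function names are assumptions made for this example; the source does not specify a transmission format.

```python
import json


def encode_result_set(columns: list, rows: list) -> bytes:
    """Service side: pack column names and rows into a byte message."""
    payload = {"columns": columns, "rows": rows}
    return json.dumps(payload).encode("utf-8")


def decode_result_set(message: bytes) -> list:
    """Application side: rebuild one dict per row from the message."""
    payload = json.loads(message.decode("utf-8"))
    return [dict(zip(payload["columns"], row)) for row in payload["rows"]]


message = encode_result_set(["meter", "kwh"], [["m1", 12.5], ["m2", 7.0]])
local_rows = decode_result_set(message)

# Once the result set is local, the application can work on it without
# further round trips to the service.
heaviest = max(local_rows, key=lambda r: r["kwh"])
```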

(5) Providing good transaction security

The unified access service ensures that the data consistency and integrity of the database are maintained when multiple users access the database.
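The source does not say how consistency is enforced. One conventional mechanism is to wrap related updates in a database transaction, sketched here with Python's `sqlite3` module; the `quota` table, the `transfer` function and the values are hypothetical and only illustrate that a failed multi-statement update leaves no partial change behind.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE quota (owner TEXT PRIMARY KEY, units INTEGER CHECK (units >= 0))"
)
conn.executemany("INSERT INTO quota VALUES (?, ?)", [("a", 10), ("b", 0)])
conn.commit()


def transfer(conn, src: str, dst: str, units: int) -> bool:
    """Move units between owners atomically; return False if rejected."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE quota SET units = units + ? WHERE owner = ?", (units, dst)
            )
            conn.execute(
                "UPDATE quota SET units = units - ? WHERE owner = ?", (units, src)
            )
        return True
    except sqlite3.IntegrityError:
        return False


def balances(conn) -> dict:
    """Read back the current units per owner."""
    return dict(conn.execute("SELECT owner, units FROM quota ORDER BY owner"))


ok = transfer(conn, "a", "b", 4)          # succeeds: a=6, b=4
rejected = transfer(conn, "a", "b", 100)  # violates the CHECK and is rolled back
```

In the rejected transfer the credit to `b` is applied first and then undone by the rollback, so other readers never observe units that were created without a matching debit.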

Therefore, with the method 100 for distributed integration of multi-source heterogeneous data based on unified access, the unified access service allows mass data to be processed flexibly and efficiently, and the unified access interface provides transparent access to distributed heterogeneous databases. The middleware abstracts away the complexity and heterogeneity of the underlying distributed environment and decouples the application layer from the data source layer, which reduces the development difficulty of communication and information systems and improves system maintainability and extensibility. The result is a data access system that can process mass data and transparently access heterogeneous databases. This solves the technical problem in the prior art that sharing data among systems is difficult because the data in the full-service unified data center comes in differing specification forms: structured data in relational databases as well as semi-structured and unstructured data stored in files such as XML and Excel.

Optionally, the respectively storing the multi-source heterogeneous data in different databases according to data types includes: storing the primary data and the metadata in a relational database; loading multi-source heterogeneous data in batches and compressing the multi-source heterogeneous data through a distributed file system; and realizing online transaction type second-level retrieval query and online analysis processing type high-speed data analysis of multi-source heterogeneous data through a distributed database.

Specifically, referring to FIG. 2, the master data is stored in a relational database along with the metadata. The multi-source heterogeneous data is batch-loaded and compressed through the distributed file system, which provides file storage, multiple copies and fault tolerance. The distributed database provides OLTP-type retrieval queries with second-level response times and OLAP-type high-speed data analysis of the multi-source heterogeneous data, with features such as columnar storage, in-memory storage, fast reads and writes, and linear scaling. The method thus divides data storage into three modes: relational database, distributed file system and distributed database, each with its own responsibilities.

Optionally, the method further comprises: returning the integration result to the data storage component, and sending the integration result to the data application module to fulfil the service requirement.
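Storing data by type implies some classification step before routing. The sketch below is one hypothetical way to do it: the classification heuristics and, in particular, the mapping from data type to storage mode are assumptions made for illustration, since the source does not state which type goes to which store.

```python
import json
import xml.etree.ElementTree as ET

# Assumed mapping from data type to storage mode (not specified in the source).
STORE_FOR_TYPE = {
    "structured": "relational database",
    "semi_structured": "distributed database",
    "unstructured": "distributed file system",
}


def classify(item) -> str:
    """Guess the data type of an incoming item with simple heuristics."""
    if isinstance(item, dict):
        return "structured"            # fixed fields, e.g. a table row
    if isinstance(item, str):
        text = item.strip()
        try:
            json.loads(text)
            return "semi_structured"   # parses as JSON
        except ValueError:
            pass
        try:
            ET.fromstring(text)
            return "semi_structured"   # parses as XML
        except ET.ParseError:
            pass
    return "unstructured"              # raw bytes, free text, images, ...


def target_store(item) -> str:
    """Pick the storage mode for an item based on its classified type."""
    return STORE_FOR_TYPE[classify(item)]
```

For example, `target_store({"id": 1})` selects the relational database under this assumed mapping, while `target_store(b"\x89PNG")` selects the distributed file system.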

Therefore, the unified access service allows mass data to be processed flexibly and efficiently, and the unified access interface provides transparent access to distributed heterogeneous databases. The middleware abstracts away the complexity and heterogeneity of the underlying distributed environment and decouples the application layer from the data source layer, which reduces the development difficulty of communication and information systems and improves system maintainability and extensibility. The result is a data access system that can process mass data and transparently access heterogeneous databases. This solves the technical problem in the prior art that sharing data among systems is difficult because the data in the full-service unified data center comes in differing specification forms: structured data in relational databases as well as semi-structured and unstructured data stored in files such as XML and Excel.

According to another aspect of the present embodiment, a system 400 for integrating multi-source heterogeneous data based on uniform access distribution is provided. Referring to FIG. 4, the system 400 includes a data storage module 410, a unified access service module 420 and a data application module 430. The data storage module 410 is configured to store multi-source heterogeneous data in different databases according to data type, the data types including structured data, semi-structured data and unstructured data. The unified access service module 420 is configured to encrypt, parse and store a data operation request based on a pre-established data fusion access model after the data application module sends the request to the unified access service module, and to integrate the multi-source heterogeneous data in a unified-access distributed manner to obtain an integration result. The data application module 430 is configured to call the unified access service module to fulfil the business requirement.

FIG. 3 shows the unified access service framework. A unified access service module is added between the data application module and the data storage module, and it comprises a data routing component, a semantic analysis running engine component, a data cache component, a data management component and a data gateway component. The data application module sends a data operation request to the unified access service module. The data gateway component encrypts and issues the request, and the semantic analysis running engine component parses it. After parsing, the data cache component is accessed first; the service then schedules data among the data nodes through the data routing component, and the execution result is returned to the data cache component. After the operation result is obtained, the unified access service module returns it to the data application module through the data gateway component.

Optionally, the unified access service module 420 includes a data gateway component, a semantic analysis running engine component, a data cache component, a data routing component and a data management component.

Optionally, the data gateway component is configured to encrypt the data operation request.

Optionally, the semantic analysis running engine component is configured to parse the data operation request.

Optionally, the data cache component is configured to store the data operation request.

Optionally, the data routing component is configured to schedule the multi-source heterogeneous data corresponding to the data operation request.

Optionally, the data management component is configured to integrate the multi-source heterogeneous data corresponding to the data operation request in a unified-access distributed manner to obtain an integration result.

Therefore, with the system 400 for integrating multi-source heterogeneous data based on uniform access distribution, the unified access service allows mass data to be processed flexibly and efficiently, and the unified access interface provides transparent access to distributed heterogeneous databases. The middleware abstracts away the complexity and heterogeneity of the underlying distributed environment and decouples the application layer from the data source layer, which reduces the development difficulty of communication and information systems and improves system maintainability and extensibility. The result is a data access system that can process mass data and transparently access heterogeneous databases. This solves the technical problem in the prior art that sharing data among systems is difficult because the data in the full-service unified data center comes in differing specification forms: structured data in relational databases as well as semi-structured and unstructured data stored in files such as XML and Excel.

The system 400 for integrating multi-source heterogeneous data based on uniform access distribution according to this embodiment of the present invention corresponds to the method 100 for integrating multi-source heterogeneous data based on uniform access distribution according to another embodiment of the present invention, and the details are not repeated here.

As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein. The scheme in the embodiments of the application can be implemented in various computer languages, such as the object-oriented programming language Java and the interpreted scripting language JavaScript.

The present application has been described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

While the preferred embodiments of the present application have been described, additional variations and modifications in those embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. Therefore, it is intended that the appended claims be interpreted as including the preferred embodiment and all changes and modifications that fall within the scope of the present application.

It will be apparent to those skilled in the art that various changes and modifications may be made in the present application without departing from the spirit and scope of the application. Thus, if such modifications and variations of the present application fall within the scope of the claims of the present application and their equivalents, the present application is intended to include such modifications and variations as well.
