Data processing method and device

Document No.: 1889378 Publication date: 2021-11-26

Reading notes: This technology, "Data processing method and device", was designed and created by 李长亮, 樊骏锋, and 汪美玲 on 2020-05-21. Its main content is as follows. The embodiments of the invention provide a data processing method and device. The method comprises: obtaining structured data to be processed; generating node data of the node corresponding to each piece of structured data by taking the identifier of the structured data as the node identifier and taking the field values contained in the structured data as node attributes; generating relationship data between the nodes corresponding to the pieces of structured data according to preset relationship field information, wherein the preset relationship field information comprises the field values in the structured data between which a relationship exists and relationship information between those field values; and obtaining a file containing the node data and the relationship data and importing the file into a graph database. When the scheme provided by the embodiments of the invention is applied to data processing, the efficiency of data processing is improved.

1. A method of data processing, the method comprising:

obtaining structured data to be processed;

generating node data of a node corresponding to each structured data by taking the identifier of the structured data as a node identifier and taking a field value contained in the structured data as a node attribute;

generating relationship data between nodes corresponding to each structured data according to preset relationship field information, wherein the preset relationship field information comprises: the field values of the existing relations in the structured data and the relation information among the field values of the existing relations;

and acquiring a file containing the node data and the relationship data, and importing the file into a graph database.

2. The method according to claim 1, wherein the generating relationship data between nodes corresponding to each structured data according to the preset relationship field information includes:

determining the nodes whose node attributes include a field value in the preset relationship field information;

and generating the relation data between the determined nodes according to the relation information in the preset relation field information.

3. The method of claim 1, wherein obtaining the structured data to be processed comprises:

acquiring structured data from a preset data source, and performing data cleaning on the acquired structured data;

and mapping field identifiers corresponding to field values in the cleaned structured data into attribute identifiers corresponding to node attributes based on a preset identifier mapping relation, and taking the structured data after mapping the identifiers as the structured data to be processed.

4. The method according to claim 3, wherein the using the structured data identified by the mapping as the structured data to be processed comprises:

and according to a preset field value format, carrying out format adjustment on the field value in the structured data after mapping identification, and taking the adjusted structured data as the structured data to be processed.

5. The method according to any of claims 1-4, wherein the obtaining a file containing the node data and the relationship data comprises:

and generating a file containing the node data and the relationship data according to a preset file format requirement, and storing the file according to a preset file storage requirement.

6. A data processing apparatus, characterized in that the apparatus comprises:

the structured data acquisition module is used for acquiring structured data to be processed;

the node data generation module is used for generating node data of a node corresponding to each piece of structured data by taking the identifier of the piece of structured data as a node identifier and taking a field value contained in the piece of structured data as a node attribute;

a relationship data generating module, configured to generate relationship data between nodes corresponding to each piece of structured data according to preset relationship field information, where the preset relationship field information includes: the field values of the existing relations in the structured data and the relation information among the field values of the existing relations;

and the data import module is used for obtaining a file containing the node data and the relationship data and importing the file into a graph database.

7. The apparatus of claim 6, wherein the relational data generation module comprises:

the node determining submodule is used for determining the nodes whose node attributes include a field value in the preset relationship field information;

and the relation data generation submodule is used for generating the relation data between the determined nodes according to the relation information in the preset relation field information.

8. The apparatus of claim 6, wherein the structured data obtaining module comprises:

the data cleaning submodule is used for acquiring structured data from a preset data source and cleaning the acquired structured data;

and the identifier mapping submodule is used for mapping the field identifier corresponding to the field value in the cleaned structured data into the attribute identifier corresponding to the node attribute based on the preset identifier mapping relation, and taking the structured data after mapping the identifier as the structured data to be processed.

9. The apparatus of claim 8,

the identifier mapping sub-module is specifically configured to map, based on a preset identifier mapping relationship, a field identifier corresponding to a field value in the cleaned structured data to an attribute identifier corresponding to a node attribute, perform format adjustment on the field value in the structured data after the identifier is mapped according to a preset field value format, and use the adjusted structured data as the structured data to be processed.

10. The apparatus according to any one of claims 6-9,

the data processing module is specifically configured to generate a file including the node data and the relationship data according to a preset file format requirement, store the file according to a preset file storage requirement, and import the file into a graph database.

11. An electronic device, characterized by comprising a processor, a communication interface, a memory, and a communication bus, wherein the processor, the communication interface, and the memory communicate with one another through the communication bus;

a memory for storing a computer program;

a processor for implementing the method steps of any one of claims 1 to 5 when executing a program stored in the memory.

12. A computer-readable storage medium, characterized in that a computer program is stored in the computer-readable storage medium, which computer program, when being executed by a processor, carries out the method steps of any one of the claims 1-5.

Technical Field

The present invention relates to the field of computer technologies, and in particular, to a data processing method and apparatus.

Background

A graph database is a database that stores relational data. Relational data stored in a graph database includes: node data for each node and relationship data representing relationships between each node. Because the graph database stores the relationship data of the relationship among the nodes, the efficiency of inquiring the node data of the nodes in the graph database is high. In addition, in practical application, query of structured data is often involved, so that the structured data can be imported into a graph database, and query of the structured data can be realized based on the graph database, so that query efficiency of the structured data is improved.

Because relational data is unstructured, it is difficult to import structured data directly into a graph database. In the prior art, when structured data is imported into a graph database, a worker manually generates a file of the structured data, and an electronic device imports the file into the graph database by calling an import interface of the graph database. However, because the file is manually generated by the staff, when the data volume of the structured data is huge, the workload of generating the file is increased, and the efficiency is low, so that the efficiency of importing the data is low, and the data processing efficiency is reduced.

Disclosure of Invention

Embodiments of the present invention provide a data processing method and apparatus, so as to improve data processing efficiency. The specific technical scheme is as follows:

in a first aspect, an embodiment of the present invention provides a data processing method, where the method includes:

obtaining structured data to be processed;

generating node data of a node corresponding to each structured data by taking the identifier of the structured data as a node identifier and taking a field value contained in the structured data as a node attribute;

generating relationship data between nodes corresponding to each structured data according to preset relationship field information, wherein the preset relationship field information comprises: the field values of the existing relations in the structured data and the relation information among the field values of the existing relations;

and acquiring a file containing the node data and the relationship data, and importing the file into a graph database.

In an embodiment of the present invention, the generating relationship data between nodes corresponding to each piece of structured data according to the preset relationship field information includes:

determining the nodes whose node attributes include a field value in the preset relationship field information;

and generating the relation data between the determined nodes according to the relation information in the preset relation field information.

In an embodiment of the present invention, the obtaining of the structured data to be processed includes:

acquiring structured data from a preset data source, and performing data cleaning on the acquired structured data;

and mapping field identifiers corresponding to field values in the cleaned structured data into attribute identifiers corresponding to node attributes based on a preset identifier mapping relation, and taking the structured data after mapping the identifiers as the structured data to be processed.

In an embodiment of the present invention, the taking the structured data after the mapping identifier as the structured data to be processed includes:

and according to a preset field value format, carrying out format adjustment on the field value in the structured data after mapping identification, and taking the adjusted structured data as the structured data to be processed.
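The cleaning, identifier mapping, and format adjustment described above can be sketched as follows. This is a minimal illustration, not the claimed implementation: the cleaning rule (strip whitespace, drop rows with empty values), the mapping table `IDENTIFIER_MAPPING`, the format table `FIELD_VALUE_FORMATS`, and all function names are assumptions introduced for the example.

```python
# Hypothetical preset identifier mapping: field identifier -> attribute identifier.
IDENTIFIER_MAPPING = {"Name": "name", "Sex": "gender", "Age": "age"}
# Hypothetical preset field value formats: attribute identifier -> converter.
FIELD_VALUE_FORMATS = {"age": int}


def clean(rows):
    """Assumed cleaning rule: strip whitespace and drop rows with empty values."""
    cleaned = []
    for row in rows:
        stripped = {
            key: value.strip() if isinstance(value, str) else value
            for key, value in row.items()
        }
        if all(value not in ("", None) for value in stripped.values()):
            cleaned.append(stripped)
    return cleaned


def map_identifiers(row, mapping):
    """Rename field identifiers to attribute identifiers; unknown keys are kept."""
    return {mapping.get(key, key): value for key, value in row.items()}


def adjust_formats(row, formats):
    """Convert each field value that has a preset format."""
    return {
        key: formats[key](value) if key in formats else value
        for key, value in row.items()
    }


def prepare(rows):
    """Clean, map identifiers, then adjust formats, in that order."""
    return [
        adjust_formats(map_identifiers(row, IDENTIFIER_MAPPING), FIELD_VALUE_FORMATS)
        for row in clean(rows)
    ]


raw_rows = [
    {"Name": " Zhang San ", "Sex": "male", "Age": "20"},
    {"Name": "Li Si", "Sex": "female", "Age": ""},  # dropped: empty field value
]
prepared = prepare(raw_rows)
```

Under these assumptions, the second row is discarded during cleaning and the first row becomes a record keyed by attribute identifiers with its age converted to an integer.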

In an embodiment of the present invention, the obtaining a file containing the node data and the relationship data includes:

and generating a file containing the node data and the relationship data according to a preset file format requirement, and storing the file according to a preset file storage requirement.

In a second aspect, an embodiment of the present invention provides a data processing apparatus, where the apparatus includes:

the structured data acquisition module is used for acquiring structured data to be processed;

the node data generation module is used for generating node data of a node corresponding to each piece of structured data by taking the identifier of the piece of structured data as a node identifier and taking a field value contained in the piece of structured data as a node attribute;

a relationship data generating module, configured to generate relationship data between nodes corresponding to each piece of structured data according to preset relationship field information, where the preset relationship field information includes: the field values of the existing relations in the structured data and the relation information among the field values of the existing relations;

and the data import module is used for obtaining a file containing the node data and the relationship data and importing the file into a graph database.

In an embodiment of the present invention, the relationship data generating module includes:

the node determining submodule is used for determining the nodes whose node attributes include a field value in the preset relationship field information;

and the relation data generation submodule is used for generating the relation data between the determined nodes according to the relation information in the preset relation field information.

In an embodiment of the present invention, the structured data obtaining module includes:

the data cleaning submodule is used for acquiring structured data from a preset data source and cleaning the acquired structured data;

and the identifier mapping submodule is used for mapping the field identifier corresponding to the field value in the cleaned structured data into the attribute identifier corresponding to the node attribute based on the preset identifier mapping relation, and taking the structured data after mapping the identifier as the structured data to be processed.

In an embodiment of the present invention, the identifier mapping sub-module is specifically configured to map, based on a preset identifier mapping relationship, a field identifier corresponding to a field value in the cleaned structured data to be an attribute identifier corresponding to a node attribute, perform format adjustment on the field value in the structured data after the mapping is performed according to a preset field value format, and use the adjusted structured data as the structured data to be processed.

In an embodiment of the present invention, the data processing module is specifically configured to generate a file including the node data and the relationship data according to a preset file format requirement, store the file according to a preset file storage requirement, and import the file into a graph database.

In a third aspect, an embodiment of the present invention provides an electronic device, including a processor, a communication interface, a memory, and a communication bus, where the processor, the communication interface, and the memory communicate with one another through the communication bus;

a memory for storing a computer program;

a processor configured to implement the method steps of the first aspect when executing the program stored in the memory.

In a fourth aspect, the present invention provides a computer-readable storage medium, in which a computer program is stored, and the computer program, when executed by a processor, implements the method steps described in the first aspect.

As can be seen from the above, when the scheme provided by the embodiment of the present invention is applied to data processing, the identifier of the structured data to be processed is taken as the node identifier, and the field value included in the structured data is taken as the node attribute, so as to generate node data of a node corresponding to each structured data; and generating relationship data between nodes corresponding to each structured data according to the relationship field values in the preset relationship field information and the relationship information among the relationship field values, and importing the file containing the generated node data and the relationship data into the graph database. Compared with the prior art, the method has the advantages that the files do not need to be manually generated by workers, the workload of generating the files is reduced, the efficiency of data import can be improved, and the efficiency of data processing is improved.

In addition, the relational data stored in the graph database includes node data of each node, and the node data of a node includes a node identifier and node attributes. Therefore, the node data of the node corresponding to each piece of structured data can be generated more accurately by taking the identifier of the structured data as the node identifier and the field values contained in the structured data as the node attributes. Further, the relational data stored in the graph database also includes relationship data representing the relationships between the nodes, and the preset relationship field information includes the field values in the structured data between which a relationship exists and the relationship information between those field values, where the relationship information represents the relationship between those field values. The relationship data between the nodes corresponding to the structured data can therefore be generated accurately according to the preset relationship field information. As a result, the obtained file containing the generated node data and relationship data is relational data that can be stored in the graph database, which improves the efficiency of data import and, in turn, the efficiency of data processing.

Drawings

To illustrate the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings used in the description of the embodiments or the prior art are briefly described below. The drawings in the following description show only some embodiments of the present invention, and those skilled in the art can obtain other drawings from them without creative effort.

FIG. 1 is a schematic diagram of a graph database according to an embodiment of the present invention;

FIG. 2 is a schematic flowchart of a first data processing method according to an embodiment of the present invention;

FIG. 3 is a schematic flowchart of a second data processing method according to an embodiment of the present invention;

FIG. 4 is a schematic flowchart of a third data processing method according to an embodiment of the present invention;

FIG. 5 is a schematic flowchart of a fourth data processing method according to an embodiment of the present invention;

FIG. 6 is a schematic structural diagram of a first data processing apparatus according to an embodiment of the present invention;

FIG. 7 is a schematic structural diagram of a second data processing apparatus according to an embodiment of the present invention;

FIG. 8 is a schematic structural diagram of a third data processing apparatus according to an embodiment of the present invention;

FIG. 9 is a schematic structural diagram of an electronic device according to an embodiment of the present invention.

Detailed Description

The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

First, the concepts related to the embodiments of the present invention are explained.

1. Structured data

Structured data is data that is represented by a unified data structure. For example, the structured data may be represented by a two-dimensional table structure.

Taking table 1 as an example, table 1 shows a two-dimensional table storing structured data.

TABLE 1

Name        Gender    Age
Zhang San   Male      20
Li Si       Female    30

In Table 1, "name", "gender", and "age" are the field identifiers. A field identifier is used to distinguish field values.

Specifically, "Zhang San" and "Li Si" are the field values whose field identifier is name, "male" and "female" are the field values whose field identifier is gender, and "20" and "30" are the field values whose field identifier is age.

As can be seen from table 1, the structured data of each row may represent information of one entity. Wherein a line of structured data may also be referred to as a piece of structured data.
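As a small illustration of these terms, Table 1 can be written as a list of records in which the dictionary keys act as field identifiers and the dictionary values as field values. The variable and function names below are assumptions made for this sketch only.

```python
# Each dictionary is one piece (row) of structured data from Table 1.
# Keys play the role of field identifiers; values are field values.
table_1 = [
    {"name": "Zhang San", "gender": "male", "age": 20},
    {"name": "Li Si", "gender": "female", "age": 30},
]


def field_values(rows, field_identifier):
    """Return every field value stored under one field identifier."""
    return [row[field_identifier] for row in rows]


names = field_values(table_1, "name")
ages = field_values(table_1, "age")
```

Each element of `table_1` describes one entity, matching the statement that a row of structured data represents the information of one entity.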

2. Graph database

A graph database is a database that stores relational data. Common graph databases include Neo4j, JanusGraph, etc.

Relational data stored in a graph database includes: node data for each node and relationship data representing relationships between each node.

The node data of the node may include: node identification and node attributes.

Specifically, the node identifier is used to distinguish different nodes.

A node attribute is an attribute of a node. Different nodes may have different node attributes, and the node attributes of a node relate to the entity corresponding to that node. For example, assume that there are two nodes, where the first node corresponds to entity 1 and the second node corresponds to entity 2. If the entity attributes of entity 1 are "Xiaoming, male, 18 years old" and the entity attributes of entity 2 are "Xiaohong, female, 19 years old", then the node attributes of the first node are "Xiaoming, male, 18 years old" and the node attributes of the second node are "Xiaohong, female, 19 years old".

Each node attribute of a node corresponds to an attribute identifier, and node attributes of different nodes can correspond to the same attribute identifier.

For example, continuing the example above, assume that the attribute identifiers are name, gender, and age.

For the first node, the node attribute "Xiaoming" corresponds to the attribute identifier name, the node attribute "male" corresponds to the attribute identifier gender, and the node attribute "18 years old" corresponds to the attribute identifier age.

For the second node, the node attribute "Xiaohong" corresponds to the attribute identifier name, the node attribute "female" corresponds to the attribute identifier gender, and the node attribute "19 years old" corresponds to the attribute identifier age.

The node attributes corresponding to the attribute identifier name are: Xiaoming and Xiaohong.

The node attributes corresponding to the attribute identifier gender are: male and female.

The node attributes corresponding to the attribute identifier age are: 18 years old and 19 years old.

The above-mentioned relationship data is used to represent the relationship between nodes.

Taking FIG. 1 as an example, FIG. 1 is a schematic diagram of a graph database according to an embodiment of the present invention. In FIG. 1, each oval represents a node, and the text inside an oval is the node data of that node. An arrowed line between two ovals indicates that a relationship exists between the nodes at its two ends, and the text on the arrowed line is the relationship data describing that relationship.

FIG. 1 includes two nodes. "Node 1" and "China" are the node data of the left node: "Node 1" is the node identifier of the left node, and "China" is a node attribute of the left node.

Similarly, "Node 2" and "Beijing" are the node data of the right node: "Node 2" is the node identifier of the right node, and "Beijing" is a node attribute of the right node.

Because the arrowed line between the two nodes points from the right node to the left node, a relationship exists between the left node and the right node. And because the text on the arrowed line is "belongs to", the relationship data between the left node and the right node is: the right node belongs to the left node.
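The graph in FIG. 1 can be sketched as a small data structure. This is only an illustration of the node data and relationship data concepts: the class names, the attribute identifier `name`, and the `describe` helper are assumptions, since the figure gives only the identifiers "node 1" and "node 2", the attributes "China" and "Beijing", and the relationship "belongs to".

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    node_id: str                      # node identifier, distinguishes nodes
    attributes: dict = field(default_factory=dict)  # attribute identifier -> node attribute


@dataclass
class Relationship:
    start_id: str   # node the arrow starts from
    end_id: str     # node the arrow points to
    relation: str   # text written on the arrow


china = Node("node 1", {"name": "China"})
beijing = Node("node 2", {"name": "Beijing"})
# The arrow points from the right node (Beijing) to the left node (China).
belongs_to = Relationship(beijing.node_id, china.node_id, "belongs to")


def describe(relationship, nodes_by_id):
    """Render a relationship as 'start relation end' using the name attribute."""
    start = nodes_by_id[relationship.start_id].attributes["name"]
    end = nodes_by_id[relationship.end_id].attributes["name"]
    return f"{start} {relationship.relation} {end}"


nodes_by_id = {node.node_id: node for node in (china, beijing)}
sentence = describe(belongs_to, nodes_by_id)
```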

Referring to fig. 2, fig. 2 is a schematic flow chart of a first data processing method according to an embodiment of the present invention, where the method includes S201 to S204.

S201: structured data to be processed is obtained.

Specifically, the structured data to be processed may be obtained from a preset structured database. The preset structured database can be a structured database preset by a worker or a user.

In an embodiment of the present invention, data processing parameter information preset by a user or a worker may be received, where the data processing parameter information includes a database link, the name of a data source, a data storage path, and the like. Specifically, the data source may be determined according to the database link or according to the name of the data source, and the structured data to be processed may then be obtained from that data source; alternatively, the structured data to be processed may be obtained according to the data storage path.
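One way S201 might look in practice is sketched below, assuming for illustration that the structured database is SQLite and that a data storage path points to a CSV file. Neither choice comes from the embodiment; the function names and the `person` table are likewise hypothetical.

```python
import csv
import sqlite3


def read_from_database(connection, table_name):
    """Read every row of one table as a dictionary keyed by field identifier."""
    # Table names cannot be bound as SQL parameters, so validate before formatting.
    if not table_name.isidentifier():
        raise ValueError(f"unsafe table name: {table_name!r}")
    cursor = connection.execute(f'SELECT * FROM "{table_name}"')
    columns = [column[0] for column in cursor.description]
    return [dict(zip(columns, row)) for row in cursor]


def read_from_csv(data_storage_path):
    """Read structured data from a CSV file whose header row holds the field identifiers."""
    with open(data_storage_path, newline="", encoding="utf-8") as handle:
        return [dict(row) for row in csv.DictReader(handle)]


# Demonstration with an in-memory database standing in for the preset data source.
connection = sqlite3.connect(":memory:")
connection.execute("CREATE TABLE person (name TEXT, gender TEXT, age INTEGER)")
connection.executemany(
    "INSERT INTO person VALUES (?, ?, ?)",
    [("Zhang San", "male", 20), ("Li Si", "female", 30)],
)
rows = read_from_database(connection, "person")
```

Returning plain dictionaries keeps the later steps independent of where the structured data came from.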

S202: and generating node data of the node corresponding to each structured data by taking the identifier of the structured data as the node identifier and taking the field value contained in the structured data as the node attribute.

The relational data stored in the graph database includes node data for each node and relationship data representing the relationships between the nodes, and the node data of a node includes a node identifier and node attributes. Therefore, the node data of the node corresponding to each piece of structured data can be generated by taking the identifier of the structured data as the node identifier and the field values contained in the structured data as the node attributes.

The node identifiers are used for distinguishing different nodes, and different nodes correspond to different node identifiers, that is, the nodes and the node identifiers are in one-to-one correspondence. Since the node identifier is determined based on the identifier of the structured data, in an embodiment of the present invention, the identifier of each piece of structured data may be determined in units of each piece of structured data.

For example: taking Table 1 above as an example, the identifier of the structured data in the first row of Table 1 may be determined as ID1, and the identifier of the structured data in the second row may be determined as ID2.

In this way, the identifier determined for each piece of structured data is unique among all pieces of structured data.

Specifically, when node data of a node corresponding to each piece of structured data is generated, the identifier of the structured data is used as the node identifier, and the field value included in the structured data is used as the node attribute.

For example: continuing with Table 1 and the example above, the identifier of the structured data in the first row is ID1, and therefore the node identifier of the node corresponding to the structured data in the first row is ID1. The identifier of the structured data in the second row is ID2, and therefore the node identifier of the node corresponding to the structured data in the second row is ID2.

The field values contained in the structured data in the first row are "Zhang San, male, 20", and therefore the node attributes of the node corresponding to the structured data in the first row are "Zhang San, male, 20". The field values contained in the structured data in the second row are "Li Si, female, 30", and therefore the node attributes of the node corresponding to the structured data in the second row are "Li Si, female, 30".
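S202 can be sketched in a few lines. The sketch assumes that identifiers are assigned by row position (ID1, ID2, ...) as in the example above, and that node data is held as a dictionary with `id` and `attributes` keys; both the layout and the function name are illustrative assumptions.

```python
def generate_node_data(rows):
    """Build node data: the row identifier becomes the node identifier,
    and the row's field values become the node attributes."""
    nodes = []
    for index, row in enumerate(rows, start=1):
        nodes.append({"id": f"ID{index}", "attributes": dict(row)})
    return nodes


table_1 = [
    {"name": "Zhang San", "gender": "male", "age": 20},
    {"name": "Li Si", "gender": "female", "age": 30},
]
nodes = generate_node_data(table_1)
```

Because identifiers are derived from the row position, they are unique across rows, which preserves the one-to-one correspondence between nodes and node identifiers.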

S203: and generating the relation data between the nodes corresponding to each structured data according to the preset relation field information.

The preset relationship field information includes: the field values in the structured data between which a relationship exists, and the relationship information between those field values.

The above-mentioned preset relationship field information may be expressed in the following manner.

The first mode is as follows: the preset relationship field information comprises: a start field value, a termination field value, and relationship information indicating a relationship between the start field value and the termination field value.

For example: the preset relationship field information comprises a start field value of Beijing, a termination field value of China, and the relationship information that Beijing belongs to China.

The second mode is as follows: the preset relationship field information is as follows: a sentence including field values in which a relationship exists in the structured data and relationship information between the field values in which the relationship exists.

For example: assume that the preset relationship field information is "Beijing is the capital of China". Semantic recognition of the preset relationship field information yields the field values in the structured data between which a relationship exists: China and Beijing. Semantic analysis of the preset relationship field information then determines the relationship between China and Beijing: Beijing belongs to China and is the capital of China. Therefore, "Beijing" can be determined as the start field value and "China" as the termination field value, and the relationship information between these field values is: Beijing belongs to China and is the capital of China.

The preset relationship field information may be preset by a worker or a user.

Since the relationship data between the nodes is used to represent the relationship between the nodes, the preset relationship field information can be used to reflect the relationship between the field values in which the relationship exists. Therefore, the relationship data between the nodes corresponding to each structured data can be generated according to the preset relationship field information.

Specifically, the relationship between the nodes corresponding to each structured data can be obtained according to the field value and the relationship information in the preset relationship field information, and the relationship data between the nodes corresponding to each structured data is generated according to the obtained relationship.
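A minimal sketch of S203 using the first mode (start field value, termination field value, relationship information) is given below. The dictionary keys, the function name, and the choice to link every node that contains the start value to every node that contains the termination value are assumptions made for illustration.

```python
def generate_relationship_data(nodes, preset_relationship_fields):
    """For each preset entry, find the nodes whose attributes include the start
    and termination field values, then link them with the relationship information."""
    relationships = []
    for info in preset_relationship_fields:
        start_nodes = [n for n in nodes if info["start_value"] in n["attributes"].values()]
        end_nodes = [n for n in nodes if info["end_value"] in n["attributes"].values()]
        for start in start_nodes:
            for end in end_nodes:
                if start["id"] != end["id"]:  # skip self-relationships
                    relationships.append(
                        {"start_id": start["id"], "end_id": end["id"], "relation": info["relation"]}
                    )
    return relationships


nodes = [
    {"id": "ID1", "attributes": {"name": "China"}},
    {"id": "ID2", "attributes": {"name": "Beijing"}},
]
preset = [{"start_value": "Beijing", "end_value": "China", "relation": "belongs to"}]
relationships = generate_relationship_data(nodes, preset)
```

With these inputs the result contains a single relationship pointing from the Beijing node to the China node.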

S204: files containing node data and relationship data are obtained and imported into a graph database.

A graph database is a database that stores relational data, and the relational data stored in the graph database includes node data for each node and relationship data representing the relationships between the nodes. Therefore, based on the node data and the relationship data generated in S202 and S203, a file including the generated node data and relationship data can be obtained and imported into the graph database.

When the file is imported into the graph database, a file import interface of the graph database can be called to import the file. The file may also be imported using the Cypher language, or through a third-party plug-in based on a programming language.
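As an illustration of a Cypher-based import, the sketch below turns node data and relationship data into Cypher `CREATE` and `MATCH ... CREATE` statements as plain strings. The node label `Entity`, the dictionary layout, and the function names are assumptions; the sketch does not call any database driver, and in practice parameterized queries through a driver would usually be preferable to building statement text.

```python
import json


def cypher_literal(value):
    """Render a string or number as a Cypher literal (double-quoted, backslash-escaped)."""
    if isinstance(value, bool) or not isinstance(value, (str, int, float)):
        raise TypeError(f"unsupported property value: {value!r}")
    return json.dumps(value)


def node_statement(node, label="Entity"):
    """Build a CREATE statement for one node; 'Entity' is a hypothetical label."""
    properties = {"id": node["id"], **node["attributes"]}
    for key in properties:
        if not key.isidentifier():
            raise ValueError(f"unsafe property key: {key!r}")
    body = ", ".join(f"{key}: {cypher_literal(value)}" for key, value in properties.items())
    return f"CREATE (:{label} {{{body}}})"


def relationship_statement(relationship, label="Entity"):
    """Build a MATCH ... CREATE statement linking two existing nodes by id."""
    rel_type = relationship["relation"].strip().upper().replace(" ", "_")
    if not rel_type.isidentifier():
        raise ValueError(f"unsafe relationship type: {rel_type!r}")
    return (
        f"MATCH (a:{label} {{id: {cypher_literal(relationship['start_id'])}}}), "
        f"(b:{label} {{id: {cypher_literal(relationship['end_id'])}}}) "
        f"CREATE (a)-[:{rel_type}]->(b)"
    )


node_cypher = node_statement({"id": "ID2", "attributes": {"name": "Beijing"}})
rel_cypher = relationship_statement(
    {"start_id": "ID2", "end_id": "ID1", "relation": "belongs to"}
)
```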

In an embodiment of the present invention, the obtaining of the file containing the node data and the relationship data in S204 may be implemented as follows.

A file containing the node data and the relationship data is generated according to a preset file format requirement, and the file is stored according to a preset file storage requirement.

Specifically, the preset file format requirement and the preset file storage requirement may be preset by a worker or a user. The preset file format requirement specifies the format of the generated file; for example, the file format may be PDF, XML, etc. The preset file storage requirement specifies how the generated file is stored; for example, the file may be stored according to a preset storage path, stored at a preset location, and the like.

For example, assume that the preset file format requirement specifies the XML format and that the preset file storage requirement specifies that the generated file is stored according to storage path 1. In this case, a file in XML format containing the generated node data and relationship data may be generated and stored according to storage path 1.

Because the file is generated according to the preset file format requirement and stored according to the preset file storage requirement, the obtained file meets both requirements and can be imported into the graph database.
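
A minimal sketch of generating an XML file and storing it at a preset path is given below, using Python's standard `xml.etree.ElementTree`. The element and attribute names (`graph`, `node`, `attribute`, `relationship`) form a hypothetical layout chosen for illustration; they are not the import format of any particular graph database, and the dictionary keys of the inputs are likewise assumptions.

```python
import xml.etree.ElementTree as ET
from pathlib import Path
from typing import Any, Dict, List


def build_graph_xml(nodes: List[Dict[str, Any]], relationships: List[Dict[str, str]]) -> ET.Element:
    """Build an XML tree that holds the node data and the relationship data."""
    root = ET.Element("graph")
    nodes_el = ET.SubElement(root, "nodes")
    for node in nodes:
        node_el = ET.SubElement(nodes_el, "node", id=str(node["id"]))
        for name, value in node["attributes"].items():
            ET.SubElement(node_el, "attribute", name=name).text = str(value)
    rels_el = ET.SubElement(root, "relationships")
    for rel in relationships:
        ET.SubElement(rels_el, "relationship",
                      start=rel["start"], end=rel["end"], type=rel["type"])
    return root


def save_graph_file(root: ET.Element, storage_path: str) -> Path:
    """Store the XML file at the preset storage path, creating folders as needed."""
    path = Path(storage_path)
    path.parent.mkdir(parents=True, exist_ok=True)
    ET.ElementTree(root).write(str(path), encoding="utf-8", xml_declaration=True)
    return path
```

Separating tree construction from storage keeps the file format requirement and the file storage requirement independent, so either can be changed without touching the other.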

As can be seen from the above, when the scheme provided by this embodiment is applied to data processing, the identifier of the structured data to be processed is taken as the node identifier, and the field value included in the structured data is taken as the node attribute, so as to generate node data of a node corresponding to each structured data; and generating relationship data between nodes corresponding to each structured data according to the relationship field values in the preset relationship field information and the relationship information among the relationship field values, and importing the file containing the generated node data and the relationship data into the graph database. Compared with the prior art, the method has the advantages that the files do not need to be manually generated by workers, the workload of generating the files is reduced, the efficiency of data import can be improved, and the efficiency of data processing is improved.

In addition, the relationship-type data stored in a graph database includes node data of each node, and the node data of a node includes a node identifier and node attributes. Therefore, taking the identifier of the structured data as the node identifier and the field values contained in the structured data as the node attributes allows the node data of the node corresponding to each piece of structured data to be generated more accurately. Further, the relationship-type data stored in the graph database also includes relationship data representing the relationships between nodes, and the preset relationship field information includes the field values in the structured data between which a relationship exists, together with relationship information representing the relationship between those field values. The relationship data between the nodes corresponding to the structured data can therefore be generated accurately according to the preset relationship field information. As a result, the obtained file containing the generated node data and relationship data is relationship-type data that can be stored in the graph database, which improves the efficiency of data import and, in turn, the efficiency of data processing.
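
The node data generation summarised above can be sketched as follows. The sketch assumes that each piece of structured data is a dictionary and that its identifier is stored under a key named `id`; both are assumptions made for illustration.

```python
from typing import Any, Dict, List


def build_node_data(records: List[Dict[str, Any]], id_field: str = "id") -> List[Dict[str, Any]]:
    """Turn each structured record into node data: an identifier plus attributes."""
    nodes = []
    for record in records:
        # Every field value other than the identifier becomes a node attribute.
        attributes = {key: value for key, value in record.items() if key != id_field}
        nodes.append({"id": str(record[id_field]), "attributes": attributes})
    return nodes
```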

Referring to fig. 3, fig. 3 is a flowchart illustrating a second data processing method according to an embodiment of the present invention, where S201 may include S201A-S201B.

S201A: Obtain structured data from a preset data source, and perform data cleaning on the obtained structured data.

The preset data source may be a data source preset by a user or a worker.

Since the data source stores the information of the structured database, the structured database can be determined according to the information of the structured database in the data source, and the structured data can be obtained from the determined structured database. Specifically, the data source may be determined according to a preset data source name.
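
As a rough sketch of obtaining structured data from a structured database, the example below reads all rows of a table through Python's standard `sqlite3` module. SQLite is only a stand-in, since the embodiment does not name a database; the `connection` argument represents the structured database determined from the data source, and the `person` table with its rows is made-up sample data.

```python
import sqlite3
from typing import Any, Dict, List


def fetch_structured_data(connection: sqlite3.Connection, table: str) -> List[Dict[str, Any]]:
    """Read every row of a table as a dict keyed by field identifier."""
    if not table.isidentifier():
        raise ValueError(f"unexpected table name: {table!r}")
    cursor = connection.execute(f"SELECT * FROM {table}")
    columns = [description[0] for description in cursor.description]
    return [dict(zip(columns, row)) for row in cursor.fetchall()]


def demo_connection() -> sqlite3.Connection:
    """Create an in-memory database with made-up sample rows for illustration."""
    connection = sqlite3.connect(":memory:")
    connection.execute("CREATE TABLE person (id INTEGER, name TEXT, age INTEGER)")
    connection.executemany(
        "INSERT INTO person VALUES (?, ?, ?)",
        [(1, "Xiaoming", 5), (2, "Xiaohong", 6)],
    )
    return connection
```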

Since the obtained structured data may contain invalid data, duplicate data, and the like, data cleaning may be performed on the obtained structured data. Data cleaning may also be referred to as data preprocessing.

In an embodiment of the present invention, the data cleaning may include data deduplication and field merging.

Specifically, data deduplication means deleting duplicated data from the obtained structured data, and field merging means combining the field values that belong to the same field in the obtained structured data.
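
The two cleaning operations can be sketched as below. The embodiment does not spell out how field values are combined, so `merge_fields` reflects one possible reading: records sharing an identifier are merged, and an empty field value is filled from a later record. The function names and the `id` key are assumptions.

```python
from typing import Any, Dict, List


def deduplicate(records: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Drop records that exactly repeat an earlier record, keeping the first one."""
    seen = set()
    unique = []
    for record in records:
        key = tuple(sorted(record.items()))  # hashable fingerprint of the record
        if key not in seen:
            seen.add(key)
            unique.append(record)
    return unique


def merge_fields(records: List[Dict[str, Any]], id_field: str = "id") -> List[Dict[str, Any]]:
    """Merge records sharing an identifier; empty values are filled from later records."""
    merged: Dict[Any, Dict[str, Any]] = {}
    for record in records:
        target = merged.setdefault(record[id_field], {})
        for field, value in record.items():
            if target.get(field) in (None, ""):
                target[field] = value
    return list(merged.values())
```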

S201B: Based on a preset identifier mapping relationship, map the field identifiers corresponding to the field values in the cleaned structured data to the attribute identifiers corresponding to the node attributes, and take the structured data after identifier mapping as the structured data to be processed.

The field identifier corresponding to a field value in the structured data may be inconsistent with the attribute identifier corresponding to the node attribute, in which case the data import may fail. Therefore, the field identifiers corresponding to the field values in the cleaned structured data can be mapped to the attribute identifiers corresponding to the node attributes.

The preset identifier mapping relationship may be an identifier mapping relationship preset by a worker or a user.

Specifically, field identifiers corresponding to field values in the cleaned structured data can be obtained, and each field identifier is mapped to each attribute identifier based on a preset identifier mapping relationship.

For example, assume that the preset identifier mapping relationship is as shown in Table 2.

TABLE 2

Field identifier    Attribute identifier
name                Name
age                 Age

In Table 2, each entry in the left column is a field identifier, and each entry in the right column is an attribute identifier. As can be seen from Table 2, the preset identifier mapping relationship is: name to Name, and age to Age.

The cleaned structured data is shown in Table 3.

TABLE 3

name        age
Xiaoming    5
Xiaohong    6

In Table 3, "name" and "age" are field identifiers; "Xiaoming" and "Xiaohong" are the field values of the field identified as name, and "5" and "6" are the field values of the field identified as age.

Based on the preset identifier mapping relationship, the mapped structured data is shown in table 4.

TABLE 4

Name        Age
Xiaoming    5
Xiaohong    6

In Table 4, "Name" and "Age" are the mapped field identifiers; "Xiaoming" and "Xiaohong" are the field values of the field identified as Name, and "5" and "6" are the field values of the field identified as Age.

In this way, the structured data obtained from the preset data source is subjected to data cleaning, and data such as invalid data and repeated data in the obtained structured data can be eliminated. And based on a preset identifier mapping relation, mapping field identifiers corresponding to field values in the cleaned structured data into attribute identifiers corresponding to node attributes, so that the success rate of data import can be improved, and the efficiency of data processing is improved.
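
The identifier mapping illustrated by Tables 2 to 4 can be sketched as a simple key renaming. The names `ID_MAPPING` and `map_identifiers` are chosen for this example only; identifiers without an entry in the mapping are left unchanged, which is an assumption of the sketch.

```python
from typing import Any, Dict, List

# Field identifier -> attribute identifier, following Table 2.
ID_MAPPING = {"name": "Name", "age": "Age"}


def map_identifiers(records: List[Dict[str, Any]], mapping: Dict[str, str]) -> List[Dict[str, Any]]:
    """Rename each field identifier to its attribute identifier."""
    return [
        {mapping.get(field, field): value for field, value in record.items()}
        for record in records
    ]
```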

In an embodiment of the present invention, taking the structured data after identifier mapping as the structured data to be processed in S201B may be implemented as follows.

The format of the field values in the structured data after identifier mapping is adjusted according to a preset field value format, and the adjusted structured data is taken as the structured data to be processed.

The preset field value format may be preset by a worker or a user. Specifically, the field value format may specify a time format, a font size format, and the like.

Because the format of the field value in the structured data after mapping identification may not be uniform or normative, the format of the field value in the structured data after mapping identification can be adjusted according to the preset field value format, so that the success rate of data import can be improved, and the efficiency of data processing is improved.
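
As an illustration of adjusting a time format, the sketch below converts date strings written in several layouts into one preset layout. The list of candidate input formats and the preset output format are assumptions for this example; values that match none of the candidates are left unchanged.

```python
from datetime import datetime

# Hypothetical input layouts that may occur in the source data.
CANDIDATE_FORMATS = ("%Y-%m-%d", "%Y/%m/%d", "%d.%m.%Y")
# Hypothetical preset field value format for time values.
PRESET_TIME_FORMAT = "%Y-%m-%d"


def normalize_time(value: str) -> str:
    """Rewrite a date string in the preset time format when it can be parsed."""
    for fmt in CANDIDATE_FORMATS:
        try:
            return datetime.strptime(value.strip(), fmt).strftime(PRESET_TIME_FORMAT)
        except ValueError:
            continue  # try the next candidate layout
    return value  # unknown layout: keep the original value
```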

Referring to fig. 4, fig. 4 is a flowchart illustrating a third data processing method according to an embodiment of the present invention, where S203 includes S203A-S203B.

S203A: Determine the nodes whose node attributes include a field value in the preset relationship field information.

In S202 above, the node data generated for a node includes the node identifier and the node attributes, where the node attributes are determined according to the field values contained in the structured data. Since the preset relationship field information includes the field values in the structured data between which a relationship exists, the nodes whose node attributes include a field value in the preset relationship field information can be determined.

Specifically, for a field value in the preset relationship field information, a node attribute that is the same as the field value may be searched for among the node attributes generated in S202; this search process may also be referred to as feature matching. The node corresponding to the node attribute found is then determined. The process of determining the node may also be referred to as entity linking.

For example, suppose there are three nodes: the node attribute of the first node is China, the node attribute of the second node is Beijing, and the node attribute of the third node is Shanghai. The preset relationship field information includes the field values China and Beijing. Since the node attribute of the first node includes the field value China and the node attribute of the second node includes the field value Beijing, the first node and the second node can be determined.
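
The feature matching and entity linking described above can be sketched as a lookup over node attributes. The node layout (a dictionary with `id` and `attributes` keys) and the function name `link_nodes` are assumptions for illustration.

```python
from typing import Any, Dict, List


def link_nodes(nodes: List[Dict[str, Any]], field_values: List[str]) -> Dict[str, List[str]]:
    """Map each field value to the ids of the nodes whose attributes contain it."""
    linked = {}
    for value in field_values:
        linked[value] = [
            node["id"] for node in nodes if value in node["attributes"].values()
        ]
    return linked


# The three nodes from the China, Beijing, and Shanghai example.
example_nodes = [
    {"id": "1", "attributes": {"Name": "China"}},
    {"id": "2", "attributes": {"Name": "Beijing"}},
    {"id": "3", "attributes": {"Name": "Shanghai"}},
]
```

Calling `link_nodes(example_nodes, ["China", "Beijing"])` selects the first and second nodes and leaves the third node out, matching the example.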

S203B: Generate the relationship data between the determined nodes according to the relationship information in the preset relationship field information.

Because the relationship data between the nodes is used for representing the relationship between the nodes, and the preset relationship field information can be used for reflecting the relationship between the field values with the relationship, the relationship data between the nodes corresponding to each piece of structured data can be generated according to the preset relationship field information.

Specifically, after the nodes are determined in S203A, the relationship between the determined nodes may be determined based on the relationship information in the preset relationship field information, and the relationship data may be generated based on the determined relationship.

For example, assume that the relationship information in the preset relationship field information is: Beijing is the capital of China. According to this relationship information, the relationship between node a, whose node attribute is Beijing, and node b, whose node attribute is China, can be determined, so that the relationship data between node a and node b can be generated.

In this way, because the node attributes are determined according to the field values contained in the structured data, and the preset relationship field information includes the field values in the structured data between which a relationship exists, the nodes whose node attributes include a field value in the preset relationship field information can be determined more accurately. In addition, because the relationship information in the preset relationship field information represents the relationship between those field values, the relationship data between the nodes can be generated more accurately.
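
Generating relationship data between the determined nodes can be sketched as follows. Each entry of the preset relationship field information is assumed to be a `(start value, end value, relation)` tuple, and the output keys `start`, `end`, and `type` are likewise assumptions of this example.

```python
from typing import Any, Dict, List, Optional, Tuple


def find_node_id(nodes: List[Dict[str, Any]], value: str) -> Optional[str]:
    """Return the id of the first node whose attributes contain the field value."""
    for node in nodes:
        if value in node["attributes"].values():
            return node["id"]
    return None


def build_relationship_data(
    nodes: List[Dict[str, Any]],
    relation_infos: List[Tuple[str, str, str]],
) -> List[Dict[str, str]]:
    """Create one relationship record per entry whose two nodes were both found."""
    relationships = []
    for start_value, end_value, relation in relation_infos:
        start_id = find_node_id(nodes, start_value)
        end_id = find_node_id(nodes, end_value)
        if start_id is not None and end_id is not None:
            relationships.append({"start": start_id, "end": end_id, "type": relation})
    return relationships
```

Entries for which either node is missing are skipped here; whether to skip or to report them is a design choice that the embodiment leaves open.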

The scheme provided by the embodiment of the present invention is described below with reference to a specific embodiment. Referring to fig. 5, fig. 5 is a schematic flowchart of a fourth data processing method according to an embodiment of the present invention. FIG. 5 includes S501-S507.

S501: Obtain the structured data to be processed.

Data processing parameter information preset by a user or a worker is received, where the data processing parameter information includes a database link, the name of a data source, a data storage path, and the like. The structured data to be processed can then be obtained in any of the following ways: the data source is determined according to the database link and the structured data is obtained from that data source; the data source is determined according to the name of the data source and the structured data is obtained from that data source; or the structured data is obtained according to the data storage path.

S502: Preprocess the structured data to be processed.

The preprocessing may include data deduplication, field merging, and the like.

S503: Map the identifiers.

Based on a preset identifier mapping relationship, the field identifiers corresponding to the field values in the preprocessed structured data are mapped to the attribute identifiers corresponding to the node attributes.

S504: Generate node data of the nodes.

Node data of the node corresponding to each piece of structured data is generated by taking the identifier of the structured data as the node identifier and the field values contained in the structured data as the node attributes.

S505: Generate relationship data between the nodes.

Relationship data between the nodes corresponding to each piece of structured data is generated according to the preset relationship field information by means of entity linking.

S506: Post-process the generated node data and relationship data.

The post-processing may include: generating a file containing the generated node data and relationship data according to a preset file format requirement, storing the file according to a preset file storage requirement, and the like.

S507: Import the processed data into a graph database.
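
The flow S502 to S506 can be sketched end to end on in-memory records as below. This is a simplified illustration under several assumptions: preprocessing is reduced to removing exact duplicates, each record carries its identifier under an `id` key, relationship entries are `(start value, end value, relation)` tuples, and the relationship data is rendered as CSV text. The import step S507 is left out because it depends on the graph database in use.

```python
import csv
import io
from typing import Any, Dict, List, Tuple


def rows_to_csv(header: List[str], rows: List[Tuple[Any, ...]]) -> str:
    """Serialise rows as CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return buffer.getvalue()


def run_pipeline(records, id_mapping, relation_infos, id_field="id"):
    """Run S502-S506 and return the node data and the relationship CSV text."""
    # S502: preprocessing, reduced here to removing exact duplicates.
    seen, cleaned = set(), []
    for record in records:
        key = tuple(sorted(record.items()))
        if key not in seen:
            seen.add(key)
            cleaned.append(record)
    # S503: map field identifiers to attribute identifiers.
    mapped = [{id_mapping.get(k, k): v for k, v in r.items()} for r in cleaned]
    # S504: the record identifier becomes the node identifier,
    # and the remaining field values become node attributes.
    nodes = [
        {"id": str(r[id_field]), "attributes": {k: v for k, v in r.items() if k != id_field}}
        for r in mapped
    ]

    def node_id_for(value):
        # Entity linking: first node whose attributes contain the field value.
        return next((n["id"] for n in nodes if value in n["attributes"].values()), None)

    # S505: one relationship row per preset entry whose two nodes are found.
    relationships = []
    for start_value, end_value, relation in relation_infos:
        start_id, end_id = node_id_for(start_value), node_id_for(end_value)
        if start_id is not None and end_id is not None:
            relationships.append((start_id, end_id, relation))
    # S506: post-processing into file content.
    return nodes, rows_to_csv(["start", "end", "type"], relationships)
```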

Corresponding to the data processing method, the embodiment of the invention also provides a data processing device.

Referring to fig. 6, fig. 6 is a schematic structural diagram of a first data processing apparatus according to an embodiment of the present invention, where the apparatus includes modules 601-604.

A structured data obtaining module 601, configured to obtain structured data to be processed;

a node data generating module 602, configured to generate node data of a node corresponding to each piece of structured data by using the identifier of the piece of structured data as a node identifier and using a field value included in the piece of structured data as a node attribute;

a relationship data generating module 603, configured to generate relationship data between nodes corresponding to each piece of structured data according to preset relationship field information, where the preset relationship field information includes: the field values in the structured data between which a relationship exists, and relationship information between those field values;

a data importing module 604, configured to obtain a file including the node data and the relationship data, and import the file into a graph database.

As can be seen from the above, when the scheme provided by this embodiment is applied to data processing, the identifier of the structured data to be processed is taken as the node identifier, and the field value included in the structured data is taken as the node attribute, so as to generate node data of a node corresponding to each structured data; and generating relationship data between nodes corresponding to each structured data according to the relationship field values in the preset relationship field information and the relationship information among the relationship field values, and importing the file containing the generated node data and the relationship data into the graph database. Compared with the prior art, the method has the advantages that the files do not need to be manually generated by workers, the workload of generating the files is reduced, the efficiency of data import can be improved, and the efficiency of data processing is improved.

In addition, the relationship-type data stored in a graph database includes node data of each node, and the node data of a node includes a node identifier and node attributes. Therefore, taking the identifier of the structured data as the node identifier and the field values contained in the structured data as the node attributes allows the node data of the node corresponding to each piece of structured data to be generated more accurately. Further, the relationship-type data stored in the graph database also includes relationship data representing the relationships between nodes, and the preset relationship field information includes the field values in the structured data between which a relationship exists, together with relationship information representing the relationship between those field values. The relationship data between the nodes corresponding to the structured data can therefore be generated accurately according to the preset relationship field information. As a result, the obtained file containing the generated node data and relationship data is relationship-type data that can be stored in the graph database, which improves the efficiency of data import and, in turn, the efficiency of data processing.

Referring to fig. 7, fig. 7 is a schematic structural diagram of a second data processing apparatus according to an embodiment of the present invention, where the relationship data generating module 603 includes sub-modules 603A-603B.

The node determining sub-module 603A is configured to determine the nodes whose node attributes include a field value in the preset relationship field information;

the relationship data generating sub-module 603B is configured to generate relationship data between the determined nodes according to the relationship information in the preset relationship field information.

In this way, because the node attributes are determined according to the field values contained in the structured data, and the preset relationship field information includes the field values in the structured data between which a relationship exists, the nodes whose node attributes include a field value in the preset relationship field information can be determined more accurately. In addition, because the relationship information in the preset relationship field information represents the relationship between those field values, the relationship data between the nodes can be generated more accurately.

Referring to fig. 8, fig. 8 is a schematic structural diagram of a third data processing apparatus according to an embodiment of the present invention, where the structured data obtaining module 601 includes 601A-601B.

The data cleaning sub-module 601A is configured to obtain structured data from a preset data source, and perform data cleaning on the obtained structured data;

the identifier mapping sub-module 601B is configured to map, based on a preset identifier mapping relationship, a field identifier corresponding to a field value in the cleaned structured data to an attribute identifier corresponding to a node attribute, and use the structured data after mapping the identifier as structured data to be processed.

In this way, the structured data obtained from the preset data source is subjected to data cleaning, and data such as invalid data and repeated data in the obtained structured data can be eliminated. And based on a preset identifier mapping relation, mapping field identifiers corresponding to field values in the cleaned structured data into attribute identifiers corresponding to node attributes, so that the success rate of data import can be improved, and the efficiency of data processing is improved.

In an embodiment of the present invention, the identifier mapping sub-module is specifically configured to map, based on a preset identifier mapping relationship, a field identifier corresponding to a field value in the cleaned structured data to be an attribute identifier corresponding to a node attribute, perform format adjustment on the field value in the structured data after the mapping is performed according to a preset field value format, and use the adjusted structured data as the structured data to be processed.

Because the format of the field value in the structured data after mapping identification may not be uniform or normative, the format of the field value in the structured data after mapping identification can be adjusted according to the preset field value format, so that the success rate of data import can be improved, and the efficiency of data processing is improved.

In an embodiment of the present invention, the data importing module 604 is specifically configured to generate a file containing the node data and the relationship data according to a preset file format requirement, store the file according to a preset file storage requirement, and import the file into a graph database.

Because the file is generated according to the preset file format requirement and stored according to the preset file storage requirement, the obtained file meets both requirements and can be imported into the graph database.

Corresponding to the data processing method, the embodiment of the invention also provides electronic equipment.

Referring to fig. 9, fig. 9 is a schematic structural diagram of an electronic device according to an embodiment of the present invention. The electronic device includes a processor 901, a communication interface 902, a memory 903, and a communication bus 904, where the processor 901, the communication interface 902, and the memory 903 communicate with each other through the communication bus 904;

a memory 903 for storing computer programs;

the processor 901 is configured to implement the data processing method provided in the embodiment of the present invention when executing the program stored in the memory 903.

The communication bus mentioned for the electronic device may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The communication bus may be divided into an address bus, a data bus, a control bus, etc. For ease of illustration, only one thick line is shown in the figure, but this does not mean that there is only one bus or one type of bus.

The communication interface is used for communication between the electronic equipment and other equipment.

The memory may include a Random Access Memory (RAM) or a Non-Volatile Memory (NVM), such as at least one disk memory. Optionally, the memory may also be at least one storage device located remotely from the processor.

The processor may be a general-purpose processor, including a Central Processing Unit (CPU), a Network Processor (NP), and the like; it may also be a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.

In another embodiment provided by the present invention, a computer-readable storage medium is further provided, in which a computer program is stored, and the computer program, when executed by a processor, implements the data processing method provided by the embodiment of the present invention.

In another embodiment provided by the present invention, a computer program product containing instructions is also provided, which when executed on a computer causes the computer to implement the data processing method provided by the embodiment of the present invention.

As can be seen from the above, when the scheme provided by this embodiment is applied to data processing, the identifier of the structured data to be processed is taken as the node identifier, and the field value included in the structured data is taken as the node attribute, so as to generate node data of a node corresponding to each structured data; and generating relationship data between nodes corresponding to each structured data according to the relationship field values in the preset relationship field information and the relationship information among the relationship field values, and importing the file containing the generated node data and the relationship data into the graph database. Compared with the prior art, the method has the advantages that the files do not need to be manually generated by workers, the workload of generating the files is reduced, the efficiency of data import can be improved, and the efficiency of data processing is improved.

In addition, the relationship-type data stored in a graph database includes node data of each node, and the node data of a node includes a node identifier and node attributes. Therefore, taking the identifier of the structured data as the node identifier and the field values contained in the structured data as the node attributes allows the node data of the node corresponding to each piece of structured data to be generated more accurately. Further, the relationship-type data stored in the graph database also includes relationship data representing the relationships between nodes, and the preset relationship field information includes the field values in the structured data between which a relationship exists, together with relationship information representing the relationship between those field values. The relationship data between the nodes corresponding to the structured data can therefore be generated accurately according to the preset relationship field information. As a result, the obtained file containing the generated node data and relationship data is relationship-type data that can be stored in the graph database, which improves the efficiency of data import and, in turn, the efficiency of data processing.

The above embodiments may be implemented wholly or partially by software, hardware, firmware, or any combination thereof. When implemented in software, they may be implemented wholly or partially in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer instructions are loaded and executed on a computer, the processes or functions described in the embodiments of the invention are produced in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable device. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another; for example, they may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center in a wired manner (e.g., coaxial cable, optical fiber, Digital Subscriber Line (DSL)) or a wireless manner (e.g., infrared, wireless, microwave). The computer-readable storage medium may be any available medium that can be accessed by a computer, or a data storage device, such as a server or a data center, that integrates one or more available media. The available medium may be a magnetic medium (e.g., floppy disk, hard disk, magnetic tape), an optical medium (e.g., DVD), a semiconductor medium (e.g., Solid State Disk (SSD)), or the like.

It is noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.

The embodiments in this specification are described in a related manner; for parts that are the same or similar among the embodiments, reference may be made to one another, and each embodiment focuses on its differences from the other embodiments. In particular, the embodiments of the apparatus, the electronic device, and the computer-readable storage medium are described relatively briefly because they are substantially similar to the method embodiments; for relevant details, reference may be made to the corresponding description of the method embodiments.

The above description is only for the preferred embodiment of the present invention, and is not intended to limit the scope of the present invention. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention shall fall within the protection scope of the present invention.
