Foreign key mapping method and device for database tables, electronic equipment and storage medium

Document No.: 1170285 Publication date: 2020-09-18

Reading note: This technology, "Foreign key mapping method and device for database tables, electronic equipment and storage medium", was designed by 袁鹏文, 刘强 and 胡婧 on 2020-04-28. Main content: The application discloses a foreign key mapping method and device for database tables, electronic equipment and a storage medium. The method comprises: acquiring field information of a target field; performing natural language processing on the field information to obtain text features of the field information; determining the associated object of the target field according to an associated object classification model and the text features; and establishing a foreign key mapping relationship between the target field and the database table of the associated object. The application addresses the high cost and low efficiency caused by relying on manual foreign key mapping of database tables, realizes automatic matching and mapping of the foreign keys of database tables, and improves the efficiency and accuracy of foreign key mapping.

1. A method for foreign key mapping of a database table, comprising:

acquiring field information of a target field;

carrying out natural language processing on the field information to obtain text characteristics of the field information;

determining the associated object of the target field according to the associated object classification model and the text characteristics;

and establishing a foreign key mapping relation between the target field and the database table of the associated object.

2. The method of claim 1, wherein the carrying out natural language processing on the field information to obtain text characteristics of the field information comprises:

performing word segmentation processing on the field information to obtain a word segmentation result;

and extracting a characteristic value in the word segmentation result, and determining the text characteristic of the field information according to the characteristic value.

3. The foreign key mapping method of a database table according to claim 2, wherein the field information includes a field name, and the performing a word segmentation process on the field information to obtain a word segmentation result includes:

performing word segmentation processing on the field names according to a preset field naming format to obtain a plurality of words;

the extracting the feature values in the word segmentation result comprises:

and respectively extracting the characteristic values of the obtained words.

4. The method of claim 1, wherein the target field is a dimension field of a fact table in a Hadoop data warehouse, and the associated object is an entity of a dimension table in the Hadoop data warehouse.

5. The method of foreign key mapping for a database table as in claim 1, wherein the associated object classification model comprises a decision tree classification model.

6. The method of claim 1, wherein the associated objects include store entities, project entities, and sales entities, and wherein determining the associated object of the target field according to the associated object classification model and the text characteristics comprises:

if the associated object of the target field is determined to be an entity, determining whether the entity is the store entity;

if the entity is not the store entity, determining whether the entity is the project entity;

if the entity is not the project entity, determining whether the entity is the sales entity.

7. A method of foreign key mapping of a database table according to any of claims 1 to 6, characterized in that the method further comprises:

acquiring a data test request, wherein the data test request comprises a field to be tested;

reading the data in the field to be tested and the foreign key mapping relation of the field to be tested according to the data test request;

and determining a target database table according to the foreign key mapping relation, and if the read data exists in the target database table, passing the test.

8. An apparatus for foreign key mapping of a database table, comprising:

the first acquisition unit is used for acquiring field information of the target field;

the characteristic extraction unit is used for carrying out natural language processing on the field information to obtain text characteristics of the field information;

the determining unit is used for determining the associated object of the target field according to the associated object classification model and the text characteristics;

and the establishing unit is used for establishing the foreign key mapping relation between the target field and the database table of the associated object.

9. An electronic device, wherein the electronic device comprises: a processor; and a memory arranged to store computer executable instructions that, when executed, cause the processor to perform the method of any of claims 1 to 7.

10. A computer readable storage medium, wherein the computer readable storage medium stores one or more programs which, when executed by a processor, implement the method of any of claims 1-7.

Technical Field

The application relates to the technical field of machine learning, in particular to a foreign key mapping method and device of a database table, electronic equipment and a storage medium.

Background

In a data warehouse based on Hadoop (a distributed system infrastructure), fact tables have numerous dimension fields. Foreign key consistency is a key point of data testing, but because Hadoop has no primary key/foreign key relationships, foreign key mapping of the fact tables in the data warehouse becomes a difficult problem.

In the current data testing process, the foreign key mappings of fact tables usually have to be maintained manually one by one, which is time-consuming and labor-intensive. Moreover, the dimension fields in a fact table tend to be tightly coupled with the business, so the same entity may be mapped to different dimension field names, and the mapping cannot be determined with simple rules.

Disclosure of Invention

In view of the above, the present application is proposed to provide a method, an apparatus, an electronic device and a storage medium for mapping foreign keys of a database table that overcome or at least partially solve the above problems.

According to a first aspect of the application, a method for mapping foreign keys of a database table is provided, which comprises the following steps:

acquiring field information of a target field;

carrying out natural language processing on the field information to obtain text characteristics of the field information;

determining the associated object of the target field according to the associated object classification model and the text characteristics;

and establishing a foreign key mapping relation between the target field and the database table of the associated object.

Optionally, the performing natural language processing on the field information to obtain the text feature of the field information includes:

performing word segmentation processing on the field information to obtain a word segmentation result;

and extracting a characteristic value in the word segmentation result, and determining the text characteristic of the field information according to the characteristic value.

Optionally, the field information includes a field name, and performing word segmentation processing on the field information to obtain a word segmentation result includes:

performing word segmentation processing on the field names according to a preset field naming format to obtain a plurality of words;

the extracting the feature values in the word segmentation result comprises:

and respectively extracting the characteristic values of the obtained words.

Optionally, the target field is a dimension field of a fact table in a Hadoop data warehouse, and the associated object is an entity of a dimension table in the Hadoop data warehouse.

Optionally, the associated object classification model comprises a decision tree classification model.

Optionally, the associated objects include store entities, project entities and sales entities, and determining the associated object of the target field according to the associated object classification model and the text feature includes:

if the associated object of the target field is determined to be an entity, determining whether the entity is the store entity;

if the entity is not the store entity, determining whether the entity is the project entity;

if the entity is not the project entity, determining whether the entity is the sales entity.

Optionally, the method further comprises:

acquiring a data test request, wherein the data test request comprises a field to be tested;

reading the data in the field to be tested and the foreign key mapping relation of the field to be tested according to the data test request;

and determining a target database table according to the foreign key mapping relation, and if the read data exists in the target database table, passing the test.

According to a second aspect of the present application, there is provided a foreign key mapping apparatus for a database table, comprising:

the first acquisition unit is used for acquiring field information of the target field;

the characteristic extraction unit is used for carrying out natural language processing on the field information to obtain text characteristics of the field information;

the determining unit is used for determining the associated object of the target field according to the associated object classification model and the text characteristics;

and the establishing unit is used for establishing the foreign key mapping relation between the target field and the database table of the associated object.

Optionally, the feature extraction unit is further configured to:

performing word segmentation processing on the field information to obtain a word segmentation result;

and extracting a characteristic value in the word segmentation result, and determining the text characteristic of the field information according to the characteristic value.

Optionally, the field information includes a field name, and the feature extraction unit is further configured to:

performing word segmentation processing on the field names according to a preset field naming format to obtain a plurality of words;

and respectively extracting the characteristic values of the obtained words.

Optionally, the target field is a dimension field of a fact table in a Hadoop data warehouse, and the associated object is an entity of a dimension table in the Hadoop data warehouse.

Optionally, the associated object classification model comprises a decision tree classification model.

Optionally, the associated object includes a store entity, a project entity and a sales entity, and the determining unit is further configured to:

if the associated object of the target field is determined to be an entity, determining whether the entity is the store entity;

if the entity is not the store entity, determining whether the entity is the project entity;

if the entity is not the project entity, determining whether the entity is the sales entity.

Optionally, the apparatus further comprises:

the second acquisition unit is used for acquiring a data test request, and the data test request comprises a field to be tested;

the reading unit is used for reading the data in the field to be tested and the foreign key mapping relation of the field to be tested according to the data test request;

and the test unit is used for determining a target database table according to the foreign key mapping relation, and if the read data exists in the target database table, the test is passed.

In accordance with a third aspect of the present application, there is provided an electronic device comprising: a processor; and a memory arranged to store computer executable instructions that, when executed, cause the processor to perform a method as any one of the above.

According to a fourth aspect of the application, there is provided a computer readable storage medium, wherein the computer readable storage medium stores one or more programs which, when executed by a processor, implement a method as in any above.

According to the technical scheme of the application, field information of the target field is acquired; natural language processing is performed on the field information to obtain text features of the field information; the associated object of the target field is determined according to the associated object classification model and the text features; and a foreign key mapping relationship between the target field and the database table of the associated object is established. This addresses the high cost and low mapping efficiency caused by manually mapping the foreign keys of database tables, realizes automatic matching and mapping of the foreign keys of database tables, and improves the efficiency and accuracy of foreign key mapping.

The foregoing is only an overview of the technical solutions of the present application. So that the technical means of the present application can be understood more clearly and implemented according to the content of the description, and so that the above and other objects, features and advantages of the present application are easier to understand, specific embodiments of the present application are described below.

Drawings

Various other advantages and benefits will become apparent to those of ordinary skill in the art upon reading the following detailed description of the preferred embodiments. The drawings are only for purposes of illustrating the preferred embodiments and are not to be construed as limiting the application. Also, like reference numerals are used to refer to like parts throughout the drawings. In the drawings:

FIG. 1 shows a flow diagram of a method for foreign key mapping of a database table according to one embodiment of the present application;

FIG. 2 illustrates a block flow diagram of a method for foreign key mapping of a database table according to one embodiment of the present application;

FIG. 3 illustrates a logical representation of a decision tree classification model according to one embodiment of the present application;

FIG. 4 is a block diagram of an apparatus for foreign key mapping of database tables according to one embodiment of the present application;

FIG. 5 shows a schematic structural diagram of an electronic device according to an embodiment of the present application;

FIG. 6 shows a schematic structural diagram of a computer-readable storage medium according to an embodiment of the present application.

Detailed Description

Exemplary embodiments of the present application will be described in more detail below with reference to the accompanying drawings. While exemplary embodiments of the present application are shown in the drawings, it should be understood that the present application may be embodied in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the disclosure to those skilled in the art.

Hadoop is a framework that allows large amounts of data to be processed in a distributed manner in a computer cluster using a simple programming model. Users can develop distributed programs without knowing details of a distributed bottom layer, and high-speed operation and storage are performed by fully utilizing the power of the cluster.

However, because a Hadoop data warehouse has no primary key/foreign key mapping relationships, mapping between the fact tables and the dimension tables in the data warehouse becomes a difficult problem. The method currently adopted relies on manual work to establish and maintain the foreign key mappings between fact tables and dimension tables, which is costly and inefficient. In addition, because the dimension fields in a fact table tend to be tightly coupled with the business, the same entity may be mapped to different dimension field names, which further increases the difficulty of manual mapping.

Based on this, an embodiment of the present application provides a foreign key mapping method for a database table. As shown in FIG. 1, the method includes the following steps S110 to S140:

Step S110, acquiring field information of the target field.

Before the foreign key mapping relationship is established, it may first be determined whether a field in the database table is a target field that requires foreign key mapping. If it is, the field information corresponding to the target field is obtained as the basis for the subsequent foreign key mapping. The field information of the target field may be, for example, a field name. For instance, a sales service database table may include fields such as salesperson, sales volume and sales amount; the salesperson field can be determined to be the target field that requires foreign key mapping, and the field information corresponding to that field is then obtained.

Step S120, performing natural language processing on the field information to obtain the text features of the field information.

After the field information of the target field is obtained, Natural Language Processing (NLP) is performed on it. Using methods such as semantic analysis and word segmentation, NLP can extract the text features in the field information and so provide a data basis for the subsequent model analysis.

For example, a Chinese text is formally a string of Chinese characters (including punctuation marks, etc.). Characters form words, words form phrases, phrases form sentences, and sentences in turn form paragraphs. At every one of these levels (characters, words, phrases, sentences, paragraphs), and in the transition from one level to the next, ambiguity may arise: a character string of the same form can be understood as different word strings, phrase strings and so on, and can carry different meanings in different scenes or contexts. Similarly, the field information in a database table is also composed of text, which can have different meanings in different scenes or contexts. Therefore, by accurately analyzing the semantics and context of the text, more accurate text features can be extracted to represent, as far as possible, the true meaning of the text in the scene.

Step S130, determining the associated object of the target field according to the associated object classification model and the text characteristics.

In the embodiment of the application, an associated object classification model is constructed in advance. Based on an existing classification framework such as a decision tree classification model, a Bayesian classification model, an SVM (Support Vector Machine) or KNN (K-Nearest Neighbor classification algorithm), a model is trained and tested on labeled field classification data to obtain the associated object classification model.

Step S140, establishing a foreign key mapping relationship between the target field and the database table of the associated object.

After the associated object corresponding to the target field has been determined by the classification model, the foreign key mapping relationship between the target field and the associated object is established, which realizes automatic matching and mapping of fields and foreign keys in the database table. The process requires no manual intervention. It reduces the cost of manual operation, avoids the low efficiency and accuracy of manually mapping the foreign keys of database tables, and improves the efficiency and accuracy of foreign key mapping.

FIG. 2 shows a flow chart of a foreign key mapping method of a database table. A target field is obtained, and word segmentation is performed on the field information of the target field to obtain four parts: word A, word B, word C and word D. Feature matching is performed on the segmentation result to obtain the text features corresponding to the target field, and the feature values are input into a decision tree classification model for classification. The model judges in turn whether each entity matches the target field: if an entity matches, it is output directly; if not, the next entity is judged, until the entity dimension corresponding to the target field is determined. In this way the mapping relationship between the target field and the entity is established automatically. An entity in the embodiment of the application refers to an object that is involved in a business scene, such as a store, a project or an employee, and that can be used as a dimension to establish a dimension table.
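Steps S110 to S140 can be sketched end to end in Python as follows. This is a minimal illustration, not the implementation of the application: the keyword lookup `KEYWORD_TO_ENTITY` is a hypothetical stand-in for the trained associated object classification model, and the `dim_` table naming and the keywords themselves are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical keyword-to-entity lookup standing in for the trained
# associated object classification model (assumption, not from the source).
KEYWORD_TO_ENTITY = {"poi": "store", "project": "project", "emp": "employee"}


@dataclass(frozen=True)
class ForeignKeyMapping:
    field_name: str        # target field in the fact table
    dimension_table: str   # table of the associated object


def extract_features(field_name: str) -> List[str]:
    """S120: split the field name into lowercase words."""
    return [word for word in field_name.lower().split("_") if word]


def classify(features: List[str]) -> Optional[str]:
    """S130: return the entity of the first word that has a known keyword."""
    for word in features:
        if word in KEYWORD_TO_ENTITY:
            return KEYWORD_TO_ENTITY[word]
    return None


def map_foreign_key(field_name: str) -> Optional[ForeignKeyMapping]:
    """S110 to S140 for one field; returns None when no entity matches."""
    entity = classify(extract_features(field_name))
    if entity is None:
        return None
    # S140: record the mapping; the "dim_" prefix is a hypothetical convention.
    return ForeignKeyMapping(field_name, f"dim_{entity}")
```

Under these assumptions, `map_foreign_key("bonus_poi_id")` yields a mapping to the hypothetical table `dim_store`, while a measure field such as `sales_amount` yields `None` and receives no mapping.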

In an embodiment of the application, the performing natural language processing on the field information to obtain a text feature of the field information includes: performing word segmentation processing on the field information to obtain a word segmentation result; and extracting a characteristic value in the word segmentation result, and determining the text characteristic of the field information according to the characteristic value.

In the embodiment of the application, word segmentation is used to perform natural language processing on the obtained field information. Word segmentation refers to the process of recombining a continuous character sequence into a word sequence according to a certain specification. The word segmentation techniques commonly used in NLP can be divided into three categories: methods based on string matching (such as the forward maximum matching method and the reverse maximum matching method), methods based on statistics (such as Hidden Markov Models (HMMs), maximum entropy models and Conditional Random Field models (CRFs)), and methods based on understanding.

A feature value is then extracted from the word segmentation result, and the text features corresponding to the field information are determined from the feature values and used as the input of the subsequent classification model. The text features may take the form of a feature set. For example, word segmentation of the field information may yield several words; feature extraction is performed on each word to obtain the feature set corresponding to the field information, and the feature set as a whole is input into the subsequent classification model for classification. Of course, the text features may take any other form known to those skilled in the art, which is not particularly limited here.
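As an illustration of the string-matching category, the forward maximum matching method can be sketched as below. The dictionary passed in is a hypothetical word list chosen for the example; the application does not specify one.

```python
from typing import Collection, List, Optional


def forward_max_match(text: str, dictionary: Collection[str],
                      max_len: Optional[int] = None) -> List[str]:
    """Segment text greedily from left to right, longest dictionary word first."""
    if max_len is None:
        max_len = max((len(word) for word in dictionary), default=1)
    tokens: List[str] = []
    i = 0
    while i < len(text):
        # Try the longest candidate first and shrink it until it is a
        # dictionary word; a single character is always accepted.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if size == 1 or candidate in dictionary:
                tokens.append(candidate)
                i += size
                break
    return tokens
```

For instance, with the hypothetical dictionary `{"门店", "销售", "销售额"}`, the string `"门店销售额"` is segmented into `["门店", "销售额"]`: the longer word "销售额" wins over "销售" because longer candidates are tried first.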
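One simple way to turn a word segmentation result into a feature set is a bag-of-words count vector over a fixed vocabulary. The sketch below assumes this representation and a hypothetical vocabulary; the application leaves the concrete form of the features open.

```python
from typing import List, Sequence


def to_feature_vector(tokens: Sequence[str], vocabulary: Sequence[str]) -> List[int]:
    """Count how often each vocabulary term occurs among the tokens."""
    counts = {}
    for token in tokens:
        counts[token] = counts.get(token, 0) + 1
    # One position per vocabulary term, in vocabulary order.
    return [counts.get(term, 0) for term in vocabulary]
```

With the hypothetical vocabulary `["id", "poi", "project"]`, the tokens `["bonus", "poi", "id"]` become the vector `[1, 1, 0]`; the word "bonus" is outside the vocabulary and is ignored.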

In an embodiment of the present application, the field information includes a field name, and performing a word segmentation process on the field information to obtain a word segmentation result includes: performing word segmentation processing on the field names according to a preset field naming format to obtain a plurality of words; the extracting the feature values in the word segmentation result comprises: and respectively extracting the characteristic values of the obtained words.

The field information of the embodiment of the application may include a field name, and word segmentation is applied to the field name to obtain several words. For example, a field name is divided by word segmentation into four parts, word A, word B, word C and word D, and the feature value of each word is then calculated and used as the input of the subsequent classification model.

The preset field naming format in the embodiment of the present application can be set flexibly according to actual requirements. For example, a field name can generally be composed of letters, numbers, underscores, Chinese characters and so on, and the naming rules for field names may be set as follows: 1) names are written in lowercase English; 2) words are connected with "_". Taking the store ID entry in Table 1 below as an example, the corresponding field name is "bonus_poi_id". During word segmentation, the field name can be split according to the naming format, using "_" as the segmentation mark, which gives the words "bonus", "poi" and "id". The three words are each converted into feature vectors and input into the classification model for classification, which yields the associated object corresponding to the field name: the feature value of the word "poi" is found to match the store entity dimension, so a mapping relationship between the field name and the store entity dimension is established. The mapping of all field names to entity dimensions is completed in the same way; the specific mapping results are shown in Table 1 below.
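The naming rules above (lowercase English, words joined by "_") make segmentation a simple split. The sketch below validates a field name against those two rules and then splits it; the regular expression is an assumption about how strictly the rules are enforced, and it also admits digits.

```python
import re
from typing import List

# Lowercase letters or digits, in groups joined by single underscores
# (an assumed, strict reading of the naming rules).
FIELD_NAME_PATTERN = re.compile(r"[a-z0-9]+(?:_[a-z0-9]+)*")


def segment_field_name(name: str) -> List[str]:
    """Split a field name on "_" after checking the naming format."""
    if not FIELD_NAME_PATTERN.fullmatch(name):
        raise ValueError(f"field name does not follow the naming format: {name!r}")
    return name.split("_")
```

Applied to the field name from the example, written here as `bonus_poi_id`, the function returns `["bonus", "poi", "id"]`.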

TABLE 1

In one embodiment of the application, the target field is a dimension field of a fact table in a Hadoop data warehouse, and the associated object is an entity of a dimension table in the Hadoop data warehouse.

The target field in the embodiment of the application may be a dimension field of a fact table in a Hadoop data warehouse, and the associated object may be an entity of a dimension table in the Hadoop data warehouse. Each data warehouse contains one or more fact data tables. A fact data table may contain business sales data, such as data generated by cash register transactions. It usually contains a large number of rows and is mainly characterized by containing numerical data (facts), which can be summarized to provide historical data about the organization. Each fact data table contains a multi-part index, which includes the primary keys of the related dimension tables as foreign keys; the dimension tables contain the attributes of the fact records. A fact data table should not contain descriptive information, nor any data other than the numeric metric fields and the index fields that associate each fact with the corresponding entries in the dimension tables. A dimension table can be viewed as a window through which a user analyzes data. It contains the attributes of the fact records in the fact data table: some attributes provide descriptive information, some specify how to aggregate the fact data to provide useful information to the analyst, and the dimension table also contains attribute hierarchies that help aggregate the data.

Similarly, taking the sales service database tables as an example, suppose data table A includes data fields such as store name, sales time, sales volume and sales amount, and data table B includes data fields such as store name, city and area. It can then be determined that data table A is a fact table, data table B is a dimension table, and the store name is the field that requires foreign key mapping.

Because the Hadoop data warehouse has no primary key/foreign key mapping relationships, foreign key mapping between fact tables and dimension tables is difficult. Through word segmentation, model classification and the establishment of mapping relationships, this lack can be compensated for, so that fact tables and dimension tables are matched automatically and their mapping relationships are established.
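The relationship between data table A and data table B can be illustrated with a small in-memory example. The sketch uses Python's `sqlite3` module purely for demonstration (the application targets a Hadoop data warehouse); the table names, column names and rows are all hypothetical.

```python
import sqlite3


def sales_amount_by_city():
    """Join the fact table to the dimension table on store name and total sales per city."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        -- Data table B: dimension table describing each store.
        CREATE TABLE dim_store (store_name TEXT PRIMARY KEY, city TEXT, area TEXT);
        -- Data table A: fact table; store_name plays the role of the foreign key.
        CREATE TABLE fact_sales (store_name TEXT, sale_time TEXT,
                                 sales_volume INTEGER, sales_amount REAL);
        INSERT INTO dim_store VALUES ('Store A', 'Beijing', 'North'),
                                     ('Store B', 'Shanghai', 'East');
        INSERT INTO fact_sales VALUES ('Store A', '2020-04-01', 3, 30.0),
                                      ('Store A', '2020-04-02', 2, 20.0),
                                      ('Store B', '2020-04-01', 5, 50.0);
    """)
    rows = conn.execute("""
        SELECT d.city, SUM(f.sales_amount)
        FROM fact_sales AS f
        JOIN dim_store AS d ON f.store_name = d.store_name
        GROUP BY d.city
        ORDER BY d.city
    """).fetchall()
    conn.close()
    return rows
```

The join only gives meaningful results once it is known that `fact_sales.store_name` refers to `dim_store.store_name`, which is exactly the mapping the method sets out to establish automatically.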

In one embodiment of the present application, the associated object classification model comprises a decision tree classification model.

The mapping between the target field and the associated object can essentially be treated as a classification problem. The machine learning field offers various classification algorithms, including the decision tree classification model, the Bayesian classification model, KNN (K-Nearest Neighbor classification algorithm), the SVM (Support Vector Machine) algorithm, and the like.

A decision tree algorithm is a method of approximating discrete-valued functions. It is a typical classification method that first processes the data, generates readable rules and a decision tree using an induction algorithm, and then uses the decision tree to analyze new data. The Bayesian classification model is a non-rule-based classification method: it trains on a subset of already classified samples, learns and induces a classification function, and uses the trained classifier to classify unclassified data. A typical Bayesian classification algorithm is the Naive Bayes classification algorithm. KNN is one of the simplest machine learning algorithms; it is a supervised learning algorithm that can be used for both classification and regression. The idea is that if most of the K samples most similar to a given sample (i.e. its nearest neighbors in the feature space) belong to a certain class, then the sample also belongs to this class. That is, in the classification decision the method determines the category of the sample to be classified only from the categories of the nearest sample or samples. The SVM is a generalized linear classifier that performs binary classification of data by supervised learning; the classifier has good robustness and sparsity.
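The KNN idea described above fits in a few lines. The following is a minimal sketch using Euclidean distance and a majority vote; the distance measure, the value of K and the sample data are assumptions for illustration only.

```python
import math
from collections import Counter
from typing import List, Sequence, Tuple


def knn_classify(sample: Sequence[float],
                 labeled: List[Tuple[Sequence[float], str]],
                 k: int = 3) -> str:
    """Return the majority label among the k labeled points nearest to sample."""
    # Sort the labeled points by Euclidean distance to the sample, keep the k closest.
    nearest = sorted(labeled, key=lambda item: math.dist(sample, item[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]
```

For example, given two points labeled "store" near `(1, 0, 0)` and two points labeled "project" near `(0, 0, 1)`, a sample at `(1, 0, 0)` with `k=3` has two "store" neighbors and one "project" neighbor, so it is classified as "store".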

Taking a sales business scenario as an example, the dimension table of the database table in the scenario may include entity dimensions, and the entity dimensions may be further divided into store entity dimensions, project entity dimensions, and employee entity dimensions. Three commonly used classification algorithms, namely decision tree, Bayes and SVM, are selected for training and testing respectively, and the test results are shown in the following table 2.

TABLE 2

It can be found that under the condition of the same data and the same characteristics, the effect of the decision tree classification model is the best, and the overall accuracy can reach 98.8%, so that the decision tree classification model is selected to realize the classification of the target field, and the better effect can be obtained. Of course, a person skilled in the art may select other types of classification models according to actual situations, and the scope of protection of the present application should not be limited thereto.

In one embodiment of the present application, the association objects include store entities, project entities and sales entities, and the determining the association object of the target field according to the association object classification model and the text feature includes: if the associated object of the target field is determined to be an entity, determining whether the entity is the store entity; if the entity is not the store entity, determining whether the entity is the project entity; if the entity is not the project entity, determining whether the entity is the selling entity.

A classification decision tree is a tree structure that describes the classification of instances. The decision tree consists of nodes and directed edges. There are two types of nodes: internal nodes and leaf nodes. An internal node represents a feature or attribute, and a leaf node represents a class. To classify an instance with a decision tree, a certain feature of the instance is tested starting from the root node, and the instance is assigned to a child node according to the test result; each child node corresponds to one value of that feature. The instance is passed down recursively in this way until a leaf node is reached, and it is finally assigned to the class of that leaf node.
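The root-to-leaf traversal described above can be sketched as follows. The `Node` structure, the feature names `is_entity` and `keyword`, and the hand-built tree are hypothetical and serve only to illustrate the mechanism; in practice the tree would be learned from labeled data.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Node:
    # Internal nodes set `feature` and `children`; leaf nodes set `label`.
    feature: Optional[str] = None
    children: Dict[str, "Node"] = field(default_factory=dict)
    label: Optional[str] = None


def classify(node, instance):
    """Walk from the root to a leaf, following the edge that matches
    the instance's value for each tested feature."""
    while node.label is None:
        node = node.children[instance[node.feature]]
    return node.label


# Hypothetical hand-built tree with two internal nodes and three leaves.
tree = Node(
    feature="is_entity",
    children={
        "no": Node(label="not_entity"),
        "yes": Node(
            feature="keyword",
            children={
                "store": Node(label="store_entity"),
                "project": Node(label="project_entity"),
            },
        ),
    },
)

result = classify(tree, {"is_entity": "yes", "keyword": "store"})
```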

Based on the principle of the classification decision tree and in combination with a specific service scenario, the associated object of the embodiment of the present application may include an entity, and the entity may be further divided into multiple categories, such as a store entity, a project entity, and a sales entity. As shown in fig. 3, when a target field is classified, it is first determined whether the target field belongs to the entity dimension. If not, matching against all child nodes under the entity dimension stops. If so, it is determined whether the target field is the store entity; if so, the associated object of the target field is output as the store entity. If not, it is determined whether the target field is the project entity; if so, the associated object is output as the project entity. If not, it is determined whether the target field is the sales entity; if so, the associated object is output as the sales entity. If not, the remaining child nodes are checked in turn until the entity corresponding to the target field is determined.
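The sequence of checks in fig. 3 might be sketched as an ordered cascade. The keyword sets and the function name `match_entity` are hypothetical; the application does not state how each individual check is implemented, so simple keyword matching stands in for the trained model here.

```python
from typing import Optional, Set

# Hypothetical keyword sets, checked in the order store -> project -> sales.
ENTITY_CHECKS = [
    ("store entity", {"store", "shop"}),
    ("project entity", {"project", "item"}),
    ("sales entity", {"sales", "seller"}),
]


def match_entity(tokens: Set[str], is_entity_dimension: bool) -> Optional[str]:
    """Return the first entity whose check succeeds, or None.

    If the field is not in the entity dimension, no child node of the
    entity dimension is examined at all.
    """
    if not is_entity_dimension:
        return None
    for name, keywords in ENTITY_CHECKS:
        if tokens & keywords:
            return name
    return None


matched = match_entity({"store", "id"}, is_entity_dimension=True)
```

Because the checks run in a fixed order, a field is assigned to the first entity it matches and the later checks are skipped.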

In one embodiment of the present application, the method further comprises: acquiring a data test request, wherein the data test request comprises a field to be tested; reading the data in the field to be tested and the foreign key mapping relation of the field to be tested according to the data test request; and determining a target database table according to the foreign key mapping relation, and if the read data exists in the target database table, passing the test.

Foreign key mapping is one of the key means of ensuring data consistency, and the accuracy of the foreign key mapping result can be verified through a data testing step. Specifically, a data test request is obtained and the field to be tested is determined; the data in the field and the foreign key mapping relation corresponding to the field are read; the corresponding database table is called according to the foreign key mapping relation; and it is judged whether the data in the field to be tested exists in that database table. If the data exists, the data test passes, which shows that the foreign key mapping result is accurate and that data consistency can be ensured.
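As a rough illustration of this data test, the sketch below uses an in-memory SQLite database in place of the data warehouse. The table names, column names, the `mapping` dictionary, and the function `check_foreign_key` are all assumptions; the sketch also interprets "the data exists" as "every non-null value exists", which the application does not spell out.

```python
import sqlite3


def check_foreign_key(conn, fact_table, field, fk_mapping):
    """Pass if every non-null value of fact_table.field exists in the
    table that the foreign key mapping points to."""
    target_table, target_column = fk_mapping[(fact_table, field)]
    # Identifiers are interpolated directly; this sketch assumes they come
    # from trusted metadata, not from user input.
    query = (
        f"SELECT COUNT(*) FROM {fact_table} f "
        f"LEFT JOIN {target_table} d ON f.{field} = d.{target_column} "
        f"WHERE f.{field} IS NOT NULL AND d.{target_column} IS NULL"
    )
    (missing,) = conn.execute(query).fetchone()
    return missing == 0


# Hypothetical schema: one fact table and one dimension table.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dim_employee (employee_id INTEGER PRIMARY KEY);
    CREATE TABLE fact_sales (salesperson_id INTEGER, amount REAL);
    INSERT INTO dim_employee VALUES (1), (2);
    INSERT INTO fact_sales VALUES (1, 10.0), (2, 20.0);
""")
mapping = {("fact_sales", "salesperson_id"): ("dim_employee", "employee_id")}
passed = check_foreign_key(conn, "fact_sales", "salesperson_id", mapping)
```

A value in the tested field with no counterpart in the target table makes the check fail, which would indicate either a wrong mapping or inconsistent data.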

According to a second aspect of the present application, as shown in fig. 4, there is provided an apparatus 400 for foreign key mapping of a database table, the apparatus 400 comprising: a first obtaining unit 410, a feature extracting unit 420, a determining unit 430 and an establishing unit 440.

The first obtaining unit 410 of the embodiment of the present application is configured to obtain field information of a target field.

Before the foreign key mapping relationship is established, it may first be determined whether a field in the database table is a target field that requires foreign key mapping; if so, the field information corresponding to the target field is acquired as the basis for the subsequent foreign key mapping. The field information of the target field may be, for example, the field name. For example, a sales service database table may include fields such as sales staff, sales volume, and sales amount; the sales staff field is determined to be a target field requiring foreign key mapping, and the field information corresponding to that field is then acquired.

The feature extraction unit 420 in this embodiment is configured to perform natural language processing on the field information to obtain a text feature of the field information.

After the field information of the target field is obtained, natural language processing (NLP) needs to be performed on it. Using methods such as semantic analysis and word segmentation, natural language processing can accurately extract the text features in the field information, providing a data basis for the subsequent model analysis.

For example, a Chinese text is formally a string of Chinese characters (including punctuation marks, etc.). Characters form words, words form phrases, phrases form sentences, and sentences in turn form paragraphs. At every one of these levels (characters, words, phrases, sentences, and paragraphs), and in the transition from one level to the next, ambiguity and polysemy may occur; that is, a character string of the same form can be understood as different word strings, phrase strings, and so on, and can have different meanings in different scenes or contexts. Similarly, the field information in a database table is also composed of text, which likewise takes on different meanings in different scenes or contexts. Therefore, by accurately analyzing the semantics and the context of the text, more accurate text features can be extracted to represent, as far as possible, the true meaning of the text in the scene.

The determining unit 430 according to this embodiment of the present application is configured to determine the associated object of the target field according to the associated object classification model and the text feature.

In the embodiment of the application, an associated object classification model is constructed in advance: based on an existing classification model framework, such as a decision tree classification model, a Bayesian classification model, an SVM (Support Vector Machine), or KNN (K-Nearest Neighbor classification algorithm), labeled field classification data is used for training and testing, thereby obtaining the associated object classification model.
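Training on labeled field classification data can be sketched with a small Naive Bayes classifier, one of the frameworks named above (the application itself reports the best results with a decision tree). The training pairs, the function names, and the use of add-one smoothing are assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(labeled_fields):
    """Count classes and words from a list of (words, label) pairs."""
    class_counts = Counter()
    word_counts = defaultdict(Counter)
    vocabulary = set()
    for words, label in labeled_fields:
        class_counts[label] += 1
        word_counts[label].update(words)
        vocabulary.update(words)
    return class_counts, word_counts, vocabulary


def predict(model, words):
    """Return the label with the highest log posterior for `words`."""
    class_counts, word_counts, vocabulary = model
    total = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, count in class_counts.items():
        score = math.log(count / total)
        denom = sum(word_counts[label].values()) + len(vocabulary)
        for w in words:
            # Add-one (Laplace) smoothing avoids zero probabilities.
            score += math.log((word_counts[label][w] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical labeled field names, already segmented into words.
training = [
    (["store", "id"], "store entity"),
    (["store", "name"], "store entity"),
    (["project", "id"], "project entity"),
    (["project", "code"], "project entity"),
]
model = train_naive_bayes(training)
label = predict(model, ["store", "code"])
```

In a real setting part of the labeled data would be held out for testing, so that the accuracy of different model types can be compared on the same data and features.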

The establishing unit 440 of the embodiment of the present application is configured to establish a foreign key mapping relationship between the target field and the database table of the associated object.

After the associated object corresponding to the target field is judged through the classification model, the foreign key mapping relation between the target field and the associated object is established, and then the automatic matching and mapping process of the field and the foreign key in the database table is realized. The process does not need manual intervention, reduces the cost of manual operation, simultaneously avoids the problem of low mapping efficiency and accuracy caused by manually mapping the foreign key of the database table, and improves the efficiency and accuracy of mapping the foreign key of the database table.
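The overall flow (acquire the field name, extract words, classify, record the mapping) could look roughly like the sketch below. The function `map_foreign_keys`, the stand-in `toy_classifier`, and the dimension table names are hypothetical; a trained associated object classification model would take the place of the toy classifier.

```python
def map_foreign_keys(field_names, classify, dimension_tables):
    """Return {field_name: dimension_table} for every field that the
    classifier assigns to a known associated object."""
    mapping = {}
    for name in field_names:
        words = name.lower().split("_")  # simplistic word segmentation
        associated = classify(words)
        if associated in dimension_tables:
            mapping[name] = dimension_tables[associated]
    return mapping


def toy_classifier(words):
    # Stand-in for the trained associated object classification model.
    if "store" in words:
        return "store entity"
    if "staff" in words:
        return "employee entity"
    return None


# Hypothetical dimension tables keyed by associated object.
dimension_tables = {
    "store entity": "dim_store",
    "employee entity": "dim_employee",
}
fk_mapping = map_foreign_keys(
    ["store_id", "sales_staff_id", "sales_amount"],
    toy_classifier,
    dimension_tables,
)
```

Fields such as `sales_amount` that match no associated object are simply left without a mapping.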

In an embodiment of the present application, the feature extraction unit 420 is further configured to: performing word segmentation processing on the field information to obtain a word segmentation result; and extracting a characteristic value in the word segmentation result, and determining the text characteristic of the field information according to the characteristic value.

In an embodiment of the present application, the field information includes a field name, and the feature extraction unit 420 is further configured to: performing word segmentation processing on the field names according to a preset field naming format to obtain a plurality of words; and respectively extracting the characteristic values of the obtained words.
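Word segmentation by naming format and feature value extraction might be sketched as follows. The sketch assumes snake_case or camelCase field names and a bag-of-words feature vector over a small hypothetical vocabulary; the application does not specify the preset naming format or the feature representation.

```python
import re


def segment_field_name(field_name):
    """Split a field name into lowercase words.

    Assumes snake_case or camelCase naming; the preset naming format
    itself is not specified in the source.
    """
    # Insert a separator at each lowercase/digit-to-uppercase boundary.
    spaced = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", "_", field_name)
    return [w.lower() for w in re.split(r"[_\W]+", spaced) if w]


def extract_features(words, vocabulary):
    """Bag-of-words vector: 1 if the vocabulary word occurs, else 0."""
    present = set(words)
    return [1 if term in present else 0 for term in vocabulary]


words = segment_field_name("sales_staff_id")
features = extract_features(words, ["store", "staff", "id"])
```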

In one embodiment of the application, the target field is a dimension field of a fact table in a Hadoop data warehouse, and the associated object is an entity of a dimension table in the Hadoop data warehouse.

In one embodiment of the present application, the associated object classification model comprises a decision tree classification model.

In an embodiment of the present application, the associated objects include a store entity, a project entity, and a sales entity, and the determining unit 430 is further configured to: if the associated object of the target field is determined to be an entity, determine whether the entity is the store entity; if the entity is not the store entity, determine whether the entity is the project entity; if the entity is not the project entity, determine whether the entity is the sales entity.

In one embodiment of the present application, the apparatus 400 further comprises: the second acquisition unit is used for acquiring a data test request, and the data test request comprises a field to be tested; the reading unit is used for reading the data in the field to be tested and the foreign key mapping relation of the field to be tested according to the data test request; and the test unit is used for determining a target database table according to the foreign key mapping relation, and if the read data exists in the target database table, the test is passed.

It should be noted that, for the specific implementation of each apparatus embodiment, reference may be made to the specific implementation of the corresponding method embodiment, which is not described herein again.

In summary, according to the technical scheme of the application, field information of a target field is acquired; natural language processing is performed on the field information to obtain text features of the field information; the associated object of the target field is determined according to the associated object classification model and the text features; and the foreign key mapping relation between the target field and the database table of the associated object is established. This solves the problems of high cost and low mapping efficiency caused by manual foreign key mapping of database tables, realizes automatic matching and mapping of foreign keys of database tables, and improves the efficiency and accuracy of foreign key mapping.

It should be noted that:

the algorithms and displays presented herein are not inherently related to any particular computer, virtual machine, or other apparatus. Various general purpose devices may be used with the teachings herein. The required structure for constructing such a device will be apparent from the description above. In addition, this application is not directed to any particular programming language. It will be appreciated that a variety of programming languages may be used to implement the teachings of the present application as described herein, and any descriptions of specific languages are provided above to disclose the best modes of the present application.

In the description provided herein, numerous specific details are set forth. However, it is understood that embodiments of the application may be practiced without these specific details. In some instances, well-known methods, structures and techniques have not been shown in detail in order not to obscure an understanding of this description.

Similarly, it should be appreciated that in the foregoing description of exemplary embodiments of the application, various features of the application are sometimes grouped together in a single embodiment, figure, or description thereof for the purpose of streamlining the application and aiding in the understanding of one or more of the various inventive aspects. However, the disclosed method should not be interpreted as reflecting an intention that the claimed application requires more features than are expressly recited in each claim. Rather, as the following claims reflect, inventive aspects lie in less than all features of a single foregoing disclosed embodiment. Thus, the claims following the detailed description are hereby expressly incorporated into this detailed description, with each claim standing on its own as a separate embodiment of this application.

Those skilled in the art will appreciate that the modules in the device in an embodiment may be adaptively changed and disposed in one or more devices different from the embodiment. The modules or units or components of the embodiments may be combined into one module or unit or component, and furthermore they may be divided into a plurality of sub-modules or sub-units or sub-components. All of the features disclosed in this specification (including any accompanying claims, abstract and drawings), and all of the processes or elements of any method or apparatus so disclosed, may be combined in any combination, except combinations where at least some of such features and/or processes or elements are mutually exclusive. Each feature disclosed in this specification (including any accompanying claims, abstract and drawings) may be replaced by alternative features serving the same, equivalent or similar purpose, unless expressly stated otherwise.

Furthermore, those skilled in the art will appreciate that while some embodiments described herein include some features included in other embodiments but not other features, combinations of features of different embodiments are meant to be within the scope of the application and form different embodiments. For example, in the following claims, any of the claimed embodiments may be used in any combination.

The various component embodiments of the present application may be implemented in hardware, or in software modules running on one or more processors, or in a combination thereof. Those skilled in the art will appreciate that a microprocessor or Digital Signal Processor (DSP) may be used in practice to implement some or all of the functions of some or all of the components in the foreign key mapping apparatus of database tables according to embodiments of the present application. The present application may also be embodied as apparatus or device programs (e.g., computer programs and computer program products) for performing a portion or all of the methods described herein. Such programs implementing the present application may be stored on a computer readable medium or may be in the form of one or more signals. Such a signal may be downloaded from an internet website or provided on a carrier signal or in any other form.

For example, fig. 5 shows a schematic structural diagram of an electronic device according to an embodiment of the present application. The electronic device 500 comprises a processor 510 and a memory 520 arranged to store computer executable instructions (computer readable program code). The memory 520 may be an electronic memory such as a flash memory, an EEPROM (electrically erasable programmable read only memory), an EPROM, a hard disk, or a ROM. The memory 520 has a storage space 530 storing computer readable program code 531 for performing any of the method steps in the above described method. For example, the storage space 530 for storing the computer readable program code may include respective computer readable program codes 531 for respectively implementing various steps in the above method. The computer readable program code 531 may be read from or written to one or more computer program products. These computer program products comprise a program code carrier such as a hard disk, a Compact Disc (CD), a memory card or a floppy disk. Such a computer program product is typically a computer readable storage medium such as described in fig. 6.

Fig. 6 shows a schematic diagram of a computer-readable storage medium according to an embodiment of the present application. The computer readable storage medium 600 has stored thereon the computer readable program code 531 for performing the steps of the method according to the application, which is readable by the processor 510 of the electronic device 500. When executed by the electronic device 500, the computer readable program code 531 causes the electronic device 500 to perform the steps of the method described above; in particular, the computer readable program code 531 stored on the computer readable storage medium may perform the method shown in any of the embodiments described above. The computer readable program code 531 may be compressed in a suitable form.

It should be noted that the above-mentioned embodiments illustrate rather than limit the application, and that those skilled in the art will be able to design alternative embodiments without departing from the scope of the appended claims. In the claims, any reference signs placed between parentheses shall not be construed as limiting the claim. The word "comprising" does not exclude the presence of elements or steps not listed in a claim. The word "a" or "an" preceding an element does not exclude the presence of a plurality of such elements. The application may be implemented by means of hardware comprising several distinct elements, and by means of a suitably programmed computer. In the unit claims enumerating several means, several of these means may be embodied by one and the same item of hardware. The use of the words first, second, third, etc. does not indicate any ordering. These words may be interpreted as names.
