Method and system for establishing knowledge graph of policy text

Document No.: 830110 Publication date: 2021-03-30

Reading note: this technology, "Method and system for establishing knowledge graph of policy text" (政策文本的知识图谱构建方法及系统), was designed by 孙璐, 李向前, 刘巍, 曹扬, 王鹏, 王晖, 巢文涵, 郝雅琦 and 张金言 on 2020-12-15. Main content: The embodiment of the invention provides a method and a system for constructing a knowledge graph of a policy text. The method comprises the following steps: acquiring a policy text of a knowledge graph to be constructed; processing the policy text based on a pattern matching technology to obtain frame information of the policy text; processing the policy text based on a deep learning technology to obtain attribute information of the policy text; processing each single sentence in the policy text based on a syntactic analysis technology to obtain entity relationship information of each single sentence; and constructing a knowledge graph of the policy text based on the frame information of the policy text, the attribute information of the policy text and the entity relationship information of each single sentence in the policy text. With the method and the system provided by the embodiment of the invention, the policy text can be parsed through frame extraction based on pattern matching, attribute information extraction based on a neural network and entity relationship extraction based on syntactic analysis, so that the knowledge graph corresponding to the policy text is constructed.

1. A method for constructing a knowledge graph of a policy text is characterized by comprising the following steps:

acquiring a policy text of a knowledge graph to be constructed;

processing the policy text based on a pattern matching technology to obtain frame information of the policy text;

processing the policy text based on a deep learning technology to obtain attribute information of the policy text;

processing each single sentence in the policy text based on a syntactic analysis technology to obtain entity relationship information of each single sentence;

and constructing a knowledge graph of the policy text based on the frame information of the policy text, the attribute information of the policy text and the entity relationship information of each single sentence in the policy text.

2. The method of claim 1, wherein the policy text comprises a catalog, and the processing of the policy text based on a pattern matching technique to obtain the frame information of the policy text comprises:

constructing a plurality of regular expressions;

and matching each regular expression with the catalogue of the policy text, if the regular expressions are matched with the catalogue of the policy text, acquiring a matching result, and taking all the acquired matching results as the frame information of the policy text.

3. The method of claim 2, wherein the frame information of the policy text includes a plurality of titles in the catalog of the policy text, and the plurality of titles include any one or more of guiding ideas, development principles, development targets, key tasks, major projects, and safeguard measures.

4. The method for constructing a knowledge graph of a policy text according to claim 1, wherein the policy text includes a preamble, and the processing of the policy text based on a deep learning technique to obtain the attribute information of the policy text comprises:

and inputting the preamble of the policy text into the trained neural network model, obtaining an output result of the neural network model, and taking the output result as attribute information of the policy text.

5. The method of claim 4 wherein the neural network model is a TENER model.

6. The method of claim 4, wherein the attribute information of the policy text includes any one or more of a place of release, a release agency, and a time of release in the preamble of the policy text.

7. The method for constructing a knowledge graph of a policy text according to claim 1, wherein each single sentence in the policy text is processed based on a syntactic analysis technique to obtain entity relationship information of each single sentence, and the method comprises:

acquiring a plurality of single sentences in the policy text;

and for each single sentence, analyzing the single sentence based on a syntactic tree analysis method to extract verbs and corresponding nouns in the single sentence, taking the verbs as relations, and taking the nouns as entities.

8. A system for knowledge graph construction of policy text, comprising:

the policy text acquisition module is used for acquiring a policy text of the knowledge graph to be constructed;

the frame information acquisition module is used for processing the policy text based on a pattern matching technology to obtain frame information of the policy text;

the attribute information acquisition module is used for processing the policy text based on a deep learning technology to obtain attribute information of the policy text;

the entity relationship information acquisition module is used for processing each single sentence in the policy text based on a syntactic analysis technology to obtain entity relationship information of each single sentence;

and the knowledge graph construction module is used for constructing a knowledge graph of the policy text based on the frame information of the policy text, the attribute information of the policy text and the entity relationship information of each single sentence in the policy text.

9. An electronic device comprising a memory and a processor; wherein the memory has stored therein a computer program; the processor for executing the computer program to implement the method of knowledge-graph construction of policy text according to any one of claims 1-7.

10. A computer-readable storage medium, characterized in that the computer-readable storage medium has stored thereon a computer program which, when being executed by a processor, implements the method of constructing a knowledge graph of policy text according to any one of claims 1 to 7.

Technical Field

The invention relates to the technical field of computers, in particular to a method and a system for constructing a knowledge graph of a policy text.

Background

The concept of the knowledge graph was born in 2012. To systematize knowledge and information, so that a user can obtain a complete related knowledge system from any keyword and search quality is improved, Google proposed the concept of the knowledge graph and constructed an initial knowledge graph. Because a knowledge graph can accurately reflect real-world facts and can express abstract knowledge such as concepts and hierarchies well, it has in recent years been applied in many fields and has been the subject of a great deal of research.

The vision of the knowledge graph research field is to construct a structured knowledge base that serves all aspects of artificial intelligence. As infrastructure for the artificial intelligence ecosystem, a knowledge graph improves data acquisition efficiency, lowers the threshold for applying knowledge, and improves the efficiency of knowledge work. As a knowledge engine for the big data era, it obtains structured knowledge from the latest information sources in a timely manner. As the machine-intelligence brain of the artificial intelligence era, it enables machines to understand the background knowledge of human society.

Two important technologies are involved in knowledge graphs: named entity recognition and relation extraction. In practice, knowledge graphs are applied in many scenarios, such as person-name and place-name recognition, but no complete framework has been built for the policy field; existing policy knowledge graphs are constructed from the macroscopic relationships between policies and lack fine-grained information analysis.

A traditional knowledge graph comprises specific entities and relations, which are extracted by analyzing text content. For policy texts, however, it is difficult to define the specific entities, attributes, and relations in the text: a policy text consists mainly of verb-object structures and, taken as a whole, follows a discussion framework. A knowledge graph construction scheme therefore needs to be designed specifically for the policy field.

Disclosure of Invention

Aiming at the problems in the prior art, the embodiment of the invention provides a method and a system for constructing a knowledge graph of a policy text.

In a first aspect, an embodiment of the present invention provides a method for constructing a knowledge graph of a policy text, including:

acquiring a policy text of a knowledge graph to be constructed;

processing the policy text based on a pattern matching technology to obtain frame information of the policy text;

processing the policy text based on a deep learning technology to obtain attribute information of the policy text;

processing each single sentence in the policy text based on a syntactic analysis technology to obtain entity relationship information of each single sentence;

and constructing a knowledge graph of the policy text based on the frame information of the policy text, the attribute information of the policy text and the entity relationship information of each single sentence in the policy text.

Further, the policy text includes a catalog, and the processing of the policy text based on a pattern matching technique to obtain the frame information of the policy text includes:

constructing a plurality of regular expressions;

and matching each regular expression with the catalogue of the policy text, if the regular expressions are matched with the catalogue of the policy text, acquiring a matching result, and taking all the acquired matching results as the frame information of the policy text.

Further, the frame information of the policy text comprises a plurality of titles in the catalogue of the policy text, wherein the plurality of titles comprise any one or more of guiding ideas, development principles, development targets, key tasks, major projects and safeguard measures.

Further, the policy text includes a preamble, and the processing of the policy text based on a deep learning technique to obtain attribute information of the policy text includes:

and inputting the preamble of the policy text into the trained neural network model, obtaining an output result of the neural network model, and taking the output result as attribute information of the policy text.

Further, the neural network model is a TENER model.

Further, the attribute information of the policy text includes any one or a combination of a place of publication, a publishing organization, and a publishing time in the preamble of the policy text.

Further, processing each single sentence in the policy text based on a syntactic analysis technology to obtain entity relationship information of each single sentence, including:

acquiring a plurality of single sentences in the policy text;

and for each single sentence, analyzing the single sentence based on a syntactic tree analysis method to extract verbs and corresponding nouns in the single sentence, taking the verbs as relations, and taking the nouns as entities.

In a second aspect, an embodiment of the present invention provides a knowledge graph building system for a policy text, including:

the policy text acquisition module is used for acquiring a policy text of the knowledge graph to be constructed;

the frame information acquisition module is used for processing the policy text based on a pattern matching technology to obtain frame information of the policy text;

the attribute information acquisition module is used for processing the policy text based on a deep learning technology to obtain attribute information of the policy text;

the entity relationship information acquisition module is used for processing each single sentence in the policy text based on a syntactic analysis technology to obtain entity relationship information of each single sentence;

and the knowledge graph construction module is used for constructing a knowledge graph of the policy text based on the frame information of the policy text, the attribute information of the policy text and the entity relationship information of each single sentence in the policy text.

In a third aspect, an embodiment of the present invention provides an electronic device, including a memory and a processor; wherein the memory has stored therein a computer program; the processor is configured to execute the computer program to implement the method for constructing a knowledge graph of policy text as described above.

In a fourth aspect, an embodiment of the present invention provides a computer-readable storage medium, on which a computer program is stored, where the computer program, when executed by a processor, implements the method for constructing a knowledge graph of policy texts as described above.

According to the method and the system for constructing the knowledge graph of the policy text, which are provided by the embodiment of the invention, the analysis of the policy text can be completed through the extraction of a policy text frame based on pattern matching, the extraction of the attribute information of the policy text based on a neural network and the extraction of the entity relationship of the policy text based on syntactic analysis, so that the knowledge graph corresponding to the policy text is constructed and obtained.

Drawings

In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly introduced below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to these drawings without creative efforts.

FIG. 1 is a flowchart of a method for constructing a knowledge graph of a policy text according to an embodiment of the present invention;

FIG. 2 is a diagram illustrating parsing of a single sentence according to an embodiment of the present invention;

FIG. 3 is a diagram illustrating a knowledge graph of a policy text according to an embodiment of the present invention;

FIG. 4 is a schematic structural diagram of a knowledge graph construction system of a policy text according to an embodiment of the present invention;

FIG. 5 is a schematic physical structure diagram of an electronic device according to an embodiment of the present invention.

Detailed Description

In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

FIG. 1 is a flowchart of a method for constructing a knowledge graph of a policy text according to an embodiment of the present invention. As shown in FIG. 1, the method includes:

Step 101, obtaining a policy text of a knowledge graph to be constructed.

Specifically, with the rapid development of computer applications such as the Internet, artificial intelligence, and big data across industries, a large number of policies concerning these fields keep emerging to regulate and promote their development. The policy text of the knowledge graph to be constructed in the embodiment of the invention refers to a policy text in computer fields such as the Internet, artificial intelligence, and big data.

Step 102, processing the policy text based on a pattern matching technology to obtain frame information of the policy text.

Specifically, pattern matching is a basic string operation in data structures: given a pattern string, find every substring of a target string that is identical to the pattern. Pattern matching is performed on the policy text to obtain the frame information of the policy text. The frame information specifically indicates the title of each section of the policy text.
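The basic string operation described above can be illustrated with a minimal sketch. The function name `find_all` and the sample string are hypothetical and are not taken from the embodiment; the sketch only shows what "finding every substring identical to the pattern" means.

```python
def find_all(text: str, pattern: str) -> list[int]:
    """Return the start index of every occurrence of pattern in text."""
    if not pattern:
        return []
    positions = []
    start = text.find(pattern)
    while start != -1:
        positions.append(start)
        # Advance by one character so that overlapping occurrences are found too.
        start = text.find(pattern, start + 1)
    return positions


# Illustrative input only.
print(find_all("policy text and policy graph", "policy"))
```

In practice the embodiment relies on regular expressions rather than fixed patterns, which generalizes this exact-match search.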

Step 103, processing the policy text based on a deep learning technology to obtain attribute information of the policy text.

Specifically, deep learning learns the intrinsic regularities and hierarchical representations of sample data, and the information obtained in the learning process greatly helps in interpreting data such as text, images, and sounds. Its ultimate aim is to give machines a human-like ability to analyze and learn, so that they can recognize data such as text, images, and sounds. A neural network model is constructed using deep learning, the model is trained on a training set, and the policy text is input into the trained model to obtain the attribute information of the policy text. The attribute information specifically indicates the place of release of the policy text, the issuing organization, the time of release, and the like.

Step 104, processing each single sentence in the policy text based on a syntactic analysis technology to obtain entity relationship information of each single sentence.

Specifically, syntactic analysis refers to the analysis of the grammatical function of words in a sentence. Each single sentence in the policy text is processed through syntactic analysis to obtain the entity relationship information of that single sentence. It should be noted that the relationship refers to a verb in a single sentence, and the entity refers to a noun corresponding to the verb in the single sentence.

Step 105, constructing a knowledge graph of the policy text based on the frame information of the policy text, the attribute information of the policy text and the entity relationship information of each single sentence in the policy text.

According to the method for constructing the knowledge graph of the policy text, which is provided by the embodiment of the invention, the analysis of the policy text can be completed through the extraction of a policy text frame based on pattern matching, the extraction of the attribute information of the policy text based on a neural network and the extraction of the entity relationship of the policy text based on syntactic analysis, so that the knowledge graph corresponding to the policy text is constructed and obtained.
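As a rough illustration of step 105, the three kinds of extracted information can be merged into a list of (head, relation, tail) triples. The text does not specify how the graph is stored or how nodes are connected, so the triple layout, the `has_section` edge name, the function name, and the choice to attach every triple to a single policy node are all assumptions made for this sketch.

```python
def build_knowledge_graph(policy_name, frame_info, attribute_info, sentence_relations):
    """Merge frame, attribute and entity-relation information into triples.

    Assumed input shapes (hypothetical):
      frame_info: list of section titles
      attribute_info: dict mapping attribute name to value
      sentence_relations: list of dicts with "relation" (verb) and "entity" (noun)
    """
    triples = []
    # Frame information: one edge from the policy node to each section title.
    for title in frame_info:
        triples.append((policy_name, "has_section", title))
    # Attribute information: the attribute name serves as the edge label.
    for name, value in attribute_info.items():
        triples.append((policy_name, name, value))
    # Entity-relation information: the verb is the relation, the noun is the entity.
    for item in sentence_relations:
        triples.append((policy_name, item["relation"], item["entity"]))
    return triples


# Invented sample values, for illustration only.
graph = build_knowledge_graph(
    "example policy",
    ["guiding ideas", "key tasks"],
    {"time of release": "2020-12-15"},
    [{"relation": "establish", "entity": "list"}],
)
for triple in graph:
    print(triple)
```

A list of triples is only one possible representation; the same data could equally be loaded into a graph database or a graph library.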

In some embodiments, the policy text includes a catalog, and the processing of the policy text based on a pattern matching technique to obtain the frame information of the policy text includes:

and constructing a plurality of regular expressions.

Specifically, the regular expression is a logical formula for operating on a character string, that is, a "regular character string" is formed by using specific characters defined in advance and a combination of the specific characters, and the "regular character string" is used for expressing a filtering logic for the character string.

And matching each regular expression with the catalogue of the policy text, if the regular expressions are matched with the catalogue of the policy text, acquiring a matching result, and taking all the acquired matching results as the frame information of the policy text.

Specifically, it is known by looking through a large number of policy texts, the policy texts are composed of sections, each section can express different contents of the policy text, the contents of each section are summarized by titles, and the titles are all collectively recorded in a catalog of the policy text. Therefore, a plurality of regular expressions matched with the titles are set, each regular expression is matched with the directory in the policy text, the titles are matched from the directory in the policy text, and all the matched titles are used as the frame information of the policy text.
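The catalog matching described above might look like the following sketch, which uses Python's standard `re` module. The title list is an English rendering of the titles named in the text; the numbering format of the catalog, the variable names, and the sample catalog are assumptions, since the embodiment does not disclose the actual regular expressions.

```python
import re

# English renderings of the section titles named in the text (illustrative).
EXPECTED_TITLES = [
    "guiding ideas", "development principles", "development targets",
    "key tasks", "major projects", "safeguard measures",
]

# Assumed catalog layout: Arabic or Roman numeral, then "." or ")", then the title.
NUMBERING = r"^[ \t]*(?:\d+|[IVX]+)[.)][ \t]*"

# One regular expression per expected title.
TITLE_PATTERNS = [
    re.compile(NUMBERING + "(" + re.escape(title) + r")[ \t]*$",
               re.IGNORECASE | re.MULTILINE)
    for title in EXPECTED_TITLES
]


def extract_frame_info(catalog: str) -> list[str]:
    """Match every pattern against the catalog and collect the matched titles."""
    matches = []
    for pattern in TITLE_PATTERNS:
        for m in pattern.finditer(catalog):
            matches.append((m.start(1), m.group(1)))
    # Sort by position so the titles come back in catalog order.
    return [title for _, title in sorted(matches)]


# Hypothetical catalog; "Glossary" is not an expected title and is skipped.
sample_catalog = "\n".join([
    "I. Guiding ideas",
    "II. Development principles",
    "III. Key tasks",
    "IV. Glossary",
    "V. Safeguard measures",
])
print(extract_frame_info(sample_catalog))
```

Keeping one pattern per title mirrors the "plurality of regular expressions" in the text and makes it easy to add or adjust a single title without touching the others.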

In some embodiments, the frame information of the policy text includes a plurality of titles in the catalog of the policy text, the plurality of titles including any one or more of guiding ideas, development principles, development targets, key tasks, major projects, and safeguard measures.

In some embodiments, the policy text includes a preamble, and the processing of the policy text based on a deep learning technique to obtain the attribute information of the policy text includes:

and inputting the preamble of the policy text into the trained neural network model, obtaining an output result of the neural network model, and taking the output result as attribute information of the policy text.

Specifically, the neural network model in the embodiment of the present invention is preferably a TENER model. The TENER model adapts the Transformer model to the Named Entity Recognition (NER) task and can be applied at both the word level and the character level. By using a position-aware encoding scheme and a simplified attention computation, the TENER model allows the Transformer to perform as well on NER as it does on other tasks. Since the structure of the TENER model is described in the prior art, it is not described here in detail.

In some embodiments, the attribute information of the policy text includes any one or a combination of a place of publication, a publishing institution and a publishing time in the preamble of the policy text.
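The text does not describe the output format of the neural network model. Assuming, purely for illustration, that the NER model emits one BIO tag per token with hypothetical labels `LOC`, `ORG` and `TIME`, the output could be turned into attribute information as sketched below. The function name, the label set, and the sample sentence are all invented.

```python
# Hypothetical mapping from NER labels to the attribute names used in the text.
LABEL_TO_ATTRIBUTE = {
    "LOC": "place of release",
    "ORG": "release agency",
    "TIME": "time of release",
}


def decode_attributes(tokens, tags):
    """Group BIO-tagged tokens into spans and file them under attribute names."""
    attributes = {}
    label, span = None, []

    def flush():
        nonlocal label, span
        if label in LABEL_TO_ATTRIBUTE and span:
            # Tokens are joined with spaces; character-level Chinese output
            # would be joined with an empty string instead.
            attributes.setdefault(LABEL_TO_ATTRIBUTE[label], []).append(" ".join(span))
        label, span = None, []

    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            flush()  # a new span begins, close the previous one
            label, span = tag[2:], [token]
        elif tag.startswith("I-") and tag[2:] == label:
            span.append(token)  # continue the current span
        else:
            flush()  # "O" or an inconsistent tag ends the span
    flush()
    return attributes


# Invented preamble and tags, standing in for real model output.
tokens = ["Issued", "by", "Example", "Agency", "in", "Example", "City", "on", "2020-12-15"]
tags = ["O", "O", "B-ORG", "I-ORG", "O", "B-LOC", "I-LOC", "O", "B-TIME"]
print(decode_attributes(tokens, tags))
```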

In some embodiments, processing each single sentence in the policy text based on a syntactic analysis technique to obtain entity relationship information of each single sentence includes:

and acquiring a plurality of single sentences in the policy text.

And for each single sentence, analyzing the single sentence based on a syntactic tree analysis method to extract verbs and corresponding nouns in the single sentence, taking the verbs as relations, and taking the nouns as entities.

Specifically, it can be known from a large number of policy texts that most of the single sentences in the policy texts are structures formed by combining verbs and nouns, and the verbs and the nouns are important parts in normal texts. Therefore, the embodiment of the invention provides an entity identification and relation extraction algorithm based on syntactic analysis so as to carry out fine-grained analysis on the policy text.
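Before syntactic analysis, the single sentences have to be obtained from the policy text. The text does not say how this is done; one simple possibility, shown here as an assumption, is to split on sentence-ending punctuation. The punctuation set, the function name, and the second sample clause are hypothetical.

```python
import re

# Assumed sentence boundaries: Chinese and ASCII terminators and semicolons.
# The ASCII full stop is deliberately left out so that numbers such as "3.5"
# are not split.
_BOUNDARY = re.compile(r"[。；！？;!?]+")


def split_single_sentences(text: str) -> list[str]:
    """Split a passage into single sentences, dropping empty fragments."""
    return [part.strip() for part in _BOUNDARY.split(text) if part.strip()]


# The first clause is the example sentence from the text; the second is invented.
passage = "establish public data resource open shared list; improve data standards!"
print(split_single_sentences(passage))
```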

For each single sentence of the policy text, a syntactic tree analysis method is used to analyze the structure of the single sentence so as to identify the verb relationships in it; the verbs and their corresponding nouns are extracted and used as the relations and the entities respectively, and the components between a verb and its noun are used as attributes that modify the entity. Referring to FIG. 2, in the single sentence "establish public data resource open shared list" in the policy text, "establish" is a relation.
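The extraction rule can be sketched on a hand-written parse of the example sentence. The embodiment does not name a parser or a tag set, so the `Token` structure, the `VERB`/`NOUN` tags, the `obj` dependency label, and the parse itself are assumptions; a real system would obtain the parse from a syntactic parser. Treating "list" as the entity is likewise this sketch's reading of the example, not a statement from the text.

```python
from dataclasses import dataclass


@dataclass
class Token:
    text: str
    pos: str     # coarse part of speech, e.g. "VERB" or "NOUN" (assumed tag set)
    head: int    # index of the syntactic head, -1 for the root
    deprel: str  # dependency label, e.g. "obj" (assumed label set)


def extract_relations(tokens):
    """Pair each verb with its object noun; words in between become attributes."""
    results = []
    for noun_idx, tok in enumerate(tokens):
        # Only nouns that are the direct object of some head are candidates.
        if tok.pos != "NOUN" or tok.deprel != "obj" or tok.head < 0:
            continue
        verb = tokens[tok.head]
        if verb.pos != "VERB":
            continue
        lo, hi = sorted((tok.head, noun_idx))
        results.append({
            "relation": verb.text,
            "entity": tok.text,
            # Components between the verb and the noun modify the entity.
            "attributes": [t.text for t in tokens[lo + 1:hi]],
        })
    return results


# Hand-written, hypothetical parse of the example sentence from the text.
sentence = [
    Token("establish", "VERB", -1, "root"),
    Token("public", "ADJ", 2, "amod"),
    Token("data", "NOUN", 3, "compound"),
    Token("resource", "NOUN", 6, "compound"),
    Token("open", "ADJ", 6, "amod"),
    Token("shared", "ADJ", 6, "amod"),
    Token("list", "NOUN", 0, "obj"),
]
print(extract_relations(sentence))
```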

As a preferred embodiment, FIG. 3 shows the knowledge graph constructed by the method for constructing a knowledge graph of a policy text according to the embodiment of the present invention for the policy text "·information infrastructure planning guidance opinion". It should be noted that, owing to space limitations, FIG. 3 illustrates only part of the content of the knowledge graph.

FIG. 4 is a schematic structural diagram of a knowledge graph construction system of a policy text according to an embodiment of the present invention. As shown in FIG. 4, the system includes:

a policy text acquisition module 401, configured to acquire a policy text of a knowledge graph to be constructed; a frame information obtaining module 402, configured to process the policy text based on a pattern matching technique to obtain frame information of the policy text; an attribute information obtaining module 403, configured to process the policy text based on a deep learning technique to obtain attribute information of the policy text; an entity relationship information obtaining module 404, configured to process each single sentence in the policy text based on a syntactic analysis technique to obtain entity relationship information of each single sentence; a knowledge graph constructing module 405, configured to construct a knowledge graph of the policy text based on the frame information of the policy text, the attribute information of the policy text, and the entity relationship information of each single sentence in the policy text.

Specifically, the system provided in the embodiment of the present invention is configured to execute the above embodiment of the method for constructing a knowledge graph of a policy text, and the details are not repeated here. With the system provided by the embodiment of the invention, the policy text can be parsed through frame extraction based on pattern matching, attribute information extraction based on a neural network and entity relationship extraction based on syntactic analysis, so that the knowledge graph corresponding to the policy text is constructed.

FIG. 5 is a schematic physical structure diagram of an electronic device according to an embodiment of the present invention. As shown in FIG. 5, the electronic device may include: a processor 501, a communications interface 502, a memory 503, and a communication bus 504, wherein the processor 501, the communications interface 502, and the memory 503 communicate with each other via the communication bus 504. The processor 501 may invoke a computer program stored on the memory 503 and executable on the processor 501 to perform the methods provided by the above embodiments, including, for example: acquiring a policy text of a knowledge graph to be constructed; processing the policy text based on a pattern matching technology to obtain frame information of the policy text; processing the policy text based on a deep learning technology to obtain attribute information of the policy text; processing each single sentence in the policy text based on a syntactic analysis technology to obtain entity relationship information of each single sentence; and constructing a knowledge graph of the policy text based on the frame information of the policy text, the attribute information of the policy text and the entity relationship information of each single sentence in the policy text.

In addition, the logic instructions in the memory 503 may be implemented in the form of software functional units and stored in a computer readable storage medium when the logic instructions are sold or used as independent products. Based on such understanding, the technical solutions of the embodiments of the present invention may be essentially implemented or make a contribution to the prior art, or may be implemented in the form of a software product stored in a storage medium and including instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the methods described in the embodiments of the present invention. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk, and other various media capable of storing program codes.

Embodiments of the present invention further provide a non-transitory computer-readable storage medium, on which a computer program is stored, where the computer program is implemented to perform the method provided in the foregoing embodiments when executed by a processor, and the method includes: acquiring a policy text of a knowledge graph to be constructed; processing the policy text based on a pattern matching technology to obtain frame information of the policy text; processing the policy text based on a deep learning technology to obtain attribute information of the policy text; processing each single sentence in the policy text based on a syntactic analysis technology to obtain entity relationship information of each single sentence; and constructing a knowledge graph of the policy text based on the frame information of the policy text, the attribute information of the policy text and the entity relationship information of each single sentence in the policy text.

The above-described system embodiments are merely illustrative, and the units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment. One of ordinary skill in the art can understand and implement it without inventive effort.

Through the above description of the embodiments, those skilled in the art will clearly understand that each embodiment can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware. With this understanding in mind, the above-described technical solutions may be embodied in the form of a software product, which can be stored in a computer-readable storage medium such as ROM/RAM, magnetic disk, optical disk, etc., and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the methods described in the embodiments or some parts of the embodiments.

Finally, it should be noted that: the above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.
