Method and device for constructing a knowledge graph of important Internet infrastructure

Document No.: 107491  Published: 2021-10-15

Reading note: This technology, "Method and device for constructing knowledge graph of important infrastructure of internet", was designed and created by Liu Jingju, Yan Zhihao, Shi Fan, Xue Pengfei, Hu Miao and Shen Yi (刘京菊、闫志豪、施凡、薛鹏飞、胡淼、沈毅) on 2021-09-06. Its main content is as follows. The invention provides a method and a device for constructing a knowledge graph of important Internet infrastructure. The method comprises: constructing an ontology model of important Internet infrastructure and, based on this ontology model, acquiring structured and semi-structured data about the Internet infrastructure; extracting knowledge from the structured and semi-structured data according to preset extraction rules to construct a knowledge graph; and storing the knowledge graph in a Neo4j graph database that supports two import modes, full-library import and incremental import, with the mode chosen according to the import scenario, so as to form the knowledge graph of important Internet infrastructure. The scheme integrates multi-source heterogeneous data and knowledge into a unified model for analysis and use, and allows the network security situation to be queried and displayed through the knowledge graph.

1. A method for constructing a knowledge graph of important Internet infrastructure, characterized by comprising the following steps:

step S101: constructing, based on an element analysis of important Internet infrastructure, an ontology model of important Internet infrastructure comprising a user role layer, a network service layer, a geographical link layer and a vulnerability information layer, wherein important Internet infrastructure is the set of important services used for Internet information exchange, comprising the domain name system, Web services and network addresses;

step S102: acquiring data items of the important Internet infrastructure, and extracting knowledge from the structured and semi-structured data contained therein according to preset extraction rules;

step S103: storing the knowledge graph in a Neo4j graph database that supports two import modes, full-library import and incremental import, and forming the knowledge graph of important Internet infrastructure by using the import mode that suits each import scenario.

2. The method for constructing a knowledge graph of important Internet infrastructure according to claim 1, wherein in the network service layer, a DNS zone node is created first; name server and mail server entity nodes are then created for the NS and MX records of the DNS zone, and subdomain nodes are created; the NS-record and MX-record relations of the DNS zone are determined in light of the DNS and the actual network structure, the DNS zone having a control relation with each of its subdomains; CNAME-record relations and external-link relations are added between subdomains; and for each subdomain that runs a Web service, possible vulnerabilities are inferred from the version of its software service and associated with vulnerability nodes through a relation.

3. The method for constructing a knowledge graph of important Internet infrastructure according to claim 2, wherein step S102 of acquiring data items of the important Internet infrastructure and extracting knowledge from the structured and semi-structured data contained therein according to preset extraction rules comprises:

querying and acquiring the records of the top one million domain names in the Alexa ranking, and extracting knowledge from each type of record data with regular expressions to form network service layer knowledge;

sending, in a command acquisition manner, a query request to a WHOIS database with the Linux whois command to acquire the WHOIS data of a domain name, wherein WHOIS data comes in a Thin mode and a Thick mode; in the Thick mode, the WHOIS data of all domain names share the same format, and knowledge extraction rules written with regular expressions are applied to the Thick-mode WHOIS data to extract data; in the Thin mode, the WHOIS data of each domain name has no uniform format, and the Thin-mode data are extracted with a model constructed based on a conditional random field; and forming user role layer knowledge from the extracted data;

acquiring, with a crawler, the AS (autonomous system) and geographical location information corresponding to each IP address to form geographical link layer knowledge; acquiring, with a crawler, vulnerability database data comprising vulnerabilities and their related information, forming a knowledge extraction rule base based on regular expressions, and generating vulnerability-related entities and relations; and inferring possible vulnerabilities from the operating system, software version and open ports of a domain name, thereby forming relations between the network service layer and the vulnerability information layer.

4. An apparatus for constructing a knowledge graph of important Internet infrastructure, the apparatus comprising:

a model building module, configured to construct, based on an element analysis of important Internet infrastructure, an ontology model of important Internet infrastructure comprising a user role layer, a network service layer, a geographical link layer and a vulnerability information layer, wherein important Internet infrastructure is the set of important services used for Internet information exchange, comprising the domain name system, Web services and network addresses;

an extraction module, configured to acquire data items of the important Internet infrastructure and to extract knowledge from the structured and semi-structured data contained therein according to preset extraction rules;

an importing module, configured to store the knowledge graph in a Neo4j graph database that supports two import modes, full-library import and incremental import, and to form the knowledge graph of important Internet infrastructure by using the import mode that suits each import scenario.

5. The apparatus for constructing a knowledge graph of important Internet infrastructure according to claim 4, wherein in the network service layer, a DNS zone node is created first; name server and mail server entity nodes are then created for the NS and MX records of the DNS zone, and subdomain nodes are created; the NS-record and MX-record relations of the DNS zone are determined in light of the DNS and the actual network structure, the DNS zone having a control relation with each of its subdomains; CNAME-record relations and external-link relations are added between subdomains; and for each subdomain that runs a Web service, possible vulnerabilities are inferred from the version of its software service and associated with vulnerability nodes through a relation.

6. The apparatus for constructing a knowledge graph of important Internet infrastructure according to claim 5, wherein the extraction module queries and acquires the records of the top one million domain names in the Alexa ranking, and extracts knowledge from each type of record data with regular expressions to form network service layer knowledge;

sends, in a command acquisition manner, a query request to a WHOIS database with the Linux whois command to acquire the WHOIS data of a domain name, wherein WHOIS data comes in a Thin mode and a Thick mode; in the Thick mode, the WHOIS data of all domain names share the same format, and knowledge extraction rules written with regular expressions are applied to the Thick-mode WHOIS data to extract data; in the Thin mode, the WHOIS data of each domain name has no uniform format, and the Thin-mode data are extracted with a model constructed based on a conditional random field; and forms user role layer knowledge from the extracted data;

acquires, with a crawler, the AS (autonomous system) and geographical location information corresponding to each IP address to form geographical link layer knowledge; acquires, with a crawler, vulnerability database data comprising vulnerabilities and their related information, forms a knowledge extraction rule base based on regular expressions, and generates vulnerability-related entities and relations; and infers possible vulnerabilities from the operating system, software version and open ports of a domain name, thereby forming relations between the network service layer and the vulnerability information layer.

7. A system for constructing a knowledge graph of important Internet infrastructure, comprising:

a processor for executing a plurality of instructions;

a memory to store a plurality of instructions;

wherein the plurality of instructions are stored by the memory and loaded and executed by the processor to perform the method for constructing a knowledge graph of important Internet infrastructure according to any one of claims 1-3.

8. A computer-readable storage medium storing a plurality of instructions, the plurality of instructions being loaded and executed by a processor to perform the method for constructing a knowledge graph of important Internet infrastructure according to any one of claims 1-3.

Technical Field

The invention relates to the field of network security, and in particular to a method and a device for constructing a knowledge graph of important Internet infrastructure.

Background

Network security situation awareness does not lack available information; the pressing problem is how to analyze and use multi-source heterogeneous data to grasp the security situation and support decision making. Constructing a knowledge graph of cyberspace security can organize massive, fragmented multi-source heterogeneous data into a whole on which deep association analysis and mining can be performed, so that the network security situation can be grasped and a basis provided for decision support.

At present, there are two main approaches to constructing a network security knowledge graph. The first constructs a knowledge graph for a local network. For example, in 2016 S. Noel et al. developed CyGraph, a network situation knowledge graph tool based on the Neo4j graph database, oriented mainly toward network mission analysis, visual analysis and knowledge management. In 2017, Simeonovski et al. constructed a property graph model of the complex dependencies between services and providers, the ways they can be abused and the possible impact of attacks, and proposed a threat assessment technique based on taint propagation. In 2018, Jia et al. proposed a method for constructing a network security knowledge graph and deduction rules based on a quintuple model, extracting entities with machine learning and then creating an ontology to build a network security knowledge base. In 2019, Najafi et al. proposed MalRank, a graph-based inference algorithm that infers the maliciousness of a node from its relationships with other entities in the constructed knowledge graph. Also in 2019, Kiesling et al. organized key information on vulnerabilities, weaknesses and attack patterns from open cyberspace sources into a knowledge graph.

The second approach extracts text information related to cyberspace and constructs a knowledge graph from it. For example, in 2019 Qin et al. proposed a network security entity recognition method that incorporates feature templates, addressing the difficulty traditional named entity recognition has with mixed Chinese-English security entities. Also in 2019, Sudip Mittal et al. built the "Cyber-All-Intel" system for knowledge extraction, representation and analysis; they proposed a "VKG" structure combining knowledge graph and vector representations, used neural networks to improve knowledge quality, and created a query engine and an alert system.

At present, most research in network security focuses on extracting security entities such as hosts, personnel, network devices and vulnerabilities, together with the relations among them, from massive network security text data, and on displaying and using them as graphs. This text data comes mainly from network security forums, blogs, communities, technical reports and similar sources, so information about the structure and situation of cyberspace itself is relatively scarce.

Disclosure of Invention

To solve the above technical problems, the invention provides a method and a device for constructing a knowledge graph of important Internet infrastructure, addressing the lack of knowledge about cyberspace structure and situation in existing cyberspace knowledge graph techniques and methods.

According to a first aspect of the invention, there is provided a method for constructing a knowledge graph of important Internet infrastructure, comprising the following steps:

step S101: constructing, based on an element analysis of important Internet infrastructure, an ontology model of important Internet infrastructure comprising a user role layer, a network service layer, a geographical link layer and a vulnerability information layer, wherein important Internet infrastructure is the set of important services used for Internet information exchange, comprising the domain name system, Web services and network addresses;

step S102: acquiring data items of the important Internet infrastructure, and extracting knowledge from the structured and semi-structured data contained therein according to preset extraction rules;

step S103: storing the knowledge graph in a Neo4j graph database that supports two import modes, full-library import and incremental import, and forming the knowledge graph of important Internet infrastructure by using the import mode that suits each import scenario.

Further, in the network service layer, a DNS zone node is created first; name server and mail server entity nodes are then created according to the NS and MX records of the DNS zone, and subdomain nodes are created; the NS-record and MX-record relations of the DNS zone are determined in light of the DNS and the actual network structure, the DNS zone having a control relation with each of its subdomains; CNAME-record relations and external-link relations are added between subdomains; and for each subdomain that runs a Web service, possible vulnerabilities are inferred from the version of its software service and associated with vulnerability nodes through a relation.

Further, step S102 of acquiring data items of the important Internet infrastructure and extracting knowledge from the structured and semi-structured data contained therein according to preset extraction rules comprises:

querying and acquiring the records of the top one million domain names in the Alexa ranking, and extracting knowledge from each type of record data with regular expressions to form network service layer knowledge;

sending, in a command acquisition manner, a query request to a WHOIS database with the Linux whois command to acquire the WHOIS data of a domain name, wherein WHOIS data comes in a Thin mode and a Thick mode; in the Thick mode, the WHOIS data of all domain names share the same format, and knowledge extraction rules written with regular expressions are applied to the Thick-mode WHOIS data to extract data; in the Thin mode, the WHOIS data of each domain name has no uniform format, and the Thin-mode data are extracted with a model constructed based on a conditional random field; and forming user role layer knowledge from the extracted data;

acquiring, with a crawler, the AS (autonomous system) and geographical location information corresponding to each IP address to form geographical link layer knowledge; acquiring, with a crawler, vulnerability database data comprising vulnerabilities and their related information, forming a knowledge extraction rule base based on regular expressions, and generating vulnerability-related entities and relations; and inferring possible vulnerabilities from the operating system, software version and open ports of a domain name, thereby forming relations between the network service layer and the vulnerability information layer.

According to a second aspect of the invention, there is provided an apparatus for constructing a knowledge graph of important Internet infrastructure, the apparatus comprising:

a model building module, configured to construct, based on an element analysis of important Internet infrastructure, an ontology model of important Internet infrastructure comprising a user role layer, a network service layer, a geographical link layer and a vulnerability information layer, wherein important Internet infrastructure is the set of important services used for Internet information exchange, comprising the domain name system, Web services and network addresses;

an extraction module, configured to acquire data items of the important Internet infrastructure and to extract knowledge from the structured and semi-structured data contained therein according to preset extraction rules;

an importing module, configured to store the knowledge graph in a Neo4j graph database that supports two import modes, full-library import and incremental import, and to form the knowledge graph of important Internet infrastructure by using the import mode that suits each import scenario.

Further, in the network service layer, a DNS zone node is created first; name server and mail server entity nodes are then created according to the NS and MX records of the DNS zone, and subdomain nodes are created; the NS-record and MX-record relations of the DNS zone are determined in light of the DNS and the actual network structure, the DNS zone having a control relation with each of its subdomains; CNAME-record relations and external-link relations are added between subdomains; and for each subdomain that runs a Web service, possible vulnerabilities are inferred from the version of its software service and associated with vulnerability nodes through a relation.

Further, the extraction module queries and acquires the records of the top one million domain names in the Alexa ranking, and extracts knowledge from each type of record data with regular expressions to form network service layer knowledge;

sends, in a command acquisition manner, a query request to a WHOIS database with the Linux whois command to acquire the WHOIS data of a domain name, wherein WHOIS data comes in a Thin mode and a Thick mode; in the Thick mode, the WHOIS data of all domain names share the same format, and knowledge extraction rules written with regular expressions are applied to the Thick-mode WHOIS data to extract data; in the Thin mode, the WHOIS data of each domain name has no uniform format, and the Thin-mode data are extracted with a model constructed based on a conditional random field; and forms user role layer knowledge from the extracted data;

acquires, with a crawler, the AS (autonomous system) and geographical location information corresponding to each IP address to form geographical link layer knowledge; acquires, with a crawler, vulnerability database data comprising vulnerabilities and their related information, forms a knowledge extraction rule base based on regular expressions, and generates vulnerability-related entities and relations; and infers possible vulnerabilities from the operating system, software version and open ports of a domain name, thereby forming relations between the network service layer and the vulnerability information layer.

According to a third aspect of the invention, there is provided a system for constructing a knowledge graph of important Internet infrastructure, comprising:

a processor for executing a plurality of instructions;

a memory to store a plurality of instructions;

wherein the plurality of instructions are stored by the memory and loaded and executed by the processor to perform the method for constructing a knowledge graph of important Internet infrastructure described above.

According to a fourth aspect of the invention, there is provided a computer-readable storage medium storing a plurality of instructions, the plurality of instructions being loaded and executed by a processor to perform the method for constructing a knowledge graph of important Internet infrastructure described above.

With the above scheme, in order to construct a knowledge graph capable of representing the cyberspace security situation, relevant data about important Internet infrastructure are collected and processed to obtain security entities and the relations between them, and a security knowledge graph of important Internet infrastructure is constructed. The real network security situation can thus be displayed more intuitively, supporting applications such as analysis of the impact scope of vulnerabilities and statistics on application distribution, which gives the scheme significant research value.

The following effects are achieved: (1) data acquisition and knowledge extraction for important Internet infrastructure; (2) integration of multi-source heterogeneous data and knowledge into a unified model for analysis and use; (3) querying and display of the network security situation through the knowledge graph.

The foregoing is only an overview of the technical solutions of the invention. To make these solutions clearer and implementable according to this description, a detailed description follows with reference to preferred embodiments of the invention and the accompanying drawings.

Drawings

The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and, together with the description, serve to explain the principles of the invention. In the drawings:

FIG. 1 is a flow chart of a method for constructing a knowledge graph of important Internet infrastructure according to an embodiment of the invention;

FIG. 2 is an overall flow diagram of a detailed implementation of the method for constructing a knowledge graph of important Internet infrastructure according to an embodiment of the invention;

FIG. 3 is a schematic diagram of the ontology of important Internet infrastructure according to an embodiment of the invention;

FIG. 4 is a diagram of the knowledge extraction rules according to an embodiment of the invention;

FIG. 5 is a diagram of the import strategy according to an embodiment of the invention;

FIG. 6 is a block diagram of an apparatus for constructing a knowledge graph of important Internet infrastructure according to an embodiment of the invention.

Detailed Description

First, a method for constructing a knowledge graph of important Internet infrastructure according to an embodiment of the invention is described with reference to FIG. 1. As shown in FIGS. 1 and 2, the method comprises the following steps:

step S101: constructing, based on an element analysis of important Internet infrastructure, an ontology model of important Internet infrastructure comprising a user role layer, a network service layer, a geographical link layer and a vulnerability information layer, wherein important Internet infrastructure is the set of important services used for Internet information exchange, comprising the domain name system, Web services and network addresses;

step S102: acquiring data items of the important Internet infrastructure, and extracting knowledge from the structured and semi-structured data contained therein according to preset extraction rules;

step S103: storing the knowledge graph in a Neo4j graph database that supports two import modes, full-library import and incremental import, and forming the knowledge graph of important Internet infrastructure by using the import mode that suits each import scenario.
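Step S103 leaves the mechanics of the two import modes open. As a minimal sketch, assuming the extracted graph is held in memory as node and relation tuples, a full-library import could write CSV files in the header format of Neo4j's offline bulk importer (`neo4j-admin` import, with `:ID`, `:LABEL`, `:START_ID`, `:END_ID` and `:TYPE` columns), while an incremental import could issue idempotent Cypher `MERGE` statements through a driver. The function names, the `name` key property and the `CONTROLS` relation type are hypothetical, not taken from the patent.

```python
import csv
import io

# Hypothetical in-memory representation used by this sketch:
# nodes are (node_id, label, name), relations are (start_id, rel_type, end_id).
Node = tuple[str, str, str]
Rel = tuple[str, str, str]


def full_import_csv(nodes: list[Node], rels: list[Rel]) -> tuple[str, str]:
    """Render nodes and relations as CSV text using neo4j-admin import headers."""
    node_buf, rel_buf = io.StringIO(), io.StringIO()
    node_writer = csv.writer(node_buf, lineterminator="\n")
    node_writer.writerow(["id:ID", "name", ":LABEL"])
    for node_id, label, name in nodes:
        node_writer.writerow([node_id, name, label])
    rel_writer = csv.writer(rel_buf, lineterminator="\n")
    rel_writer.writerow([":START_ID", ":END_ID", ":TYPE"])
    for start, rel_type, end in rels:
        rel_writer.writerow([start, end, rel_type])
    return node_buf.getvalue(), rel_buf.getvalue()


def incremental_cypher(start: Node, rel_type: str, end: Node) -> tuple[str, dict]:
    """Build one MERGE statement so re-imported facts do not duplicate nodes."""
    # Labels and relationship types cannot be Cypher parameters, so they are
    # interpolated here; only trusted ontology names should reach this point.
    query = (
        f"MERGE (a:{start[1]} {{name: $a_name}}) "
        f"MERGE (b:{end[1]} {{name: $b_name}}) "
        f"MERGE (a)-[:{rel_type}]->(b)"
    )
    return query, {"a_name": start[2], "b_name": end[2]}
```

The bulk importer builds a new database from files, which suits the full-library scenario; `MERGE` matches existing nodes before creating any, which suits repeated incremental updates.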

Step S101, in which an ontology model of important Internet infrastructure comprising a user role layer, a network service layer, a geographical link layer and a vulnerability information layer is constructed based on an element analysis of important Internet infrastructure (the set of important services used for Internet information exchange, comprising the domain name system, Web services and network addresses), is carried out as follows.

Internet infrastructure refers to the collection of hardware and software systems necessary to support the normal operation of the Internet, such as transmission lines, routers and domain name services. In this embodiment, based on Internet measurement data and key features, the important components of Internet infrastructure, namely the DNS, Web services, IP addresses and their geographical locations, and user-related information, are abstracted and divided into layers, thereby constructing an ontology model of important Internet infrastructure that partially reflects the actual Internet, as shown in FIG. 3.

In a typical access flow, visiting a site begins with a DNS query for the site's records, after which related information about the site, such as open ports and service versions, is obtained. Web service and DNS related information are therefore defined as the network service layer. In the network service layer, a DNS zone node is created first; name server and mail server entity nodes are then created according to the NS and MX records of the DNS zone, and subdomain nodes are created. The NS-record and MX-record relations of the DNS zone are determined in light of the DNS and the actual network structure, and the DNS zone has a control relation with each of its subdomains. CNAME-record relations and external-link relations are added between subdomains. For each subdomain that runs a Web service, possible vulnerabilities are inferred from the version of its software service and associated with vulnerability nodes through a relation.
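The construction order of the DNS-derived part of the network service layer can be sketched as follows. This is an illustration under assumptions: nodes are modeled as `(label, name)` pairs, and the relation names `NS`, `MX`, `CONTROLS` and `CNAME` are hypothetical labels chosen for the example, since the patent does not name its relation types.

```python
from dataclasses import dataclass, field

Node = tuple[str, str]  # (label, name); hypothetical node representation


@dataclass
class ServiceLayer:
    """Nodes and (head, relation, tail) triples of the network service layer."""
    nodes: set[Node] = field(default_factory=set)
    relations: list[tuple[Node, str, Node]] = field(default_factory=list)

    def relate(self, head: Node, rel: str, tail: Node) -> None:
        self.nodes.update((head, tail))
        self.relations.append((head, rel, tail))


def build_service_layer(zone: str, ns: list[str], mx: list[str],
                        subdomains: list[str],
                        cnames: list[tuple[str, str]]) -> ServiceLayer:
    layer = ServiceLayer()
    zone_node = ("DNSZone", zone)
    layer.nodes.add(zone_node)  # the DNS zone node is created first
    for host in ns:
        layer.relate(zone_node, "NS", ("NameServer", host))
    for host in mx:
        layer.relate(zone_node, "MX", ("MailServer", host))
    for sub in subdomains:
        # the zone has a control relation with each of its subdomains
        layer.relate(zone_node, "CONTROLS", ("Subdomain", sub))
    for alias, target in cnames:
        # CNAME relation between two subdomains
        layer.relate(("Subdomain", alias), "CNAME", ("Subdomain", target))
    return layer
```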

Unlike a domain name, an IP address has attributes such as geographical location and autonomous system, so IP addresses are combined with their geographical locations and autonomous systems to define the geographical link layer, which mainly involves IP address nodes, geographical location nodes and autonomous system nodes. An A-record relation is added according to the A-record mapping from a domain name to an IP address, and the geographical location and the autonomous system of each IP address are each expressed through a relation, namely a source relation and a location relation.
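The geographical link layer can be sketched in the same triple style. The sketch assumes the crawler output has already been reduced to a dictionary from IP address to AS number and location. The key names and relation type strings are hypothetical, and so is the assignment of the source relation to the AS and the location relation to the geographical location, because the text does not say which relation connects to which node.

```python
def build_geo_layer(a_records: list[tuple[str, str]],
                    ip_info: dict[str, dict[str, str]]) -> list[tuple]:
    """a_records: (domain, ip) pairs taken from A records.
    ip_info: ip -> {"asn": ..., "location": ...} (hypothetical crawler output)."""
    triples = []
    for domain, ip in a_records:
        # A-record relation from the domain name to its IP address
        triples.append((("Subdomain", domain), "A", ("IPAddress", ip)))
    for ip, info in ip_info.items():
        # Assumed mapping: source relation -> AS, location relation -> place
        if info.get("asn"):
            triples.append((("IPAddress", ip), "SOURCE",
                            ("AutonomousSystem", info["asn"])))
        if info.get("location"):
            triples.append((("IPAddress", ip), "LOCATED_IN",
                            ("GeoLocation", info["location"])))
    return triples
```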

The vulnerability information layer is used to evaluate vulnerabilities related to important Internet infrastructure, and comprises vulnerability nodes, common attack pattern nodes, common software security weakness nodes, vulnerable product nodes and vulnerability score nodes.

In this embodiment, an element analysis of important Internet infrastructure is performed, data related to Web services, IP addresses and the domain name system are integrated, and various open-source knowledge bases related to vulnerabilities, weaknesses and attack patterns are combined to construct the knowledge graph ontology model, namely the ontology model of important Internet infrastructure, with 16 entity types, 16 relations and 86 attributes; the entity types include contact phone, organization, email, name server, DNS zone, subdomain, mail server, IP address, AS, geographical location, vulnerability score, vulnerability, vulnerable product and attack pattern.
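One way to keep extracted triples consistent with the ontology is to check every `(head type, relation, tail type)` signature against an allow-list before import. The signatures below are a hypothetical subset for illustration; the patent reports 16 entity types, 16 relations and 86 attributes but does not enumerate the relation names.

```python
# Hypothetical subset of the ontology's relation signatures.
ONTOLOGY = {
    ("DNSZone", "NS", "NameServer"),
    ("DNSZone", "MX", "MailServer"),
    ("DNSZone", "CONTROLS", "Subdomain"),
    ("Subdomain", "CNAME", "Subdomain"),
    ("Subdomain", "EXTERNAL_LINK", "Subdomain"),
    ("Subdomain", "A", "IPAddress"),
    ("Subdomain", "HAS_VULNERABILITY", "Vulnerability"),
    ("Vulnerability", "HAS_SCORE", "VulnerabilityScore"),
}


def conforms(triple) -> bool:
    """True if the triple's type signature is declared in the ontology."""
    (head_type, _), rel, (tail_type, _) = triple
    return (head_type, rel, tail_type) in ONTOLOGY


def invalid_triples(triples) -> list:
    """Return the triples that would violate the ontology if imported."""
    return [t for t in triples if not conforms(t)]
```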

Step S102, acquiring data items of the important Internet infrastructure and extracting knowledge from the structured and semi-structured data contained therein according to preset extraction rules, comprises the following.

The structured data are the network service layer data acquired with the dig program, and the geographical link layer and vulnerability information layer data acquired with a crawler.

The semi-structured data are the user role layer data acquired with the whois command.

In this embodiment, the A, MX, NS and CNAME records of the top one million domain names in the Alexa ranking are queried and acquired with the dig program, and knowledge is extracted from each type of record data with regular expressions to form network service layer knowledge.
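The query step can be sketched as building one `dig` invocation per domain and record type. The sketch assumes the ranking list is a CSV of `rank,domain` rows (assumed to match the layout of the Alexa top-1m file) and only builds argument lists; `+noall +answer` are standard `dig` options that restrict output to the answer section. The function names are hypothetical.

```python
import csv
import io

RECORD_TYPES = ("A", "MX", "NS", "CNAME")


def read_top_domains(csv_text: str, limit: int) -> list[str]:
    """Parse 'rank,domain' rows (assumed layout) and keep the first `limit` domains."""
    domains = []
    for row in csv.reader(io.StringIO(csv_text)):
        if len(domains) >= limit:
            break
        if len(row) >= 2 and row[0].strip().isdigit():
            domains.append(row[1].strip())
    return domains


def dig_commands(domains: list[str]) -> list[list[str]]:
    """One dig argument list per (domain, record type) pair."""
    return [["dig", "+noall", "+answer", d, t] for d in domains for t in RECORD_TYPES]
```

Each argument list could then be passed to `subprocess.run(cmd, capture_output=True, text=True)` and the captured text handed to the regular-expression extraction rules.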


and sending a query request to a WHOIS database based on a WHOIS command under Linux by adopting a command acquisition mode, and acquiring the WHOIS data of the domain name. The WHOIS data has a Thin mode and a click mode, wherein the WHOIS data formats of all domain names in the click mode are the same; in the Thin mode, the WHOIS data of each domain name does not have a uniform format, and a knowledge extraction rule is compiled for the WHOIS data of the Thin mode based on a regular expression for data processing; and extracting data of the Thin mode by using a model constructed based on the conditional random field. And forming the user role layer knowledge.

And acquiring AS autonomous domain and geographical position information corresponding to each IP address based on the crawler to form geographical link layer knowledge. Relevant data of the vulnerability database is obtained based on the crawler, and the relevant data comprises various vulnerabilities and relevant information thereof. And forming a knowledge extraction rule base based on the regular expression to form various vulnerability related entities and relations, and judging possible vulnerabilities through attributes of a domain name operating system, a software version, an open port and the like to further form a relation between a network service layer and a fragile information layer.
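Inferring possible vulnerabilities from a service's attributes amounts to matching them against affected-version ranges from the vulnerability data. A minimal sketch, assuming each rule gives a product name, an inclusive version range and an optional port, and that service records are dictionaries with `product`, `version` and `port` keys (all hypothetical). Operating-system matching, which the text also mentions, is omitted for brevity.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class VulnRule:
    vuln_id: str               # hypothetical identifier
    product: str               # software name as reported for the service
    min_version: str           # first affected version (inclusive)
    max_version: str           # last affected version (inclusive)
    port: Optional[int] = None  # if set, the service must be on this port


def parse_version(text: str) -> tuple[int, ...]:
    """Keep only the leading dotted-numeric part: '2.4.49-ubuntu20' -> (2, 4, 49)."""
    m = re.match(r"\d+(?:\.\d+)*", text.strip())
    return tuple(int(p) for p in m.group(0).split(".")) if m else ()


def possible_vulns(service: dict, rules: list[VulnRule]) -> list[str]:
    """Return ids of rules whose product, port and version range match the service."""
    version = parse_version(service["version"])
    hits = []
    for r in rules:
        if r.product.lower() != service["product"].lower():
            continue
        if r.port is not None and r.port != service.get("port"):
            continue
        if parse_version(r.min_version) <= version <= parse_version(r.max_version):
            hits.append(r.vuln_id)
    return hits
```

Each returned id would become a relation from the subdomain node to a vulnerability node.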

The knowledge extraction rules of this embodiment are shown in Fig. 4.

For domain name system data, the A, MX, NS and CNAME records of the DNS zone are queried and acquired with the dig command, and knowledge is extracted from each type of record based on regular expressions;
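A regular-expression rule of the kind described above can be sketched as follows. This is a minimal illustration, assuming dig's default answer-section layout (`name TTL IN TYPE value`); the function name and the `(domain, record_type, value)` triple layout are hypothetical, and the actual rules of this embodiment (Fig. 4) are not reproduced here.

```python
import re

# One answer-section line of dig output, e.g.
# "example.com.  3600  IN  MX  10 mail.example.com."
RECORD_RE = re.compile(
    r"^(?P<name>\S+)\s+(?P<ttl>\d+)\s+IN\s+(?P<rtype>A|MX|NS|CNAME)\s+(?P<value>.+?)\s*$"
)


def extract_dns_triples(dig_output: str) -> list[tuple[str, str, str]]:
    """Turn dig answer lines into (domain, record_type, value) triples."""
    triples = []
    for line in dig_output.splitlines():
        if line.lstrip().startswith(";"):  # skip dig comments and section headers
            continue
        m = RECORD_RE.match(line.strip())
        if not m:  # blank lines and other record types (AAAA, TXT, ...) are ignored
            continue
        value = m.group("value")
        if m.group("rtype") == "MX":
            value = value.split()[-1]  # drop the MX preference number
        # Strip the trailing root dot so names match across record types.
        triples.append((m.group("name").rstrip("."), m.group("rtype"), value.rstrip(".")))
    return triples
```

The raw text could be obtained with, for example, `subprocess.run(["dig", "example.com", "MX"], capture_output=True, text=True).stdout`; each triple then maps to a domain node, a record-type relationship and a target node.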

For Web service data, many cyberspace asset databases currently exist on the Internet; in this embodiment, the Web service data comes from the ZoomEye cyberspace radar system, from which the title, service, Web application firewall, IP address and external link information of each sub-domain name are obtained;

website name, title, service and firewall attributes are added to the sub-domain name nodes. Because some external-link sub-domain names carry no other related information, they are only marked as sub-domain name nodes under the external link information. To measure and evaluate the vulnerability of the Web service, this embodiment generates the vulnerability information that may exist for a domain name from the acquired service type and version information;
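Generating possible vulnerabilities from service type and version can be pictured as a lookup against the crawled vulnerability data. The sketch below is hypothetical: `SubdomainNode`, `KNOWN_VULNS` and exact-match lookup are illustrative assumptions, and a real matcher would also have to handle version ranges.

```python
from dataclasses import dataclass, field


@dataclass
class SubdomainNode:
    name: str
    title: str = ""
    service: str = ""    # e.g. "Apache httpd", as reported by the asset search
    version: str = ""    # e.g. "2.4.49"
    firewall: str = ""   # detected Web application firewall, if any
    possible_vulns: list[str] = field(default_factory=list)


# Hypothetical lookup table: (service, version) -> vulnerability IDs.
# In practice it would be built from the crawled vulnerability database.
KNOWN_VULNS = {
    ("Apache httpd", "2.4.49"): ["CVE-2021-41773"],
}


def annotate_possible_vulns(node: SubdomainNode, table=KNOWN_VULNS) -> SubdomainNode:
    """Attach vulnerabilities that may exist for the node's service and version."""
    node.possible_vulns = list(table.get((node.service, node.version), []))
    return node
```

Each entry in `possible_vulns` would then become an edge from the sub-domain name node to a vulnerability node in the fragile information layer.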

In this embodiment, the domain name WHOIS database contains information related to domain name ownership, such as the domain name creation time and registrar information. Ownership information can correlate cyberspace with social space, which makes WHOIS an important public database of social information about cyberspace entities.

WHOIS data can be acquired in two main ways: through commands or through third-party websites. Because third-party websites are slow to update, charge high fees and are not authoritative, this embodiment adopts command acquisition: a query request is sent to the WHOIS database with the whois command under Linux, which is simple and easy to use.
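Command acquisition can be wrapped in a few lines of Python. This is a sketch assuming the standard Linux `whois` client is installed; `fetch_whois` is a hypothetical helper, and the `run` parameter exists only so the call can be exercised without network access.

```python
import subprocess


def fetch_whois(domain: str, timeout: float = 30.0, run=subprocess.run) -> str:
    """Query WHOIS through the Linux `whois` command and return its raw text."""
    result = run(
        ["whois", domain],    # argument list, so no shell quoting is needed
        capture_output=True,  # collect stdout/stderr instead of printing them
        text=True,            # decode the output to str
        timeout=timeout,      # raises subprocess.TimeoutExpired if exceeded
        check=False,          # do not raise on a non-zero exit; the caller inspects the text
    )
    return result.stdout
```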

WHOIS data has two patterns, Thin and Thick. In the Thick pattern, the WHOIS data of all domain names has the same format, e.g., the .com and .net domains. In the Thin pattern, the WHOIS data of each domain name has no uniform format, e.g., the .info and .biz domains, so a regular-expression-based knowledge extraction method cannot simply be applied. In this embodiment, knowledge extraction rules are written based on regular expressions for WHOIS data in the Thick pattern, and data in the Thin pattern is extracted with a model constructed based on a Conditional Random Field (CRF).
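For the Thick pattern, where every record uses the same `Key: value` layout, the regular-expression rules could look like the sketch below. The field names and patterns are hypothetical examples, not the rules of this embodiment; the CRF model for the Thin pattern is not sketched.

```python
import re

_FLAGS = re.IGNORECASE | re.MULTILINE

# Hypothetical Thick-pattern rules: graph attribute -> regex with one capture group.
# [ \t]* (not \s*) keeps a match from running onto the next line.
THICK_RULES = {
    "domain": re.compile(r"^[ \t]*Domain Name:[ \t]*(.+)$", _FLAGS),
    "registrar": re.compile(r"^[ \t]*Registrar:[ \t]*(.+)$", _FLAGS),
    "created": re.compile(r"^[ \t]*Creation Date:[ \t]*(.+)$", _FLAGS),
    "name_servers": re.compile(r"^[ \t]*Name Server:[ \t]*(.+)$", _FLAGS),
}
MULTI_VALUED = {"name_servers"}


def extract_thick_whois(text: str) -> dict:
    """Apply every rule; multi-valued fields keep all matches, others the first."""
    record = {}
    for field_name, pattern in THICK_RULES.items():
        values = [v.strip() for v in pattern.findall(text)]
        if not values:
            continue
        record[field_name] = values if field_name in MULTI_VALUED else values[0]
    return record
```

Note that `Registrar:` requires the colon directly after the word, so lines such as `Registrar WHOIS Server:` are not captured as the registrar name.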

An Autonomous System (AS) is a set of routers and IP networks that independently determine their internal network protocol. For each AS, an AS node is created, and each IP address is linked to its AS through an origin_from relationship. The AS and geographic location corresponding to each IP address are queried through MaxMind's GeoLite2 database; a geographic location node is then connected to the IP address node through a location relationship and added to the knowledge graph.
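The AS, origin_from and location structure can be sketched with a tiny in-memory graph. The `InfraGraph` class, the node labels and the `located_at` relationship name are hypothetical. In practice the ASN and coordinates could come from MaxMind's `geoip2` Python package (for example `geoip2.database.Reader(...).asn(ip)` and `.city(ip).location`), which is an assumption rather than something stated in the source.

```python
class InfraGraph:
    """Tiny in-memory stand-in for the knowledge graph (illustration only)."""

    def __init__(self):
        self.nodes = {}      # (label, key) -> property dict
        self.edges = set()   # (start node, relationship type, end node)

    def node(self, label, key, **props):
        """Create a node on first use and return its identifier; later calls reuse it."""
        ident = (label, key)
        self.nodes.setdefault(ident, props)
        return ident

    def add_ip(self, ip, asn, as_org, latitude, longitude):
        ip_node = self.node("IP", ip)
        as_node = self.node("AS", asn, organization=as_org)
        # Location nodes are keyed by the exact (latitude, longitude) pair, so
        # IP addresses geolocated to the same coordinates share one node.
        loc_node = self.node("Location", (latitude, longitude),
                             latitude=latitude, longitude=longitude)
        self.edges.add((ip_node, "origin_from", as_node))
        self.edges.add((ip_node, "located_at", loc_node))
```

Keying location nodes on coordinates rather than on a region name is what keeps many addresses from collapsing onto one coarse region node.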

For the geographic location of IP addresses, to avoid a large number of IP addresses all being connected to a single region node, this embodiment creates geographic location nodes according to the unique latitude and longitude in the GeoLite2 database. A vulnerability node represents an entry in the list of publicly known network security vulnerabilities and is associated with the products affected by the vulnerability, the weaknesses related to it, and its vulnerability scores.

Network security is closely related to vulnerabilities. In this embodiment, a crawler is used to acquire the vulnerability data published by the U.S. National Vulnerability Database (NVD) from 2002 to 2020; vulnerability nodes are added to the knowledge graph and, based on the information in the vulnerability database, each vulnerability is associated with its vulnerable products, vulnerability scores and weaknesses.
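Turning one vulnerability record into graph relationships might look like the sketch below. It assumes the legacy NVD JSON 1.1 feed layout (`CVE_Items`, `configurations.nodes[].cpe_match`, `impact.baseMetricV2/V3`); the relationship names are hypothetical, and the sample data in the test uses placeholder identifiers.

```python
def iter_cpe_matches(nodes):
    """Yield every cpe_match entry, including those nested under 'children'."""
    for node in nodes:
        yield from node.get("cpe_match", [])
        yield from iter_cpe_matches(node.get("children", []))


def cve_triples(item: dict) -> list[tuple]:
    """Map one CVE_Items entry to (vulnerability, relationship, target) triples."""
    cve = item["cve"]
    cve_id = cve["CVE_data_meta"]["ID"]
    triples = []
    # Weaknesses: keep real CWE IDs, skip placeholders such as "NVD-CWE-Other".
    for pt in cve.get("problemtype", {}).get("problemtype_data", []):
        for desc in pt.get("description", []):
            if desc["value"].startswith("CWE-"):
                triples.append((cve_id, "HAS_WEAKNESS", desc["value"]))
    # Affected products: only CPE entries flagged as vulnerable.
    for match in iter_cpe_matches(item.get("configurations", {}).get("nodes", [])):
        if match.get("vulnerable"):
            triples.append((cve_id, "AFFECTS", match["cpe23Uri"]))
    # Scores: one (version, base score, vector) target per CVSS standard present.
    impact = item.get("impact", {})
    for metric_key, cvss_key in (("baseMetricV2", "cvssV2"), ("baseMetricV3", "cvssV3")):
        cvss = impact.get(metric_key, {}).get(cvss_key)
        if cvss:
            score = (cvss["version"], cvss["baseScore"], cvss["vectorString"])
            triples.append((cve_id, "HAS_SCORE", score))
    return triples
```

With a downloaded yearly feed, the records would be iterated as `for item in json.load(f)["CVE_Items"]`.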

The Common Vulnerability Scoring System (CVSS) provides a quantitative model for describing the characteristics and impact of vulnerabilities. In this embodiment, the CVSS scores of all vulnerabilities are integrated uniformly under the CVSS 2.0 and CVSS 3 standards; the CVSS score serves as the score of a vulnerability and is one of its attributes. To highlight the relevance of vulnerabilities of different severity, the CVSS base score and the corresponding vector are combined into a vulnerability score node, and a version attribute distinguishes the different CVSS standards.
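A score node keyed by standard version, base score and vector can be sketched as follows, together with the CVSS v3.1 base-score equations to show how the vector determines the score. The function names are hypothetical; the weights and rounding follow the published CVSS v3.1 specification, and CVSS 2.0 scoring is not implemented here.

```python
import math

AV = {"N": 0.85, "A": 0.62, "L": 0.55, "P": 0.2}
AC = {"L": 0.77, "H": 0.44}
UI = {"N": 0.85, "R": 0.62}
CIA = {"H": 0.56, "L": 0.22, "N": 0.0}
PR_UNCHANGED = {"N": 0.85, "L": 0.62, "H": 0.27}
PR_CHANGED = {"N": 0.85, "L": 0.68, "H": 0.5}


def parse_cvss_vector(vector: str) -> tuple[str, dict]:
    """Split a CVSS vector into (standard version, {metric: value})."""
    parts = vector.split("/")
    if parts[0].startswith("CVSS:"):
        version, metrics = parts[0][len("CVSS:"):], parts[1:]
    else:
        version, metrics = "2.0", parts  # CVSS v2 vectors carry no prefix
    return version, dict(p.split(":", 1) for p in metrics)


def score_node_key(base_score: float, vector: str) -> tuple:
    """Identity of a score node: (CVSS version, base score, vector)."""
    version, _ = parse_cvss_vector(vector)
    return (version, base_score, vector)


def roundup(x: float) -> float:
    """CVSS v3.1 Roundup: smallest one-decimal value >= x, float-safe."""
    i = round(x * 100000)
    if i % 10000 == 0:
        return i / 100000.0
    return (math.floor(i / 10000) + 1) / 10.0


def cvss31_base_score(m: dict) -> float:
    """Base score from parsed v3.x base metrics (AV, AC, PR, UI, S, C, I, A)."""
    changed = m["S"] == "C"
    iss = 1 - (1 - CIA[m["C"]]) * (1 - CIA[m["I"]]) * (1 - CIA[m["A"]])
    if changed:
        impact = 7.52 * (iss - 0.029) - 3.25 * (iss - 0.02) ** 15
    else:
        impact = 6.42 * iss
    pr = (PR_CHANGED if changed else PR_UNCHANGED)[m["PR"]]
    exploitability = 8.22 * AV[m["AV"]] * AC[m["AC"]] * pr * UI[m["UI"]]
    if impact <= 0:
        return 0.0
    total = 1.08 * (impact + exploitability) if changed else impact + exploitability
    return roundup(min(total, 10))
```

For example, `CVSS:3.1/AV:N/AC:L/PR:N/UI:N/S:U/C:H/I:H/A:H` yields a base score of 9.8, and its score node key is `("3.1", 9.8, "CVSS:3.1/AV:N/AC:L/PR:N/UI:N/S:U/C:H/I:H/A:H")`.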

Each vulnerable product node represents an affected product and contains information such as the product name, version, update, edition and language. This embodiment aims to highlight how vulnerabilities affect different software from different vendors.
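Product attributes such as vendor, name, version, update, edition and language can be read from CPE 2.3 formatted strings, the identifiers NVD uses for affected products. The sketch assumes that format (11 attribute fields after the `cpe:2.3` prefix, with `\:` escaping a literal colon); `product_node` is a hypothetical helper and leaves escapes in place rather than unescaping them.

```python
CPE_FIELDS = ["part", "vendor", "product", "version", "update", "edition",
              "language", "sw_edition", "target_sw", "target_hw", "other"]


def split_cpe(cpe: str) -> list[str]:
    """Split on ':' while keeping backslash-escaped characters inside a field."""
    fields, current, escaped = [], [], False
    for ch in cpe:
        if escaped:              # character after a backslash is literal
            current.append(ch)
            escaped = False
        elif ch == "\\":
            current.append(ch)   # keep the escape as-is in the field value
            escaped = True
        elif ch == ":":
            fields.append("".join(current))
            current = []
        else:
            current.append(ch)
    fields.append("".join(current))
    return fields


def product_node(cpe: str) -> dict:
    """Map a CPE 2.3 formatted string to product-node properties."""
    parts = split_cpe(cpe)
    if parts[:2] != ["cpe", "2.3"] or len(parts) != 13:
        raise ValueError(f"not a CPE 2.3 formatted string: {cpe}")
    return dict(zip(CPE_FIELDS, parts[2:]))
```

Grouping product nodes by `vendor` then makes it straightforward to compare how vulnerabilities spread across different vendors' software.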

The Common Weakness Enumeration (CWE) is a list of software security weaknesses that contains information about detection, mitigation and defense; weakness nodes are therefore created and associated with vulnerability nodes. CAPEC nodes are created based on the Common Attack Pattern Enumeration and Classification (CAPEC) and associated with the weakness nodes.

The knowledge finally extracted in this embodiment reaches the ten-million level: the knowledge graph is instantiated with 3.8 million extracted nodes and 10.9 million relationships.

In this embodiment, different types of data are collected in a targeted manner and assigned, by data level, to the network service layer, user role layer, geographical link layer and fragile information layer; the corresponding knowledge is then extracted from the acquired structured and semi-structured data.

Step S103: the knowledge graph is stored in a Neo4j graph database. Two import modes are provided, full-library import and incremental import, and the Internet information infrastructure knowledge graph is formed by using the import mode that suits each import scenario, wherein:

the Internet information infrastructure knowledge graph is stored in Neo4j, a NoSQL graph database, and the data model implemented on Neo4j is flexible and easy to change. Graph pattern matching queries can be expressed in Neo4j's query language, Cypher. Graph data represents entities as nodes and the relationships between entities as edges, so traversing the graph avoids expensive join operations or additional index lookups. In Neo4j, graph traversal speed depends only on the size of the part of the graph actually traversed by the query, not on the total size of the graph.
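A graph pattern-matching query of this kind might look like the Cypher below, run through the official `neo4j` Python driver. The `Domain`/`Subdomain`/`Vulnerability` labels and the `HAS_SUBDOMAIN`/`POSSIBLE_VULN` relationship types are hypothetical, since the source does not give its schema names; the driver object is passed in so the sketch does not depend on a running database.

```python
# Hypothetical schema:
# (:Domain)-[:HAS_SUBDOMAIN]->(:Subdomain)-[:POSSIBLE_VULN]->(:Vulnerability)
DOMAIN_VULN_QUERY = """
MATCH (d:Domain {name: $domain})-[:HAS_SUBDOMAIN]->(s:Subdomain)-[:POSSIBLE_VULN]->(v:Vulnerability)
RETURN s.name AS subdomain, v.id AS vulnerability
ORDER BY subdomain, vulnerability
"""


def domain_vulnerabilities(driver, domain: str) -> list[dict]:
    """Run the pattern-matching query; `driver` is a neo4j driver instance."""
    with driver.session() as session:
        # $domain is passed as a query parameter, never spliced into the string.
        result = session.run(DOMAIN_VULN_QUERY, domain=domain)
        return [record.data() for record in result]
```

A driver could be created with `neo4j.GraphDatabase.driver("bolt://localhost:7687", auth=(user, password))`; the URI and credentials here are placeholders.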

Data import in this embodiment has two parts. For the first import, data is imported in batch from CSV files with the officially provided Neo4j-import tool. Because that tool does not support incremental import and requires Neo4j to be stopped, subsequent incremental imports are performed with Spring Data Neo4j (SDN), which makes the data easier to use and maintain.
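For the first, full-library import, the extracted nodes and relationships have to be written as CSV files with the header conventions of Neo4j's bulk importer (`:ID`, `:LABEL`, `:START_ID`, `:END_ID`, `:TYPE`). The sketch below assumes one node file and one relationship file with a single `name` property; the file names and `write_import_csvs` are hypothetical.

```python
import csv
from pathlib import Path


def write_import_csvs(out_dir, nodes, edges):
    """Write bulk-import CSVs.

    nodes: iterable of (id, name, label); edges: iterable of (start_id, end_id, type).
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    nodes_path = out / "nodes.csv"
    rels_path = out / "relationships.csv"
    with open(nodes_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["id:ID", "name", ":LABEL"])  # id column + property + label
        writer.writerows(nodes)
    with open(rels_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow([":START_ID", ":END_ID", ":TYPE"])  # endpoints + relationship type
        writer.writerows(edges)
    return nodes_path, rels_path
```

The files would then be loaded offline, for example with `neo4j-admin import --nodes=nodes.csv --relationships=relationships.csv` on Neo4j 4.x (Neo4j 5 uses `neo4j-admin database import full`); later updates go through the incremental path instead.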

With Spring Data Neo4j, the Neo4j graph database can be accessed through configuration, and three levels of abstraction are provided for accessing storage: Neo4jClient, Neo4jTemplate and Neo4j repositories. Spring Data Neo4j maps annotated entity classes to the Neo4j graph database. As shown in the SDN architecture of Fig. 5, SDN can connect to the database through three kinds of drivers: the Bolt protocol, the HTTP protocol, and the official embedded Neo4j Java driver. Spring Data Neo4j builds on Neo4j-OGM, which helps to quickly build Spring-based Neo4j applications.

In this embodiment, an Internet important infrastructure ontology is constructed, a data acquisition and knowledge extraction framework is designed, a multi-mode knowledge import scheme is completed, and the graph database Neo4j is used for storage, finally forming the Internet important infrastructure knowledge graph. Through the knowledge graph, security-related multi-source heterogeneous data is integrated into a unified model for analysis and utilization, so that the real network security situation can be displayed more intuitively.

An embodiment of the present invention further provides an apparatus for constructing an internet important infrastructure knowledge graph, as shown in fig. 6, the apparatus includes:

a model building module, configured to construct, based on element analysis of the Internet important infrastructure, an Internet important infrastructure ontology model comprising a user role layer, a network service layer, a geographical link layer and a fragile information layer; the Internet important infrastructure is a set of important services, including the domain name system, Web services and network addresses, that is used for Internet information exchange;

an extraction module, configured to acquire the various items of data of the Internet important infrastructure and to extract knowledge from the structured and semi-structured data contained therein based on preset extraction rules;

an importing module, configured to store the knowledge graph in a Neo4j graph database, which provides two import modes, full-library import and incremental import, and to form the Internet information infrastructure knowledge graph by using the import mode that suits each import scenario.

An embodiment of the present invention further provides an Internet important infrastructure knowledge graph construction system, comprising:

a processor for executing a plurality of instructions;

a memory to store a plurality of instructions;

wherein the instructions are configured to be stored by the memory and loaded and executed by the processor to perform the method for constructing an internet critical infrastructure knowledge graph as described above.

An embodiment of the present invention further provides a computer-readable storage medium storing a plurality of instructions, the plurality of instructions being loaded by a processor to execute the method for constructing an Internet critical infrastructure knowledge graph described above.

It should be noted that the embodiments and features of the embodiments may be combined with each other without conflict.

In the embodiments provided by the present invention, it should be understood that the disclosed system, apparatus and method may be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative: the division into units is only a logical division, and other divisions are possible in an actual implementation; for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted or not executed. In addition, the mutual couplings, direct couplings or communication connections shown or discussed may be indirect couplings or communication connections through some interfaces, devices or units, and may be electrical, mechanical or in other forms.

The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.

In addition, functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, or in a form of hardware plus a software functional unit.

The integrated unit implemented in the form of a software functional unit may be stored in a computer-readable storage medium. The software functional unit is stored in a storage medium and includes several instructions that enable a computer device (which may be a personal computer, a physical server, a cloud server or the like, on which a Windows or Windows Server operating system is installed) to perform some of the steps of the methods according to the embodiments of the present invention. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk.

The above description is only a preferred embodiment of the present invention, and is not intended to limit the present invention in any way, and any simple modification, equivalent change and modification made to the above embodiment according to the technical spirit of the present invention are still within the scope of the technical solution of the present invention.
