Method and system for extracting identification information of networked intelligent equipment

Document No.: 1116114 Publication date: 2020-09-29

Reading note: This technology, "Method and system for extracting identification information of networked intelligent equipment", was designed by 张淼, 徐国爱, 吕浩, 徐国胜, 郭燕慧 and 王浩宇 on 2020-05-14. Abstract: The invention provides a method and a system for extracting identification information of networked intelligent equipment, relating to the technical field of Internet of Things device security. The method comprises: filtering application layer response data to obtain the application layer response data of networked intelligent equipment, and extracting from it a feature keyword sequence that identifies the characteristics of the networked intelligent equipment; searching for the feature keyword sequence and crawling the first n webpages; filtering the webpages to obtain first text information, and extracting preliminary equipment description identification information from the first text information using a named entity recognition algorithm based on a hidden Markov model; and, within each category of the preliminary equipment description identification information, selecting the information with the highest frequency of occurrence as the final equipment description identification information of that category. The extraction method can extract previously unseen equipment description identification information. In addition, the method for determining the final equipment description identification information achieves the same accuracy as the existing association rule mining algorithm, while the algorithm is simple and requires far fewer computing resources than other existing algorithms.

1. A method for extracting identification information of networked intelligent equipment is characterized by comprising the following steps:

filtering the application layer response data to obtain the application layer response data of the networked intelligent equipment;

extracting a feature keyword sequence for identifying the characteristics of the networked intelligent equipment from application layer response data of the networked intelligent equipment;

searching for the feature keyword sequence in a search engine, and crawling the first n corresponding webpages in the search result;

filtering the webpages to obtain first text information, and extracting preliminary equipment description identification information of the networked intelligent equipment from the first text information using a named entity recognition algorithm based on a hidden Markov model;

selecting the information with the highest frequency of occurrence within each category of the preliminary equipment description identification information as the final equipment description identification information of that category of the networked intelligent equipment;

and searching a CVE (Common Vulnerabilities and Exposures) vulnerability database, according to the final equipment description identification information, for vulnerability identification information of the networked intelligent equipment, and if such information exists, extracting the vulnerability identification information from the search result.

2. The extraction method of claim 1, wherein the filtering the application layer response data of the networked smart device from the application layer response data comprises:

filtering out, from the application layer response data, the application layer response data of non-Internet-of-Things equipment and error response information;

the application layer response data of the non-Internet-of-Things equipment comprises application layer response data of a heavyweight Web server;

the error response information is information whose HTTP response status code is 4xx or 5xx.

3. The extraction method according to claim 1, wherein the extracting a sequence of feature keywords identifying characteristics of networked smart devices from application layer response data of the networked smart devices comprises:

filtering the application layer response data of the networked intelligent equipment to obtain second text information, and extracting, on the basis of an Internet of Things equipment response information corpus, a feature keyword sequence for identifying the characteristics of the networked intelligent equipment from the second text information using the term frequency-inverse document frequency (TF-IDF) algorithm;

the Internet of things equipment response information corpus at least comprises equipment types, equipment manufacturers and equipment models of the Internet of things equipment.

4. The extraction method according to claim 3, wherein the filtering of the second text information from the application layer response data of the networked smart device comprises:

for application layer response data of the networked intelligent equipment obtained over the HTTP protocol, filtering out HTML tags, punctuation marks, characters that are neither digits nor letters, and hyperlink contents using regular expressions and the third-party Python library Beautiful Soup, and retaining the text to obtain the second text information;

or, for application layer response data of the networked intelligent equipment obtained over the FTP protocol, filtering out punctuation marks, characters that are neither digits nor letters, and hyperlink contents using regular expressions, and retaining the text to obtain the second text information.

5. The extraction method according to claim 1, wherein the filtering the first text information from the web page comprises:

filtering out HTML tags, punctuation marks, characters that are neither digits nor letters, and hyperlink contents in the webpages using regular expressions and the third-party Python library Beautiful Soup, and retaining the text to obtain the first text information.

6. The extraction method according to claim 1, wherein the preliminary device description identification information and the final device description identification information each include a device type, a device model, and a device manufacturer.

7. A framework system for extracting identification information of networked intelligent equipment, characterized by comprising: a data processing module, a searching/processing module, a management module and a front-end display module;

the data processing module comprises a filtering module and a preprocessing module; the filtering module is used for filtering the application layer response data to obtain the application layer response data of the networked intelligent equipment; the preprocessing module is used for extracting a feature keyword sequence for identifying the characteristics of the networked intelligent equipment from the application layer response data of the networked intelligent equipment;

the searching/processing module comprises a searching module and a processing module; the searching module is used for searching for the feature keyword sequence in a search engine and crawling the first n corresponding webpages in the search result; the processing module is used for filtering the webpages to obtain first text information;

the management module comprises an extraction module, a storage module and a query module;

the extraction module is used for extracting preliminary equipment description identification information of the networked intelligent equipment from the first text information using a named entity recognition algorithm based on a hidden Markov model, selecting the information with the highest frequency of occurrence within each category of the preliminary equipment description identification information as the final equipment description identification information of that category of the networked intelligent equipment, searching a CVE (Common Vulnerabilities and Exposures) vulnerability database according to the final equipment description identification information for vulnerability identification information of the networked intelligent equipment, and if such information exists, extracting the vulnerability identification information from the search result;

the storage module is used for determining whether the final equipment description identification information and the vulnerability identification information are already stored, and if not, storing them;

the query module is used for querying the identification information of the networked intelligent equipment in the storage module;

the front-end display module is used for interacting with the data processing module, the searching/processing module and the management module.

8. The extraction framework system of claim 7, wherein said querying the identification information of the networked intelligent equipment in the storage module comprises:

querying the identification information of the networked intelligent equipment in the storage module according to the equipment type, the equipment manufacturer, the equipment model or the equipment vulnerability number.

9. The extraction framework system of claim 7, wherein the front-end presentation module comprises a keyword input module, an application layer response information input module, and an application layer response information acquisition module; the keyword input module is interacted with the query module and used for querying the identification information of the networked intelligent equipment in the storage module by inputting keywords;

the application layer response information input module is used for inputting response data of an application layer and processing the input application layer response data through the data processing module, the searching/processing module and the management module;

the application layer response information acquisition module is used for acquiring application layer response information according to an input IP address, port number or protocol, and processing the acquired application layer response data through the data processing module, the searching/processing module and the management module.

10. The extraction framework system of claim 7, wherein said filtering the application layer response data to obtain the application layer response data of the networked intelligent equipment comprises:

filtering out, from the application layer response data, the application layer response data of non-Internet-of-Things equipment and error response information;

the application layer response data of the non-Internet-of-Things equipment comprises application layer response data of a heavyweight Web server;

the error response information is information whose HTTP response status code is 4xx or 5xx;

preferably, said extracting a feature keyword sequence for identifying the characteristics of the networked intelligent equipment from the application layer response data of the networked intelligent equipment comprises:

filtering the application layer response data of the networked intelligent equipment to obtain second text information, and extracting, on the basis of an Internet of Things equipment response information corpus, a feature keyword sequence for identifying the characteristics of the networked intelligent equipment from the second text information using the term frequency-inverse document frequency (TF-IDF) algorithm, wherein the Internet of Things equipment response information corpus at least comprises equipment types, equipment manufacturers and equipment models of Internet of Things equipment;

preferably, said filtering the application layer response data of the networked intelligent equipment to obtain second text information comprises:

for application layer response data of the networked intelligent equipment obtained over the HTTP protocol, filtering out HTML tags, punctuation marks, characters that are neither digits nor letters, and hyperlink contents using regular expressions and the third-party Python library Beautiful Soup, and retaining the text to obtain the second text information;

or, for application layer response data of the networked intelligent equipment obtained over the FTP protocol, filtering out punctuation marks, characters that are neither digits nor letters, and hyperlink contents using regular expressions, and retaining the text to obtain the second text information;

preferably, said filtering the webpages to obtain the first text information comprises:

filtering out HTML tags, punctuation marks, characters that are neither digits nor letters, and hyperlink contents in the webpages using regular expressions and the third-party Python library Beautiful Soup, and retaining the text to obtain the first text information;

preferably, the preliminary device description identification information and the final device description identification information each include a device type, a device model, and a device manufacturer.

Technical Field

The invention relates to the technical field of Internet of Things device security, and in particular to a method and a system for extracting identification information of networked intelligent equipment.

Background

A certain amount of research has been carried out, domestically and internationally, on extracting identification information of networked intelligent equipment, and some feasible extraction methods have been proposed. Existing methods fall into two categories: those based on supervised machine learning, and those based on natural language processing and data mining.

Networked intelligent equipment identification information extraction method based on supervised machine learning technology

Most existing methods for extracting the identification information of networked intelligent equipment use supervised machine learning: network traffic of multiple types of Internet of Things equipment is collected in advance, a machine learning model is trained on features extracted from each layer of the traffic packets (link layer, network layer, transport layer, application layer and so on), and the model then predicts the type of the Internet of Things equipment. However, these methods can only predict the device type and cannot predict finer-grained device information; the set of predictable device types depends on the device types collected in advance; and collecting and labeling the data sets requires a great deal of manual work.

Networked intelligent equipment identification information extraction method based on natural language processing and data mining

Xuan Feng et al. were the first to propose ARE, a framework for automatically labeling networked intelligent equipment. The framework automatically extracts (type, manufacturer, model) information of networked intelligent equipment: it collects application layer response data for four protocols (HTTP, FTP, RTSP and TELNET) from Censys, and extracts the identification information of the networked intelligent equipment using natural language processing, data mining and related techniques. However, its extraction of the equipment type, equipment manufacturer, equipment model and similar information from equipment description webpages depends entirely on rule matching and a rule base, so extraction performance depends heavily on the quality of the rules and the completeness of the rule base.

Disclosure of Invention

In view of the above, the present invention provides a method and a system for extracting identification information of networked intelligent equipment, in order to solve two problems of existing extraction methods: equipment identification information cannot be extracted when the existing rules do not match the equipment type, manufacturer and model; and extracting equipment identification information from search results with an association rule mining algorithm takes a long time and consumes substantial computing resources.

Based on the above purpose, the first aspect of the present invention provides a method for extracting identification information of networked intelligent devices, including the following steps:

filtering the application layer response data to obtain the application layer response data of the networked intelligent equipment;

extracting a feature keyword sequence for identifying the characteristics of the networked intelligent equipment from application layer response data of the networked intelligent equipment;

searching for the feature keyword sequence in a search engine, and crawling the first n corresponding webpages in the search result;

filtering the webpages to obtain first text information, and extracting preliminary equipment description identification information of the networked intelligent equipment from the first text information using a named entity recognition algorithm based on a hidden Markov model;

selecting the information with the highest frequency of occurrence within each category of the preliminary equipment description identification information as the final equipment description identification information of that category of the networked intelligent equipment;

and searching a CVE (Common Vulnerabilities and Exposures) vulnerability database, according to the final equipment description identification information, for vulnerability identification information of the networked intelligent equipment, and if such information exists, extracting the vulnerability identification information from the search result.
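The frequency-based selection step above can be illustrated with a minimal Python sketch. The function name, the category labels and the sample values are assumptions made for illustration; the source only states that the most frequent information within each category is chosen.

```python
from collections import Counter


def select_final_identification(candidates):
    """Return the most frequent value for each category.

    `candidates` is a list of (category, value) pairs, standing in for the
    preliminary identification information extracted from the crawled pages.
    """
    by_category = {}
    for category, value in candidates:
        by_category.setdefault(category, Counter())[value] += 1
    # most_common(1) yields [(value, count)]; on a tie the value seen first wins.
    return {cat: counts.most_common(1)[0][0] for cat, counts in by_category.items()}


# Hypothetical preliminary results gathered from three webpages.
preliminary = [
    ("type", "camera"), ("manufacturer", "ExampleCorp"), ("model", "X100"),
    ("type", "camera"), ("manufacturer", "ExampleCorp"), ("model", "X200"),
    ("type", "router"), ("manufacturer", "ExampleCorp"), ("model", "X100"),
]
final = select_final_identification(preliminary)
```

A single pass over the candidates plus one counter per category is all the step requires, which is consistent with the source's remark that the selection is simple and light on computing resources.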

Optionally, the filtering the application layer response data of the networked smart device from the application layer response data includes:

filtering out, from the application layer response data, the application layer response data of non-Internet-of-Things equipment and error response information;

the application layer response data of the non-Internet-of-Things equipment comprises application layer response data of a heavyweight Web server;

the error response information is information whose HTTP response status code is 4xx or 5xx.
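As a rough illustration of this filtering step, the following Python sketch drops HTTP responses with 4xx or 5xx status codes and responses whose Server header names a heavyweight Web server. The list of server names is a hypothetical placeholder, since the source does not enumerate which servers count as heavyweight, and the function name is likewise an assumption.

```python
import re

# Hypothetical markers for heavyweight Web servers; the source gives no list.
HEAVYWEIGHT_SERVERS = ("apache", "nginx", "microsoft-iis")

# Matches an HTTP status line such as "HTTP/1.1 404 Not Found".
STATUS_LINE = re.compile(r"^HTTP/\d(?:\.\d)?\s+(\d{3})")


def is_candidate_device_response(raw_response):
    """Return True if the response should be kept for further processing."""
    match = STATUS_LINE.match(raw_response)
    if match and match.group(1)[0] in "45":
        return False  # 4xx or 5xx error response
    for line in raw_response.splitlines()[1:]:
        if not line.strip():
            break  # blank line marks the end of the headers
        name, _, value = line.partition(":")
        if name.strip().lower() == "server":
            if any(marker in value.lower() for marker in HEAVYWEIGHT_SERVERS):
                return False  # served by a heavyweight Web server
    return True
```

Responses that are not HTTP at all, such as a single-line FTP banner, have no status line or headers to inspect and pass through unchanged in this sketch.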

Optionally, the extracting a feature keyword sequence identifying characteristics of the networked smart device from the application layer response data of the networked smart device includes:

filtering the application layer response data of the networked intelligent equipment to obtain second text information, and extracting, on the basis of an Internet of Things equipment response information corpus, a feature keyword sequence for identifying the characteristics of the networked intelligent equipment from the second text information using the term frequency-inverse document frequency (TF-IDF) algorithm;

the Internet of things equipment response information corpus at least comprises equipment types, equipment manufacturers and equipment models of the Internet of things equipment.
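A minimal sketch of TF-IDF keyword ranking is given below, assuming the corpus is available as a list of tokenized documents and that a smoothed inverse document frequency is used. The smoothing formula, the tokenization, the `top_k` parameter and the sample data are all assumptions; the source does not specify how the corpus enters the computation.

```python
import math
from collections import Counter


def tfidf_keywords(document_tokens, corpus, top_k=3):
    """Rank the tokens of one response by TF-IDF against a reference corpus."""
    term_counts = Counter(document_tokens)
    n_docs = len(corpus)
    scores = {}
    for term, count in term_counts.items():
        doc_freq = sum(1 for doc in corpus if term in doc)
        # Smoothed IDF so that terms absent from the corpus still get a finite weight.
        idf = math.log((1 + n_docs) / (1 + doc_freq)) + 1
        scores[term] = (count / len(document_tokens)) * idf
    # Highest score first; ties are broken alphabetically for determinism.
    return sorted(scores, key=lambda t: (-scores[t], t))[:top_k]


# Hypothetical corpus of tokenized device responses and one new response.
corpus = [
    ["server", "login", "camera", "acme"],
    ["server", "login", "router"],
    ["server", "login", "printer"],
]
response = ["server", "login", "acme", "ipcam", "x100", "acme"]
keywords = tfidf_keywords(response, corpus)
```

Tokens that appear in every corpus document, such as "server" and "login" here, receive the lowest weight, so the ranking favors terms that distinguish one device from the others.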

Optionally, the filtering the second text information from the application layer response data of the networked smart device includes:

for application layer response data of the networked intelligent equipment obtained over the HTTP protocol, filtering out HTML tags, punctuation marks, characters that are neither digits nor letters, and hyperlink contents using regular expressions and the third-party Python library Beautiful Soup, and retaining the text to obtain the second text information;

or, for application layer response data of the networked intelligent equipment obtained over the FTP protocol, filtering out punctuation marks, characters that are neither digits nor letters, and hyperlink contents using regular expressions, and retaining the text to obtain the second text information.
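The FTP branch, which relies on regular expressions only, might look like the following Python sketch. The patterns and the function name are assumptions, and restricting "letters" to ASCII is a simplification; the HTTP branch would additionally strip HTML tags (the source names Beautiful Soup for that) and is not shown here.

```python
import re

# Hyperlinks: anything starting with a URL scheme or "www.".
URL = re.compile(r"(?:https?|ftp)://\S+|www\.\S+", re.IGNORECASE)
# Runs of characters that are neither ASCII digits nor ASCII letters.
NON_ALNUM = re.compile(r"[^0-9A-Za-z]+")


def clean_banner(banner):
    """Remove hyperlinks, punctuation and other non-alphanumeric characters."""
    without_links = URL.sub(" ", banner)
    text_only = NON_ALNUM.sub(" ", without_links)
    # Collapse the whitespace left behind by the substitutions.
    return " ".join(text_only.split())


cleaned = clean_banner("220-Welcome to ExampleCam FTP (v2.1)! See http://example.com/help")
```

Removing punctuation splits version strings such as "v2.1" into separate tokens; whether that matters depends on how the later keyword extraction tokenizes the text.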

Preferably, the filtering of the first text information from the web page includes:

filtering out HTML tags, punctuation marks, characters that are neither digits nor letters, and hyperlink contents in the webpages using regular expressions and the third-party Python library Beautiful Soup, and retaining the text to obtain the first text information.

Optionally, the preliminary device description identification information and the final device description identification information each include a device type, a device model, and a device manufacturer.

A second aspect of the present invention provides a framework system for extracting identification information of networked intelligent equipment, the extraction framework system comprising: a data processing module, a searching/processing module, a management module and a front-end display module;

the data processing module comprises a filtering module and a preprocessing module; the filtering module is used for filtering the application layer response data to obtain the application layer response data of the networked intelligent equipment; the preprocessing module is used for extracting a feature keyword sequence for identifying the characteristics of the networked intelligent equipment from the application layer response data of the networked intelligent equipment;

the searching/processing module comprises a searching module and a processing module; the searching module is used for searching for the feature keyword sequence in a search engine and crawling the first n corresponding webpages in the search result; the processing module is used for filtering the webpages to obtain first text information;

the management module comprises an extraction module, a storage module and a query module;

the extraction module is used for extracting preliminary equipment description identification information of the networked intelligent equipment from the first text information using a named entity recognition algorithm based on a hidden Markov model, selecting the information with the highest frequency of occurrence within each category of the preliminary equipment description identification information as the final equipment description identification information of that category of the networked intelligent equipment, searching a CVE (Common Vulnerabilities and Exposures) vulnerability database according to the final equipment description identification information for vulnerability identification information of the networked intelligent equipment, and if such information exists, extracting the vulnerability identification information from the search result;

the storage module is used for determining whether the final equipment description identification information and the vulnerability identification information are already stored, and if not, storing them;

the query module is used for querying the identification information of the networked intelligent equipment in the storage module;

the front-end display module is used for interacting with the data processing module, the searching/processing module and the management module.

Optionally, the querying, in the storage module, the identification information of the networked smart device includes:

querying the identification information of the networked intelligent equipment in the storage module according to the equipment type, the equipment manufacturer, the equipment model or the equipment vulnerability number.
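The store-if-absent behavior of the storage module and the field-based lookup of the query module could be sketched with Python's built-in sqlite3 module as follows. The table layout, column names, function names and the placeholder vulnerability number are all hypothetical; the source does not describe the storage backend.

```python
import sqlite3

# Fields the query module may search on, per the description above.
QUERY_FIELDS = {"device_type", "manufacturer", "model", "cve_id"}


def open_store():
    """Create an in-memory store; a real system would likely use a persistent one."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE device ("
        "device_type TEXT NOT NULL, manufacturer TEXT NOT NULL, "
        "model TEXT NOT NULL, cve_id TEXT NOT NULL, "
        "UNIQUE (device_type, manufacturer, model, cve_id))"
    )
    return conn


def store_if_absent(conn, device_type, manufacturer, model, cve_id=""):
    """Insert the record unless an identical one exists; return True if inserted."""
    cur = conn.execute(
        "INSERT OR IGNORE INTO device VALUES (?, ?, ?, ?)",
        (device_type, manufacturer, model, cve_id),
    )
    conn.commit()
    return cur.rowcount == 1


def query(conn, field, value):
    """Look up records by one whitelisted field."""
    if field not in QUERY_FIELDS:
        raise ValueError(f"unsupported query field: {field}")
    rows = conn.execute(
        "SELECT device_type, manufacturer, model, cve_id "
        f"FROM device WHERE {field} = ? ORDER BY rowid",
        (value,),
    )
    return rows.fetchall()


conn = open_store()
first = store_if_absent(conn, "camera", "ExampleCorp", "X100", "CVE-0000-0000")
second = store_if_absent(conn, "camera", "ExampleCorp", "X100", "CVE-0000-0000")
```

The field name is checked against a whitelist before it is interpolated into the SQL text, while the value is always passed as a bound parameter.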

Optionally, the front-end display module comprises a keyword input module, an application layer response information input module and an application layer response information acquisition module;

the keyword input module is interacted with the query module and used for querying the identification information of the networked intelligent equipment in the storage module by inputting keywords;

the application layer response information input module is used for inputting response data of an application layer and processing the input application layer response data through the data processing module, the searching/processing module and the management module;

the application layer response information acquisition module is used for acquiring application layer response information according to an input IP address, port number or protocol, and processing the acquired application layer response data through the data processing module, the searching/processing module and the management module.

Optionally, the filtering the application layer response data of the networked smart device from the application layer response data includes:

filtering out, from the application layer response data, the application layer response data of non-Internet-of-Things equipment and error response information;

the application layer response data of the non-Internet-of-Things equipment comprises application layer response data of a heavyweight Web server;

the error response information is information whose HTTP response status code is 4xx or 5xx;

preferably, said extracting a feature keyword sequence for identifying the characteristics of the networked intelligent equipment from the application layer response data of the networked intelligent equipment comprises:

filtering the application layer response data of the networked intelligent equipment to obtain second text information, and extracting, on the basis of an Internet of Things equipment response information corpus, a feature keyword sequence for identifying the characteristics of the networked intelligent equipment from the second text information using the term frequency-inverse document frequency (TF-IDF) algorithm, wherein the Internet of Things equipment response information corpus at least comprises equipment types, equipment manufacturers and equipment models of Internet of Things equipment;

preferably, said filtering the application layer response data of the networked intelligent equipment to obtain second text information comprises:

for application layer response data of the networked intelligent equipment obtained over the HTTP protocol, filtering out HTML tags, punctuation marks, characters that are neither digits nor letters, and hyperlink contents using regular expressions and the third-party Python library Beautiful Soup, and retaining the text to obtain the second text information;

or, for application layer response data of the networked intelligent equipment obtained over the FTP protocol, filtering out punctuation marks, characters that are neither digits nor letters, and hyperlink contents using regular expressions, and retaining the text to obtain the second text information.

Preferably, said filtering the webpages to obtain the first text information comprises:

filtering out HTML tags, punctuation marks, characters that are neither digits nor letters, and hyperlink contents in the webpages using regular expressions and the third-party Python library Beautiful Soup, and retaining the text to obtain the first text information.

Preferably, the preliminary device description identification information and the final device description identification information each include a device type, a device model, and a device manufacturer.

From the above, it can be seen that the method and system for extracting identification information of networked intelligent devices provided by the present invention at least have the following beneficial effects:

The extraction method uses a hidden Markov model algorithm from machine learning together with a named entity recognition algorithm from natural language processing to extract the equipment description identification information from the retrieved webpages; this realizes heuristic extraction and can extract previously unseen equipment description identification information.

In the extraction method, the information with the highest frequency of occurrence within each category of the preliminary equipment description identification information is selected as the final equipment description identification information of that category of the networked intelligent equipment; this achieves the same accuracy as the existing association rule mining algorithm, while the algorithm is simple and requires far fewer computing resources than other existing algorithms.

The extraction method can also extract the equipment vulnerability identification information according to the final equipment description identification information, can realize the extraction of the equipment information with finer granularity, is convenient for a manager to manage the networked intelligent equipment in the network, and reduces the occurrence of the safety problem of the equipment of the Internet of things.

Drawings

In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, it is obvious that the drawings in the following description are only some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to the drawings without creative efforts.

Fig. 1 shows the test results of the networked intelligent equipment identification information extraction framework system according to an embodiment of the present invention on the application layer response information of 5000 networked intelligent devices.

Detailed Description

In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to specific embodiments and the accompanying drawings.

It is to be noted that technical terms or scientific terms used in the embodiments of the present invention should have the ordinary meanings as understood by those having ordinary skill in the art to which the present disclosure belongs, unless otherwise defined. The use of "first," "second," and similar terms in this disclosure is not intended to indicate any order, quantity, or importance, but rather is used to distinguish one element from another. The word "comprising" or "comprises", and the like, means that the element or item listed before the word covers the element or item listed after the word and its equivalents, but does not exclude other elements or items. The terms "connected" or "coupled" and the like are not restricted to physical or mechanical connections, but may include electrical connections, whether direct or indirect.

Identification information of networked intelligent equipment is currently extracted either with supervised machine learning or with methods based on natural language processing and data mining. Supervised machine learning can only predict the type of a device and cannot predict finer-grained device information; the set of predictable device types depends on the device types collected in advance, and collecting and labeling the data sets requires a large amount of manual work. In the methods based on natural language processing and data mining, extraction of the equipment type, equipment manufacturer, equipment model and similar information from equipment description webpages depends entirely on rule matching and a rule base, so extraction performance depends heavily on the quality of the rules and the completeness of the rule base, and the resulting extraction performance is poor.

In view of the above problems, an embodiment of the present invention provides a method for extracting identification information of networked intelligent devices, including the following steps:

filtering application layer response data to obtain the application layer response data of the networked intelligent device;

extracting, from the application layer response data of the networked intelligent device, a feature keyword sequence that identifies the characteristics of the networked intelligent device;

searching for the feature keyword sequence in a search engine, and crawling the first n webpages in the search results;

filtering the webpages to obtain first text information, and extracting preliminary device description identification information of the networked intelligent device from the first text information using a named entity recognition algorithm based on a hidden Markov model;

selecting, within each category of the preliminary device description identification information, the information that occurs most frequently as the final device description identification information of that category for the networked intelligent device;

and searching the CVE vulnerability database, according to the final device description identification information, for vulnerability identification information of the networked intelligent device, and if any exists, extracting the vulnerability identification information from the search results.

The extraction method can extract both the device description identification information and the vulnerability identification information of a device. It therefore yields finer-grained device information, makes it easier for an administrator to manage the networked intelligent devices in a network, and reduces the security problems of Internet of Things devices. In addition, the method combines a hidden Markov model from machine learning with named entity recognition from natural language processing to extract device description identification information from the searched webpages, so extraction is heuristic and previously unseen device description identification information can be extracted. Furthermore, selecting the most frequent information within each category of the preliminary device description identification information as the final device description identification information of that category achieves the same accuracy as existing association rule mining algorithms, while the algorithm is simple and requires far fewer computing resources than other existing algorithms.
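To illustrate how a hidden Markov model can tag device entities in cleaned text, the following is a minimal sketch only. The tag set, the tiny hand-labeled training snippets, the add-one smoothing and the function names `train` and `viterbi` are all assumptions made for illustration; the source does not specify how its model is trained or parameterized.

```python
import math
from collections import Counter, defaultdict

# Hypothetical tag set: O = not part of a device description.
STATES = ["O", "VENDOR", "MODEL", "TYPE"]

# Hypothetical labeled snippets; a real model would need a far larger corpus.
TRAIN = [
    [("hikvision", "VENDOR"), ("ds-2cd2032", "MODEL"), ("ip", "TYPE"), ("camera", "TYPE")],
    [("tp-link", "VENDOR"), ("router", "TYPE"), ("tl-wr841n", "MODEL"), ("manual", "O")],
    [("buy", "O"), ("dahua", "VENDOR"), ("camera", "TYPE")],
]


def train(sentences):
    """Count start, transition and emission events from labeled sentences."""
    start, trans, emit = Counter(), defaultdict(Counter), defaultdict(Counter)
    vocab = set()
    for sentence in sentences:
        prev = None
        for word, tag in sentence:
            vocab.add(word)
            emit[tag][word] += 1
            if prev is None:
                start[tag] += 1
            else:
                trans[prev][tag] += 1
            prev = tag
    return start, trans, emit, len(vocab)


def log_prob(counter, key, n_outcomes):
    """Add-one smoothed log probability of `key` under `counter`."""
    return math.log((counter[key] + 1) / (sum(counter.values()) + n_outcomes))


def viterbi(words, model):
    """Return the most likely tag sequence for `words` (Viterbi decoding)."""
    start, trans, emit, vocab_size = model
    n_states = len(STATES)
    n_emit = vocab_size + 1  # one extra outcome reserved for unseen words
    score = {s: log_prob(start, s, n_states) + log_prob(emit[s], words[0], n_emit)
             for s in STATES}
    back = []
    for word in words[1:]:
        new_score, pointers = {}, {}
        for s in STATES:
            best = max(STATES, key=lambda p: score[p] + log_prob(trans[p], s, n_states))
            new_score[s] = (score[best] + log_prob(trans[best], s, n_states)
                            + log_prob(emit[s], word, n_emit))
            pointers[s] = best
        score = new_score
        back.append(pointers)
    tag = max(STATES, key=lambda s: score[s])
    path = [tag]
    for pointers in reversed(back):  # follow back-pointers to recover the path
        tag = pointers[tag]
        path.append(tag)
    return list(reversed(path))


HMM = train(TRAIN)
print(viterbi(["dahua", "router", "tl-wr841n"], HMM))
```

Because transitions between tags carry information, such a model can label a token it has never seen (for example an unfamiliar model number following a device type), which is the property the description relies on when it says unseen device information can be extracted.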

Further, the number n of webpages crawled from the search results may be not less than 30.

Further, the preliminary device description identification information and the final device description identification information each include a device type, a device model, and a device manufacturer.
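The frequency-based selection of the final device description identification information can be sketched as follows. This is a minimal illustration: the function name `select_final`, the `(category, value)` input format and the lowercase normalization are assumptions, not details given in the source.

```python
from collections import Counter


def select_final(candidates):
    """Pick the most frequent value in each category.

    `candidates` is a list of (category, value) pairs gathered from all
    crawled pages, e.g. ("vendor", "Hikvision").
    """
    by_category = {}
    for category, value in candidates:
        # Normalizing case and whitespace is an assumption made here so that
        # "Hikvision" and "hikvision " count as the same candidate.
        key = value.strip().lower()
        by_category.setdefault(category, Counter())[key] += 1
    return {category: counts.most_common(1)[0][0]
            for category, counts in by_category.items()}


preliminary = [
    ("vendor", "Hikvision"), ("vendor", "hikvision"), ("vendor", "Dahua"),
    ("type", "camera"), ("type", "camera"), ("type", "router"),
    ("model", "DS-2CD2032"),
]
print(select_final(preliminary))
```

The selection is a single counting pass over the candidates, which is consistent with the description's point that it needs far less computation than association rule mining.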

The filtering method for filtering the application layer response data of the networked intelligent device from the application layer response data is not strictly limited, and for example, a conventional filtering method in the field can be adopted; in particular, in some embodiments, filtering the application layer response data of the networked smart device from the application layer response data comprises:

filtering out, from the application layer response data, the application layer response data of non-Internet-of-Things devices and the error response information;

the application layer response data of non-Internet-of-Things devices includes application layer response data of heavyweight Web servers;

the error response information is information in which the status code of the HTTP response is 4xx or 5xx.
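The filtering rules above can be sketched as a simple predicate over raw responses. This is a minimal sketch: the source does not name which servers count as heavyweight, so the `HEAVYWEIGHT_SERVERS` list, the `Server` header check and the function name `is_candidate` are assumptions chosen for illustration.

```python
import re

# Assumption: these server names stand in for "heavyweight Web servers";
# the source does not enumerate them.
HEAVYWEIGHT_SERVERS = ("apache", "nginx", "microsoft-iis")

STATUS_RE = re.compile(r"^HTTP/\d(?:\.\d)?\s+(\d{3})")
SERVER_RE = re.compile(r"^Server:\s*(.+)$", re.IGNORECASE | re.MULTILINE)


def is_candidate(response):
    """Return True if a raw application layer response should be kept."""
    status = STATUS_RE.match(response)
    if status and status.group(1)[0] in "45":
        return False  # 4xx or 5xx error response
    server = SERVER_RE.search(response)
    if server and any(name in server.group(1).lower() for name in HEAVYWEIGHT_SERVERS):
        return False  # likely a general-purpose Web server, not an IoT device
    return True


responses = [
    "HTTP/1.1 200 OK\r\nServer: GoAhead-Webs\r\n\r\n<title>IP Camera</title>",
    "HTTP/1.1 404 Not Found\r\nServer: GoAhead-Webs\r\n\r\n",
    "HTTP/1.1 200 OK\r\nServer: Apache/2.4.41 (Ubuntu)\r\n\r\n",
    "220 FTP server ready",
]
kept = [r for r in responses if is_candidate(r)]
print(len(kept))
```

Non-HTTP banners such as the FTP greeting carry no HTTP status line, so in this sketch they pass through unchanged.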

In some embodiments, extracting a sequence of feature keywords from application layer response data of the networked smart devices that identify characteristics of the networked smart devices includes:

filtering the application layer response data of the networked intelligent device to obtain second text information, and extracting, from the second text information, a feature keyword sequence that identifies the characteristics of the networked intelligent device by applying a term frequency-inverse document frequency (TF-IDF) algorithm over an Internet of Things device response information corpus;

the Internet of things equipment response information corpus at least comprises equipment types, equipment manufacturers and equipment models of the Internet of things equipment.
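The TF-IDF keyword extraction step can be sketched with the standard library alone. This is a minimal illustration: the smoothed IDF formula, the `top_k` parameter, the function name `tfidf_keywords` and the toy corpus are assumptions, since the source names the algorithm but not its exact variant.

```python
import math
from collections import Counter


def tfidf_keywords(tokens, corpus, top_k=3):
    """Rank the tokens of one response by TF-IDF against a reference corpus.

    `tokens` is the tokenized second text information of one device;
    `corpus` is a list of token lists from the response corpus.
    """
    n_docs = len(corpus)
    doc_freq = Counter(term for doc in corpus for term in set(doc))
    scores = {}
    for term, count in Counter(tokens).items():
        tf = count / len(tokens)
        # Smoothed IDF (an assumption): terms absent from the corpus still
        # receive a finite, high weight.
        idf = math.log((1 + n_docs) / (1 + doc_freq[term])) + 1
        scores[term] = tf * idf
    ranked = sorted(scores, key=lambda term: (-scores[term], term))
    return ranked[:top_k]


corpus = [
    ["server", "login", "password", "router"],
    ["server", "login", "camera", "hikvision"],
    ["server", "login", "welcome"],
]
tokens = ["server", "login", "hikvision", "ds-2cd2032", "hikvision", "camera"]
print(tfidf_keywords(tokens, corpus))
```

Words that appear in nearly every response, such as "server" and "login" in this toy corpus, receive low weights, so the returned sequence is dominated by terms that distinguish the device.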

Further, the filtering of the second text information from the application layer response data of the networked smart device includes:

for application layer response data of a networked intelligent device obtained over the HTTP protocol, filtering out HTML tags, punctuation marks, characters that are neither digits nor letters, and hyperlink contents using regular expressions and the third-party Python library Beautiful Soup, and retaining the text to obtain the second text information;

and for application layer response data of a networked intelligent device obtained over the FTP protocol, filtering out punctuation marks, characters that are neither digits nor letters, and hyperlink contents using regular expressions, and retaining the text to obtain the second text information.

The method for filtering the first text information from the web page is not strictly limited, and may be performed according to a conventional filtering method in the art, for example, in some embodiments, the filtering the first text information from the web page includes:

filtering out HTML tags, punctuation marks, characters that are neither digits nor letters, and hyperlink contents in the webpage using regular expressions and the third-party Python library Beautiful Soup, and retaining the text to obtain the first text information.
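The text-cleaning step used for both the first and the second text information can be sketched as below. To keep the example dependency-free, the standard-library `html.parser` module is used here in place of Beautiful Soup, which is what the description actually names. Treating "hyperlink contents" as URLs appearing in the text is also an assumption, as are the names `clean_text`, `URL_RE` and `_TextExtractor`.

```python
import re
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects text nodes, skipping the bodies of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip:
            self.parts.append(data)


# Assumption: "hyperlink contents" is interpreted as URLs in the text.
URL_RE = re.compile(r"(?:https?|ftp)://\S+|www\.\S+", re.IGNORECASE)


def clean_text(markup):
    """Strip tags, URLs and punctuation; keep letters, digits and spaces."""
    parser = _TextExtractor()
    parser.feed(markup)
    parser.close()
    text = " ".join(parser.parts)
    text = URL_RE.sub(" ", text)
    text = re.sub(r"[^\w\s]|_", " ", text)  # drop punctuation and symbols
    return " ".join(text.split())           # collapse runs of whitespace


page = ("<html><head><style>p{color:red}</style></head><body>"
        "<h1>IP Camera!</h1><p>Model: DS-2CD2032, see http://example.com/help</p>"
        "</body></html>")
print(clean_text(page))
```

Note that removing punctuation as described splits hyphenated model numbers into separate tokens, which any downstream tokenization has to take into account.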

An embodiment of the invention also provides a framework system for extracting identification information of networked intelligent devices, comprising: a data processing module, a search/processing module, a management module and a front-end display module;

the data processing module comprises a filtering module and a preprocessing module;

the filtering module is used for filtering out, from the application layer response data, the application layer response data of non-Internet-of-Things devices and the error response information, to obtain the application layer response data of the networked intelligent device, wherein the application layer response data of non-Internet-of-Things devices includes application layer response data of heavyweight Web servers, and the error response information is information in which the status code of the HTTP response is 4xx or 5xx;

the preprocessing module is used for filtering the application layer response data of the networked intelligent device to obtain second text information, and then extracting, from the second text information, a feature keyword sequence that identifies the characteristics of the networked intelligent device by applying a term frequency-inverse document frequency (TF-IDF) algorithm over an Internet of Things device response information corpus, wherein filtering the application layer response data of the networked intelligent device to obtain the second text information comprises:

for application layer response data of a networked intelligent device obtained over the HTTP protocol, filtering out HTML tags, punctuation marks, characters that are neither digits nor letters, and hyperlink contents using regular expressions and the third-party Python library Beautiful Soup, and retaining the text to obtain the second text information; or, for application layer response data of a networked intelligent device obtained over the FTP protocol, filtering out punctuation marks, characters that are neither digits nor letters, and hyperlink contents using regular expressions, and retaining the text to obtain the second text information; the Internet of Things device response information corpus includes at least the device types, device manufacturers and device models of Internet of Things devices;

the search/processing module comprises a search module and a processing module, wherein the search module is used for searching for the feature keyword sequence in a search engine and crawling the first n webpages in the search results, where n is not less than 30; the processing module is used for filtering out HTML tags, punctuation marks, characters that are neither digits nor letters, and hyperlink contents in the webpages using regular expressions and the third-party Python library Beautiful Soup, and retaining the text to obtain the first text information;

the management module comprises an extraction module, a storage module and a query module;

the extraction module is used for extracting preliminary device description identification information of the networked intelligent device from the first text information using a named entity recognition algorithm based on a hidden Markov model, selecting, within each category of the preliminary device description identification information, the information that occurs most frequently as the final device description identification information of that category for the networked intelligent device, searching the CVE (Common Vulnerabilities and Exposures) vulnerability database, according to the final device description identification information, for vulnerability identification information of the networked intelligent device, and extracting the vulnerability identification information from the search results if any exists, wherein the preliminary device description identification information and the final device description identification information each include a device type, a device model and a device manufacturer;

the storage module is used for storing identification information of existing networked intelligent devices, determining whether the final device description identification information and the vulnerability identification information are already stored, and storing them if they are not; storing the final device description identification information and the vulnerability identification information enlarges the set of identification information of networked intelligent devices held in the database, which can then be fed back into the named entity recognition algorithm based on the hidden Markov model to improve the performance of the model;

the query module is used for querying the identification information of the networked intelligent equipment in the storage module according to the equipment type, the equipment manufacturer, the equipment model or the equipment vulnerability number;

the front-end display module comprises a keyword input module, an application layer response information input module and an application layer response information acquisition module, wherein the keyword input module interacts with the query module and is used for querying the identification information of networked intelligent devices in the storage module by entering keywords;

the application layer response information input module is used for inputting response data of the application layer and processing the input application layer response data through the data processing module, the searching/processing module and the management module;

the application layer response information acquisition module is used for acquiring application layer response information according to the input IP, the port number or the protocol and processing the acquired application layer response data through the data processing module, the search/processing module and the management module.
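The behavior of the storage and query modules described above (store a record only if it is not already present, then look records up by device type, manufacturer, model or vulnerability number) can be sketched with an in-memory SQLite database. This is a minimal sketch: the table layout, column names, function names and the placeholder vendor, model and CVE identifier are all hypothetical; the source does not specify a database or schema.

```python
import sqlite3

QUERY_FIELDS = {"device_type", "vendor", "model", "cve_id"}


def open_store():
    """Create an in-memory store; the UNIQUE constraint prevents duplicates."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE devices ("
        " device_type TEXT, vendor TEXT, model TEXT, cve_id TEXT,"
        " UNIQUE (device_type, vendor, model, cve_id))"
    )
    return conn


def store_if_new(conn, device_type, vendor, model, cve_id=""):
    """Insert the record unless an identical one exists; return True if added."""
    cur = conn.execute(
        "INSERT OR IGNORE INTO devices VALUES (?, ?, ?, ?)",
        (device_type, vendor, model, cve_id),
    )
    conn.commit()
    return cur.rowcount == 1


def query(conn, field, value):
    """Look up records by one of the supported fields."""
    if field not in QUERY_FIELDS:
        raise ValueError(f"unsupported query field: {field}")
    # `field` is checked against a fixed whitelist above, so interpolating it
    # into the statement is safe; the value itself is bound as a parameter.
    rows = conn.execute(
        f"SELECT device_type, vendor, model, cve_id FROM devices WHERE {field} = ?",
        (value,),
    )
    return rows.fetchall()


conn = open_store()
# Placeholder values, not real products or CVE entries.
record = ("ip camera", "examplevendor", "ex-100", "CVE-0000-0000")
print(store_if_new(conn, *record))
print(store_if_new(conn, *record))
print(query(conn, "vendor", "examplevendor"))
```

The second call to `store_if_new` leaves the table unchanged, which mirrors the "determine whether it is stored, and store it if not" behavior of the storage module.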

To determine the accuracy of the framework system for extracting identification information of networked intelligent devices provided by this embodiment, 5000 networked intelligent devices were tested with it;

the storage module in the framework system stores identification information of existing networked intelligent devices, covering 23 device types, 118 device manufacturers and 23871 device models; the framework system was tested on the application layer response information of the 5000 networked intelligent devices, and the test results are shown in Fig. 1;

as can be seen from Fig. 1, the accuracy of the framework system provided by the present invention in extracting identification information of networked intelligent devices reaches 97.26%.

Those of ordinary skill in the art will understand that the discussion of any embodiment above is exemplary only and is not intended to imply that the scope of the disclosure, including the claims, is limited to these examples; within the idea of the invention, features in the above embodiments or in different embodiments may also be combined, steps may be implemented in any order, and there are many other variations of the different aspects of the invention as described above, which are not provided in detail for the sake of brevity.

In addition, well known power/ground connections to Integrated Circuit (IC) chips and other components may or may not be shown within the provided figures for simplicity of illustration and discussion, and so as not to obscure the invention. Furthermore, devices may be shown in block diagram form in order to avoid obscuring the invention, and also in view of the fact that specifics with respect to implementation of such block diagram devices are highly dependent upon the platform within which the present invention is to be implemented (i.e., specifics should be well within purview of one skilled in the art). Where specific details (e.g., circuits) are set forth in order to describe example embodiments of the invention, it should be apparent to one skilled in the art that the invention can be practiced without, or with variation of, these specific details. Accordingly, the description is to be regarded as illustrative instead of restrictive.

While the present invention has been described in conjunction with specific embodiments thereof, many alternatives, modifications, and variations of these embodiments will be apparent to those of ordinary skill in the art in light of the foregoing description. For example, other memory architectures (e.g., dynamic RAM (DRAM)) may use the discussed embodiments.

The embodiments of the invention are intended to embrace all such alternatives, modifications and variances that fall within the broad scope of the appended claims. Therefore, any omissions, modifications, substitutions, improvements and the like that may be made without departing from the spirit and principles of the invention are intended to be included within the scope of the invention.
