Hardware Trojan horse detection system based on unsupervised learning and information data processing method

Document No.: 153430  Published: 2021-10-26

Reading note: This technology, Hardware Trojan horse detection system based on unsupervised learning and information data processing method (基于无监督学习的硬件木马检测系统和信息数据处理方法), was designed by 史江义, 张焱, 李康, 潘伟涛, 董勐, 王杰, 温聪 and 陈嘉伟 on 2021-06-15. Main content: The invention belongs to the technical field of hardware security and discloses a hardware Trojan horse detection system based on unsupervised learning and an information data processing method. The features required for Trojan detection are proposed by analyzing the circuit structure and the operating logic of Trojan circuits; the importance of the features is analyzed by combining a random forest, a correlation matrix and a parallel coordinate plot, and the features are screened to obtain an optimal feature set; principal component analysis (PCA) is used to reduce the dimensionality of the high-dimensional feature data; the dimension-reduced data are used to train an Isolation Forest unsupervised model to obtain an optimal trained model; and test data are used for testing, with accuracy and other metrics calculated from the test results to evaluate the model. The invention retains most of the information in the data while reducing its dimensionality, effectively improves accuracy, and shortens training time; by using unsupervised learning, it also addresses the problem that labels are difficult or even impossible to obtain in hardware Trojan detection.

1. A hardware Trojan horse detection system and information data processing method based on unsupervised learning, characterized by comprising:

first, proposing the features required for Trojan detection by analyzing the circuit structure and the operating logic of Trojan circuits, in combination with the circuit features used in traditional machine learning;

then, analyzing the importance of the features by combining a random forest, a correlation matrix and a parallel coordinate plot, and screening the features to obtain an optimal feature set; then, reducing the dimensionality of the high-dimensional feature data by principal component analysis (PCA);

finally, training an Isolation Forest unsupervised model on the dimension-reduced data to obtain an optimal trained model; and testing with the test data, calculating accuracy and related parameters from the test results, and evaluating the model.

2. The unsupervised learning-based hardware Trojan horse detection system and the information data processing method according to claim 1, comprising the following steps:

step one, analyzing hardware Trojan characteristics from the perspective of circuit structure and Trojan circuit operating logic, and proposing the circuit features required for Trojan detection in combination with the circuit features used in traditional machine learning;

step two, preprocessing the gate-level netlist to be detected and extracting the features required for hardware Trojan detection;

step three, analyzing, by combining a random forest, a correlation matrix and a parallel coordinate plot, how much each circuit feature contributes to distinguishing Trojan nets from normal nets, screening the features, and selecting an optimal feature set;

step four, normalizing the data set obtained in step three;

step five, reducing the dimensionality of the data set obtained in step four using a feature dimension reduction method;

step six, constructing a classifier based on unsupervised learning, training it on the dimension-reduced data, and optimizing the model according to the training results to obtain an optimal trained model;

step seven, dividing the data sets to be tested into a training set and a test set using cross validation;

step eight, inputting the test data into the trained model for detection, calculating the TPR, TNR, Precision, Recall, F1-score and Accuracy metrics from the detection results, and evaluating the detection capability of the model.
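Step four does not say which normalization is used. The following is a minimal sketch that assumes min-max scaling of each feature column to [0, 1]; the function name and the sample values are hypothetical and serve only as illustration.

```python
def min_max_normalize(rows):
    """Scale each feature column of `rows` to the range [0, 1]."""
    columns = list(zip(*rows))
    lows = [min(col) for col in columns]
    highs = [max(col) for col in columns]
    return [
        [
            # A constant column carries no information; map it to 0.0.
            (value - lo) / (hi - lo) if hi > lo else 0.0
            for value, lo, hi in zip(row, lows, highs)
        ]
        for row in rows
    ]


# Three nets described by two features each (made-up values).
scaled = min_max_normalize([[1, 10], [3, 10], [5, 30]])
```

Scaling matters before PCA because features with large numeric ranges, such as gate counts, would otherwise dominate the covariance matrix.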

3. The unsupervised learning-based hardware Trojan horse detection system and the information data processing method according to claim 2, wherein in step one, the selected Trojan features comprise:

(1) the number of basic logic gates x levels away from the input end or output end of the net;

(2) the fan-in of the logic gates x levels away from the input end of the net;

(3) the number of flip-flops x levels away from the input end or output end of the net;

(4) the number of multiplexers x levels away from the input end or output end of the net;

(5) the logic level of the multiplexer nearest to the input end or output end of the net;

(6) the logic level of the flip-flop nearest to the input end or output end of the net;

(7) the number of x-level loops at the input end or output end of the net;

(8) the logic level of the primary input or primary output nearest to the net;

(9) the number of constants x levels away from the input end or output end of the net;

(10) the logic level of the inverter nearest to the input end or output end of the net;

(11) the maximum number of logic gates of the same type x levels away from the input end of the net;

wherein x takes the values 1, 2, 3, 4, 5.
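As a rough illustration of how count-style features such as the gate, flip-flop and multiplexer counts could be extracted, the sketch below walks the fan-in cone of a net level by level. The netlist representation, the cell names and the reading of "x levels away" as "within x levels upstream" are all assumptions made for this example, not details taken from the claims.

```python
from collections import Counter

# Hypothetical toy netlist: gate name -> (cell type, input nets, output net).
NETLIST = {
    "g1": ("AND", ["a", "b"], "n1"),
    "g2": ("INV", ["c"], "n2"),
    "g3": ("OR", ["n1", "n2"], "n3"),
    "g4": ("DFF", ["n3"], "q"),
}


def fanin_gate_counts(netlist, net, levels):
    """Count cell types within `levels` logic levels upstream of `net`."""
    # Map each net to the gate that drives it.
    driver = {out: name for name, (_, _, out) in netlist.items()}
    counts = Counter()
    frontier, seen = [net], set()
    for _ in range(levels):
        next_frontier = []
        for n in frontier:
            gate = driver.get(n)  # primary inputs have no driver
            if gate is None or gate in seen:
                continue
            seen.add(gate)
            cell_type, inputs, _ = netlist[gate]
            counts[cell_type] += 1
            next_frontier.extend(inputs)
        frontier = next_frontier
    return counts


counts = fanin_gate_counts(NETLIST, "q", levels=3)
```

Running the same traversal over the fan-out side, and for each x from 1 to 5, would give one feature per direction, cell type and level.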

4. The unsupervised learning-based hardware Trojan horse detection system and the information data processing method according to claim 2, wherein in step three, the random forest, correlation matrix and parallel coordinate plot methods are used to screen the features according to their importance: the parallel coordinate plot is used to visualize the high-dimensional data, so that the contribution of each feature to distinguishing Trojan circuits from normal circuits can be seen directly; the random forest and the correlation matrix are then used to obtain quantitative values for feature importance and for the correlation between features; and the results of the three methods are analyzed together to screen the features and obtain the optimal feature set.
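The quantitative part of this screening can be sketched with scikit-learn and NumPy. The example ranks features by random forest importance and then drops any feature that is almost perfectly correlated with a higher-ranked one. The 0.95 correlation limit, the function name and the synthetic data are assumptions for illustration; the claim itself combines these numbers with a visual reading of the parallel coordinate plot rather than a fixed threshold. Note that the random forest needs labels, so this step presumes labeled benchmark circuits are available for feature analysis.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier


def screen_features(X, y, names, corr_limit=0.95):
    """Rank features by forest importance, then drop highly correlated ones."""
    forest = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
    order = np.argsort(forest.feature_importances_)[::-1]
    corr = np.abs(np.corrcoef(X, rowvar=False))
    kept = []
    for idx in order:
        # Keep a feature only if it is not nearly redundant with a better one.
        if all(corr[idx, k] < corr_limit for k in kept):
            kept.append(idx)
    return [names[i] for i in kept]


# Synthetic data: one informative feature, a near copy of it, and pure noise.
rng = np.random.default_rng(0)
y = rng.integers(0, 2, size=300)
informative = y + rng.normal(0.0, 0.3, size=300)
duplicate = informative * 2.0 + rng.normal(0.0, 0.01, size=300)
noise = rng.normal(0.0, 1.0, size=300)
X = np.column_stack([informative, duplicate, noise])
selected = screen_features(X, y, ["f_info", "f_dup", "f_noise"])
```

Only one of the two near-identical features survives, which is the effect the correlation matrix is meant to provide.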

5. The unsupervised learning-based hardware Trojan horse detection system and the information data processing method according to claim 2, wherein in step three, the selected optimal feature set comprises:

(1) the logic level of the primary output nearest to the net;

(2) the logic level of the primary input nearest to the net;

(3) the logic level of the flip-flop nearest to the output end of the net;

(4) the logic level of the multiplexer nearest to the output end of the net;

(5) the number of basic logic gates 5 levels away from the output end of the net;

(6) the number of flip-flops 5 levels away from the output end of the net;

(7) the number of multiplexers 5 levels away from the output end of the net;

(8) the number of flip-flops 5 levels away from the input end of the net;

(9) the logic level of the flip-flop nearest to the input end of the net;

(10) the number of 4-level loops at the output end of the net;

(11) the number of 5-level loops at the input end of the net;

(12) the maximum number of logic gates of the same type 5 levels away from the input end of the net;

(13) the number of multiplexers 5 levels away from the input end of the net;

(14) the logic level of the inverter nearest to the output end of the net;

(15) the number of basic logic gates 5 levels away from the input end of the net.

6. The unsupervised learning-based hardware Trojan horse detection system and the information data processing method according to claim 2, wherein in step five, dimension reduction is performed by principal component analysis (PCA), comprising:

subtracting from each feature dimension its mean; calculating the covariance matrix; calculating the eigenvalues and eigenvectors of the covariance matrix through SVD; sorting the eigenvalues from largest to smallest, selecting the k largest, and taking the corresponding k eigenvectors as column vectors to form an eigenvector matrix;

wherein the value of k is chosen as follows: different values of k are tried and the following formula is evaluated each time, and the smallest k that satisfies the condition is selected:

[formula missing in source]

wherein t is a threshold such that the PCA algorithm retains (1 - t) of the information in the original data.
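The selection formula is not reproduced in this copy of the claim. The sketch below therefore assumes the commonly used retained-variance criterion, 1 - (sum of the k largest eigenvalues) / (sum of all eigenvalues) <= t, which matches the statement that (1 - t) of the information is kept. That criterion, the function name and the synthetic data are assumptions, not the patent's confirmed formula.

```python
import numpy as np


def pca_reduce(X, t=0.01):
    """Project X onto the fewest components keeping at least (1 - t) of the variance."""
    X_centered = X - X.mean(axis=0)
    cov = X_centered.T @ X_centered / X.shape[0]
    # For a symmetric positive semi-definite matrix, SVD returns the
    # eigenvectors (columns of U) and eigenvalues (S), sorted largest first.
    U, S, _ = np.linalg.svd(cov)
    retained = np.cumsum(S) / np.sum(S)
    # Smallest k whose cumulative share reaches 1 - t, capped at the dimension.
    k = min(int(np.searchsorted(retained, 1.0 - t)) + 1, len(S))
    return X_centered @ U[:, :k], k


# Synthetic data: the third column is the sum of the first two, so two
# components carry all of the variance.
rng = np.random.default_rng(0)
base = rng.normal(size=(200, 2))
X = np.column_stack([base[:, 0], base[:, 1], base[:, 0] + base[:, 1]])
Z, k = pca_reduce(X, t=0.01)
```

With t = 0.01 the projection keeps at least 99% of the variance, in line with the "more than 99%" figure given in the description.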

7. The unsupervised learning-based hardware Trojan horse detection system and the information data processing method according to claim 2, wherein in step six, constructing the classifier based on unsupervised learning, training it on the dimension-reduced data, and optimizing the model according to the training results to obtain the optimal trained model comprises:

(1) selecting a model according to the data distribution and the characteristics of each algorithm model;

(2) setting the model parameters and iteratively tuning the model according to the training results.

8. The unsupervised learning-based hardware Trojan horse detection system and the information data processing method according to claim 2, wherein in step six, the selected unsupervised learning model is an Isolation Forest model, which is trained on the dimension-reduced data obtained in step five and iteratively optimized according to the training results, comprising:

(1) setting the contamination rate (contamination) to [0.01, 0.02, 0.05, 0.08, 0.1];

(2) setting sampling with replacement (bootstrap) to False, and setting warm_start, which controls whether the next round of training reuses the previously trained classifiers, to True;

(3) setting the number of classifiers in the ensemble, i.e. the number of trees in the Isolation Forest (n_estimators), to [120, 130, 140, 150, 160, 170, 180]; setting the proportion of features drawn to train each tree (max_features) to [0.01, 0.02, 0.05, 0.08, 0.1]; and setting the proportion of samples drawn to train each tree (max_samples) to [0.01, 0.02, 0.05, 0.08, 0.1];

(4) setting the number of processes run in parallel (n_jobs) to 4;

(5) leaving all other parameters at their default values.

The parameters in (1) and (3) are selected by grid search, and the combination giving the best result is used for the optimal model.
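The parameter names in this claim match scikit-learn's `IsolationForest`, so the search can be sketched as a plain loop over parameter combinations. The claim does not say how "the best result" is scored; this sketch assumes the F1-score on the Trojan class, with labels used only for scoring and never for fitting. The reduced grid, the fixed seed and the synthetic data are also assumptions chosen to keep the example fast.

```python
from itertools import product

import numpy as np
from sklearn.ensemble import IsolationForest
from sklearn.metrics import f1_score


def grid_search_iforest(X_train, X_test, y_test, grid):
    """Try every parameter combination; keep the best F1 on the Trojan class.

    y_test uses 1 for Trojan nets and 0 for normal nets.
    """
    best_score, best_params = -1.0, None
    names = sorted(grid)
    for values in product(*(grid[n] for n in names)):
        params = dict(zip(names, values))
        model = IsolationForest(
            bootstrap=False,  # sample without replacement, item (2)
            warm_start=True,  # item (2)
            n_jobs=4,         # item (4)
            random_state=0,   # assumption: fixed seed for repeatability
            **params,
        )
        model.fit(X_train)
        # predict() returns -1 for anomalies; map them to the Trojan label 1.
        y_pred = (model.predict(X_test) == -1).astype(int)
        score = f1_score(y_test, y_pred, zero_division=0)
        if score > best_score:
            best_score, best_params = score, params
    return best_params, best_score


# Synthetic stand-in data: 500 normal points plus 10 clear outliers.
rng = np.random.default_rng(0)
normal = rng.normal(0.0, 1.0, size=(500, 20))
trojan = rng.normal(6.0, 0.5, size=(10, 20))
X = np.vstack([normal, trojan])
y = np.array([0] * 500 + [1] * 10)

# Reduced grid drawn from the value lists in items (1) and (3).
grid = {
    "contamination": [0.01, 0.02, 0.05],
    "n_estimators": [120, 150],
    "max_features": [0.05, 0.1],
    "max_samples": [0.05, 0.1],
}
best_params, best_score = grid_search_iforest(X, X, y, grid)
```

In the claimed method the training and test data would come from different circuits; the example fits and scores on the same array only for brevity.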

9. The unsupervised learning-based hardware Trojan horse detection system and the information data processing method according to claim 2, wherein in step seven, the cross validation method is as follows: given N circuits to be tested, each round takes one circuit as the test set and the remaining N-1 circuits as the training set; this is repeated N times so that every circuit is used for both training and testing, which makes maximum use of the data set and lets the model learn from all of the data.
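This leave-one-circuit-out scheme can be sketched in a few lines of plain Python. The generator name and the circuit names are hypothetical.

```python
def leave_one_circuit_out(circuits):
    """Yield (training circuits, test circuit) pairs, one per circuit."""
    for i, test_circuit in enumerate(circuits):
        train_circuits = circuits[:i] + circuits[i + 1:]
        yield train_circuits, test_circuit


# Hypothetical circuit names, used only for illustration.
splits = list(leave_one_circuit_out(["c1", "c2", "c3"]))
```

Splitting by circuit rather than by individual net keeps all nets of the test circuit out of training, so each round measures detection on a circuit the model has not seen.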

10. The unsupervised learning-based hardware Trojan horse detection system and the information data processing method according to claim 2, wherein in step eight, the TPR, TNR, Precision, Recall, F1-score and Accuracy metrics are calculated as follows:

TNR = TN/(TN+FP);

TPR = TP/(TP+FN);

Precision = TP/(TP+FP);

Recall = TP/(TP+FN);

F1-score = 2*Precision*Recall/(Precision+Recall);

Accuracy = (TP+TN)/total.
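The metrics can be computed directly from the four confusion-matrix counts. In this sketch Recall uses the standard definition TP/(TP+FN), which makes it equal to TPR; the function name and the counts are made up for illustration.

```python
def detection_metrics(tp, tn, fp, fn):
    """Compute the evaluation metrics from confusion-matrix counts.

    Assumes every denominator is non-zero.
    """
    tpr = tp / (tp + fn)
    tnr = tn / (tn + fp)
    precision = tp / (tp + fp)
    recall = tpr  # standard definition: TP / (TP + FN)
    f1 = 2 * precision * recall / (precision + recall)
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    return {
        "TPR": tpr,
        "TNR": tnr,
        "Precision": precision,
        "Recall": recall,
        "F1-score": f1,
        "Accuracy": accuracy,
    }


# Made-up counts: 10 Trojan nets and 100 normal nets.
m = detection_metrics(tp=8, tn=90, fp=10, fn=2)
```

Because Trojan nets are rare, Accuracy alone can look high even when many Trojan nets are missed, which is why TPR, Precision and F1-score are reported alongside it.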

11. A hardware Trojan horse detection system for implementing the unsupervised learning-based hardware Trojan horse detection system and the information data processing method according to any one of claims 1 to 9, wherein the hardware Trojan horse detection system comprises:

a circuit feature acquisition module, used for analyzing hardware Trojan characteristics from the perspective of circuit structure and Trojan composition, and proposing the circuit features required for Trojan detection in combination with the circuit features used in traditional machine learning and the Trojan structure;

a Trojan feature extraction module, used for preprocessing the gate-level netlist to be detected and extracting the features required for hardware Trojan detection;

an optimal feature set selection module, used for analyzing, with the parallel coordinate plot, how much each circuit feature contributes to distinguishing Trojan nets from normal nets, and selecting an optimal feature set;

a feature set processing module, used for normalizing the obtained feature set and for reducing its dimensionality using a feature dimension reduction method;

a training model acquisition module, used for constructing a classifier based on unsupervised learning, training it on the dimension-reduced data, and optimizing the model according to the training results to obtain an optimal trained model;

a data set division module, used for dividing the data sets to be tested into a training set and a test set using cross validation;

and a hardware Trojan detection module, used for inputting the test data into the trained model for detection, calculating the TPR, TNR, Precision, Recall, F1-score and Accuracy metrics from the detection results, and evaluating the detection capability of the model.

12. An information data processing terminal characterized by being configured to implement the hardware trojan detection system of claim 9.

Technical Field

The invention belongs to the technical field of hardware security, and particularly relates to a hardware Trojan horse detection system based on unsupervised learning and an information data processing method.

Background

At present, with the rapid development of the information society and the accelerating adoption of artificial intelligence, demand for integrated circuit chips keeps growing. However, because chip design and manufacturing involve so many complex stages, chip vendors cannot keep every stage fully under their own control, which gives attackers an opportunity to maliciously modify or sabotage an integrated circuit. A malicious module that an attacker deliberately builds and inserts into a chip, which changes the chip's function or performance and is triggered only under specific conditions, is called a hardware Trojan. Hardware Trojans pose a serious potential threat to chip security and raise serious concerns about the integrity and security of integrated circuits.

Typically, a hardware Trojan does not contain any state information. Malicious attackers fully control the triggers of their hardware Trojans and have implanted many types of hardware Trojans that are difficult to detect with conventional verification techniques. In addition, SoCs on the market are complex heterogeneous systems composed of multiple third-party IP cores; because hardware Trojans are small and well concealed, Trojan detection techniques for third-party IP cores struggle to identify Trojan nets completely, and some Trojans even require manual analysis. Some malicious third-party vendors even collude to build hardware Trojans jointly in order to evade detection. How to design a safe and reliable SoC security policy and Trojan detection technique is therefore an important question for researchers.

Although existing methods based on machine learning perform well, almost all of them are supervised and rest on a key premise: that a large amount of known information is available. The training process of supervised methods also tends to be time consuming and usually requires a large amount of balanced training data. Unsupervised learning, in the form of an anomaly detection model, aims to detect samples whose behavior differs greatly from the rest, which makes it well suited to hardware Trojan detection. In addition, the circuit features used for Trojan detection are mostly high-dimensional, which strongly affects algorithm complexity, model training time and detection accuracy. A new hardware Trojan detection method is therefore needed.

From the above analysis, the problems and defects of the prior art are as follows:

(1) A hardware Trojan does not contain any state information; malicious attackers fully control the Trojan triggers and implant many types of hardware Trojans, which are difficult to detect with conventional verification techniques.

(2) SoCs on the market are complex heterogeneous systems composed of multiple third-party IP cores; because hardware Trojans are small and well concealed, Trojan detection techniques for third-party IP cores struggle to identify Trojan nets completely, and some malicious third-party vendors even collude to build hardware Trojans jointly in order to evade detection.

(3) Existing methods based on machine learning are supervised and need a large amount of known information; their training process is time consuming and typically requires a large amount of balanced training data. Moreover, the circuit features used for Trojan detection are mostly high-dimensional, which strongly affects algorithm complexity, model training time and detection accuracy.

The difficulties in solving the above problems and defects are:

1. Analyzing circuit characteristics, Trojan trigger logic and payload circuit functions, and combining them with the circuit features used in traditional machine learning, to propose circuit features that can effectively detect Trojan circuits.

2. Extracting, by combining feature analysis and screening methods, an optimal feature set that can efficiently detect Trojans.

3. Processing the high-dimensional circuit features so that most of the information in the data is kept while the feature dimensionality is reduced.

4. Constructing an unsupervised learning model, training it without label information or large amounts of balanced data, and using the trained model to detect Trojan circuits.

The significance of solving the above problems and defects is as follows:

(1) The method improves on the features that prior work selects when detecting hardware Trojans from static features, proposes circuit features based on Trojan behavior and circuit structure that can efficiently detect hardware Trojans, and offers a new idea for subsequent research.

(2) A method for analyzing and extracting features is provided that picks out, from a large number of features, those most effective for the problem, and serves as a reference for research on correlation and similarity analysis of circuit features.

(3) A feature dimension reduction method is provided that effectively reduces the feature dimensionality while retaining more than 99% of the information in the data, addressing the high algorithm complexity, long detection time and poor detection accuracy that excessive circuit feature dimensionality causes in conventional machine learning methods.

(4) An unsupervised model is applied to Trojan detection without needing large amounts of label information or balanced data, which addresses the severe imbalance between Trojan and normal circuit data and the difficulty or impossibility of obtaining labels in hardware Trojan detection, and opens a new direction for subsequent research in hardware security.

Disclosure of Invention

In view of the problems in the prior art, the invention provides a hardware Trojan horse detection system based on unsupervised learning and an information data processing method.

The invention is realized as follows. A hardware Trojan horse detection system and information data processing method based on unsupervised learning comprises:

first, proposing the features required for Trojan detection by analyzing the circuit structure and Trojan operating logic, in combination with the circuit features used in traditional machine learning; then, analyzing the importance of the features by combining a random forest and a parallel coordinate plot, and screening the features to obtain an optimal feature set; then, reducing the dimensionality of the high-dimensional feature data by principal component analysis (PCA); finally, training an Isolation Forest unsupervised model on the dimension-reduced data to obtain an optimal trained model; and testing with the test data, calculating accuracy and related parameters from the test results, and evaluating the model.

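Further, the hardware Trojan horse detection system based on unsupervised learning and the information data processing method comprise the following steps:

step one, analyzing hardware Trojan characteristics from the perspective of circuit structure, Trojan trigger circuits and payload circuit functions, and, in combination with the circuit features used in traditional machine learning and the Trojan structure, linking the key characteristic that Trojan circuits have a low trigger probability to static circuit features, so as to propose the circuit features required for Trojan detection;

step two, preprocessing the gate-level netlist to be detected and extracting the features required for hardware Trojan detection;

step three, analyzing, by combining a random forest, a correlation matrix and a parallel coordinate plot, how much each circuit feature contributes to distinguishing Trojan nets from normal nets, screening the features, and selecting an optimal feature set;

step four, normalizing the data set obtained in step three;

step five, reducing the dimensionality of the data set obtained in step four using a feature dimension reduction method;

step six, constructing a classifier based on unsupervised learning, training it on the dimension-reduced data, and optimizing the model according to the training results to obtain an optimal trained model;

step seven, dividing the data sets to be tested into a training set and a test set using cross validation;

step eight, inputting the test data into the trained model for detection, calculating the TPR, TNR, Precision, Recall, F1-score and Accuracy metrics from the detection results, and evaluating the detection capability of the model.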

Further, in step one, the selected Trojan features comprise:

(1) the number of logic gates x levels away from the input end or output end of the net;

(2) the fan-in of the logic gates x levels away from the input end of the net;

(3) the number of flip-flops x levels away from the input end or output end of the net;

(4) the number of multiplexers x levels away from the input end or output end of the net;

(5) the logic level of the multiplexer nearest to the input end or output end of the net;

(6) the logic level of the flip-flop nearest to the input end or output end of the net;

(7) the number of x-level loops at the input end or output end of the net;

(8) the logic level of the primary input or primary output nearest to the net;

(9) the number of constants x levels away from the input end or output end of the net;

(10) the logic level of the inverter nearest to the input end or output end of the net;

(11) the maximum number of logic gates of the same type x levels away from the input end of the net;

wherein x takes the values 1, 2, 3, 4, 5.

Further, in step three, screening the features by combining the random forest, correlation matrix and parallel coordinate plot methods comprises:

using the parallel coordinate plot to visualize the high-dimensional data, so that the contribution of each feature to distinguishing Trojan circuits from normal circuits can be seen directly;

then using the random forest and the correlation matrix to obtain quantitative values for feature importance and for the correlation between features, comparing and analyzing the results of the three methods, and finally selecting an optimal feature set, wherein the selected optimal feature set comprises: