Hardware Trojan horse detection system based on unsupervised learning and information data processing method

Document No.: 153430 Publication date: 2021-10-26

Reading note: this technology, "Hardware Trojan horse detection system based on unsupervised learning and information data processing method", was designed by 史江义 (Shi Jiangyi), 张焱 (Zhang Yan), 李康 (Li Kang), 潘伟涛 (Pan Weitao), 董勐 (Dong Meng), 王杰 (Wang Jie), 温聪 (Wen Cong) and 陈嘉伟 (Chen Jiawei) on 2021-06-15. Main content: The invention belongs to the technical field of hardware security and discloses a hardware Trojan horse detection system based on unsupervised learning and an information data processing method. The features required for Trojan detection are proposed by analyzing the circuit structure and the operating logic of Trojan circuits; the importance of the features is analyzed by combining a random forest, a correlation matrix and a parallel coordinate plot, and the features are screened to obtain an optimal feature set; the dimensionality of the high-dimensional feature data is reduced by principal component analysis (PCA); an unsupervised Isolation Forest model is trained on the dimension-reduced data to obtain an optimal trained model; and the model is tested with test data, accuracy and related metrics are calculated from the test results, and the model is evaluated. The invention retains most of the information in the data while reducing its dimensionality, effectively improves accuracy and shortens training time. Because it uses unsupervised learning, it also addresses the problem that labels are difficult or even impossible to obtain in hardware Trojan detection.

1. A hardware Trojan horse detection system and information data processing method based on unsupervised learning, characterized in that the method comprises:

first, proposing the features required for Trojan detection by analyzing the circuit structure and the operating logic of Trojan circuits, in combination with the circuit features used in traditional machine learning;

then, analyzing the importance of the features by combining a random forest, a correlation matrix and a parallel coordinate plot, and screening the features to obtain an optimal feature set; next, reducing the dimensionality of the high-dimensional feature data by principal component analysis (PCA);

finally, training an unsupervised Isolation Forest model on the dimension-reduced data to obtain an optimal trained model; and testing the model with test data, calculating accuracy parameters from the test results, and evaluating the model.

2. The unsupervised learning-based hardware Trojan horse detection system and information data processing method according to claim 1, characterized in that the method comprises the following steps:

step one, analyzing hardware Trojan characteristics from the perspective of circuit structure and Trojan circuit operating logic, and proposing the circuit features required for Trojan detection in combination with the circuit features used in traditional machine learning;

step two, preprocessing the gate-level netlist under test and extracting the features required for hardware Trojan detection;

step three, analyzing, by combining a random forest, a correlation matrix and a parallel coordinate plot, how much each circuit feature contributes to distinguishing Trojan nets from normal nets, screening the features, and selecting an optimal feature set;

step four, normalizing the data set obtained in step three;

step five, reducing the dimensionality of the data set obtained in step four with a feature dimension reduction method;

step six, constructing a classifier based on unsupervised learning, training it on the dimension-reduced data, and optimizing the model according to the training results to obtain an optimal trained model;

step seven, dividing the data sets under test into a training set and a test set by cross validation;

step eight, inputting the test data into the trained model for detection, calculating the TPR, TNR, Precision, Recall, F1-score and Accuracy metrics from the detection results, and evaluating the detection capability of the model.

3. The unsupervised learning-based hardware Trojan horse detection system and information data processing method according to claim 2, wherein in step one, the selected Trojan features comprise:

(1) the number of basic logic gates x levels away from the input end or output end of the net;

(2) the fan-in count of the logic gates x levels away from the input end of the net;

(3) the number of flip-flops x levels away from the input end or output end of the net;

(4) the number of multiplexers x levels away from the input end or output end of the net;

(5) the number of logic levels to the multiplexer nearest to the input end or output end of the net;

(6) the number of logic levels to the flip-flop nearest to the input end or output end of the net;

(7) the number of x-level loops at the input end or output end of the net;

(8) the number of logic levels to the primary input or primary output nearest to the net;

(9) the number of constant terms x levels away from the input end or output end of the net;

(10) the number of logic levels to the inverter nearest to the input end or output end of the net;

(11) the maximum number of logic gates of the same type x levels away from the input end of the net;

wherein x takes the values 1, 2, 3, 4, 5.

4. The unsupervised learning-based hardware Trojan horse detection system and information data processing method according to claim 2, wherein in step three, the features are screened according to their importance using a random forest, a correlation matrix and a parallel coordinate plot: the parallel coordinate plot visualizes the high-dimensional data so that the contribution of each feature to distinguishing Trojan circuits from normal circuits can be seen directly; the random forest and the correlation matrix then give quantitative values for feature importance and for the correlation between features; and the results of the three methods are analyzed together to screen the features and obtain the optimal feature set.

5. The unsupervised learning-based hardware Trojan horse detection system and information data processing method according to claim 2, wherein in step three, the selected optimal feature set comprises:

(1) the number of logic levels to the primary output nearest to the net;

(2) the number of logic levels to the primary input nearest to the net;

(3) the number of logic levels to the flip-flop nearest to the output end of the net;

(4) the number of logic levels to the multiplexer nearest to the output end of the net;

(5) the number of basic logic gates 5 levels away from the output end of the net;

(6) the number of flip-flops 5 levels away from the output end of the net;

(7) the number of multiplexers 5 levels away from the output end of the net;

(8) the number of flip-flops 5 levels away from the input end of the net;

(9) the number of logic levels to the flip-flop nearest to the input end of the net;

(10) the number of 4-level loops at the output end of the net;

(11) the number of 5-level loops at the input end of the net;

(12) the maximum number of logic gates of the same type 5 levels away from the input end of the net;

(13) the number of multiplexers 5 levels away from the input end of the net;

(14) the number of logic levels to the inverter nearest to the output end of the net;

(15) the number of basic logic gates 5 levels away from the input end of the net.

6. The unsupervised learning-based hardware Trojan horse detection system and information data processing method according to claim 2, wherein in step five, the dimensionality reduction is performed by principal component analysis (PCA), comprising:

subtracting from each feature dimension its mean value; calculating the covariance matrix; calculating the eigenvalues and eigenvectors of the covariance matrix by SVD; sorting the eigenvalues in descending order, selecting the k largest, and taking the corresponding k eigenvectors as column vectors to form the eigenvector matrix;

wherein the value of k is chosen as follows: different values of k are tried, the following formula is evaluated for each, and the smallest k that satisfies the condition is selected:

wherein t is the threshold in the formula, meaning that the PCA algorithm retains (1-t) of the information in the original data.
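The formula referred to above is not reproduced in this text. A commonly used form that is consistent with the description (mean-centered data, covariance matrix, SVD, smallest k that keeps a fraction 1-t of the information) is sketched below. The symbols are assumptions introduced for illustration: m is the number of samples, n the number of features, x^{(i)} the i-th mean-centered sample, and S_{ii} the singular values of the covariance matrix in descending order.

```latex
\Sigma = \frac{1}{m}\sum_{i=1}^{m} x^{(i)} \left(x^{(i)}\right)^{\mathsf{T}},
\qquad
[U, S, V] = \operatorname{svd}(\Sigma),
\qquad
1 - \frac{\sum_{i=1}^{k} S_{ii}}{\sum_{i=1}^{n} S_{ii}} \le t
```

Under this reading, t = 0.01 corresponds to retaining at least 99% of the variance of the original data.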

7. The unsupervised learning-based hardware Trojan horse detection system and information data processing method according to claim 2, wherein in step six, constructing the classifier based on unsupervised learning, training it on the dimension-reduced data, and optimizing the model according to the training results to obtain the optimal trained model comprises:

(1) selecting a model according to the data distribution and the characteristics of each algorithm model;

(2) setting the model parameters and continuously tuning the model according to the training results.

8. The unsupervised learning-based hardware Trojan horse detection system and information data processing method according to claim 2, wherein in step six, the selected unsupervised learning model is an Isolation Forest model, which is trained on the dimension-reduced data obtained in step five and continuously optimized according to the training results, comprising:

(1) setting the contamination rate, contamination, to [0.01, 0.02, 0.05, 0.08, 0.1];

(2) setting bootstrap, which controls whether sampling is done with replacement, to False, and setting warm_start, which controls whether the previously trained classifier is reused for the next round of training, to True;

(3) setting the number of classifiers in the ensemble, i.e. the number of trees in the isolation forest, n_estimators, to [120, 130, 140, 150, 160, 170, 180]; setting the proportion of features drawn to train each tree, max_features, to [0.01, 0.02, 0.05, 0.08, 0.1]; and setting the proportion of samples drawn to train each tree, max_samples, to [0.01, 0.02, 0.05, 0.08, 0.1];

(4) setting the number of processes running in parallel, n_jobs, to 4;

(5) using default values for all other parameters.

The parameters in (1) and (3) are selected by grid search, and the combination giving the best result is taken as the parameters of the optimal model.

9. The unsupervised learning-based hardware Trojan horse detection system and information data processing method according to claim 2, wherein in step seven, the cross validation method comprises: given N circuits under test, one circuit is used as the test set each time and the remaining N-1 circuits are used as the training set; this process is repeated N times so that every circuit is used for both training and testing and the data set is exploited to the fullest, allowing the model to learn all the information in the data.

10. The unsupervised learning-based hardware Trojan horse detection system and information data processing method according to claim 2, wherein in step eight, the TPR, TNR, Precision, Recall, F1-score and Accuracy metrics are calculated as follows:

TNR=TN/(TN+FP);

TPR=TP/(TP+FN);

Precision=TP/(TP+FP);

Recall=TP/(TP+FN);

F1-score=2*Precision*Recall/(Precision+Recall);

Accuracy=(TP+TN)/total.

11. A hardware Trojan horse detection system for implementing the unsupervised learning-based hardware Trojan horse detection system and information data processing method according to any one of claims 1 to 9, wherein the hardware Trojan horse detection system comprises:

a circuit feature acquisition module, used for analyzing hardware Trojan characteristics from the perspective of circuit structure and Trojan composition, and proposing the circuit features required for Trojan detection in combination with the circuit features used in traditional machine learning and the Trojan structure;

a Trojan feature extraction module, used for preprocessing the gate-level netlist under test and extracting the features required for hardware Trojan detection;

an optimal feature set selection module, used for analyzing, in combination with the parallel coordinate plot, how much each circuit feature contributes to distinguishing Trojan nets from normal nets, and selecting an optimal feature set;

a feature set processing module, used for normalizing the obtained feature set and reducing its dimensionality with a feature dimension reduction method;

a trained model acquisition module, used for constructing a classifier based on unsupervised learning, training it on the dimension-reduced data, and optimizing the model according to the training results to obtain an optimal trained model;

a data set division module, used for dividing the data sets under test into a training set and a test set by cross validation;

and a hardware Trojan detection module, used for inputting the test data into the trained model for detection, calculating the TPR, TNR, Precision, Recall, F1-score and Accuracy metrics from the detection results, and evaluating the detection capability of the model.

12. An information data processing terminal characterized by being configured to implement the hardware trojan detection system of claim 9.

Technical Field

The invention belongs to the technical field of hardware security, and particularly relates to a hardware Trojan horse detection system based on unsupervised learning and an information data processing method.

Background

At present, with the rapid development of the information society and the accelerating adoption of artificial intelligence, demand for integrated circuit chips keeps growing. However, because chip design and manufacturing involve so many complex stages, chip vendors cannot keep every stage fully under their own control, which gives attackers the opportunity to maliciously modify or damage an integrated circuit. A defective module that an attacker deliberately builds and inserts into a chip, that changes the function or performance of the chip, and that is triggered only under specific conditions is called a hardware Trojan. Hardware Trojans pose a serious potential threat to chip security and raise grave concerns about the integrity and security of integrated circuits.

Typically, a hardware Trojan does not contain any state information. Malicious attackers have complete control over the triggers of their hardware Trojans and have implanted various types of hardware Trojans that are difficult to detect with conventional verification techniques. In addition, SoCs on the market are complex heterogeneous systems composed of multiple third-party IP cores; because hardware Trojans are small and stealthy, Trojan detection techniques for third-party IP cores struggle to identify Trojan nets completely, and some Trojans even require manual analysis. Some malicious third-party vendors even collude to build hardware Trojans jointly in order to evade detection. How to design a safe and reliable SoC security policy and Trojan detection technique is therefore an important issue for researchers.

Although existing methods based on machine learning perform well, they are essentially all supervised learning methods, which rest on a key premise: a large amount of known information is available. Moreover, training a supervised model tends to be time-consuming and usually requires a large amount of balanced training data. Unsupervised learning in the form of an anomaly detection model aims to detect anomalous samples whose behavior differs greatly from that of the other samples, which makes it well suited to hardware Trojan detection. In addition, the circuit features used for Trojan detection are mostly high-dimensional, which strongly affects algorithm complexity, model training time and detection accuracy. A new hardware Trojan detection method is therefore needed.

From the above analysis, the problems and defects of the prior art are as follows:

(1) A hardware Trojan does not contain any state information; malicious attackers fully control the Trojan triggers and implant various types of hardware Trojans, which are difficult to detect with conventional verification techniques.

(2) SoCs on the market are complex heterogeneous systems composed of multiple third-party IP cores. Because hardware Trojans are small and stealthy, Trojan detection techniques for third-party IP cores struggle to identify Trojan nets completely, and some malicious third-party vendors even collude to build hardware Trojans jointly in order to evade detection.

(3) Existing methods based on machine learning are all supervised learning methods and need a large amount of known information; training them is time-consuming and typically requires a large amount of balanced training data. Furthermore, the circuit features used for Trojan detection are mostly high-dimensional, which strongly affects algorithm complexity, model training time and detection accuracy.

The difficulties in solving the above problems and defects are:

1. Analyzing circuit features, Trojan trigger logic and payload circuit functions and, in combination with the circuit features used in traditional machine learning, proposing circuit features that can effectively detect Trojan circuits.

2. Extracting, by combining feature analysis and screening methods, an optimal feature set that can detect Trojans efficiently.

3. Processing the high-dimensional circuit features so that most of the data information is kept while the feature dimension is reduced.

4. Constructing an unsupervised learning model, training it without label information or large amounts of balanced data, and using the trained model to detect Trojan circuits.

The significance of solving the above problems and defects is:

(1) The method improves on the features selected in prior-art hardware Trojan detection based on static features, proposes circuit features that can efficiently detect hardware Trojans based on Trojan behavior and circuit structure, and opens a new line of thought for subsequent related research.

(2) A feature analysis and extraction method is provided that extracts, from a large number of features, those most effective for solving the problem, and serves as a reference for research on the correlation and similarity analysis of circuit features.

(3) A feature dimension reduction method is provided that effectively reduces the feature dimension while retaining more than 99% of the data information, addressing the high algorithm complexity, long detection time and poor detection accuracy caused by excessively high circuit feature dimensions in existing machine learning methods.

(4) An unsupervised model is applied to Trojan detection without requiring large amounts of label information or balanced data. This addresses the severe imbalance between Trojan information and normal-circuit information and the difficulty or impossibility of obtaining labels in hardware Trojan detection, and points to a new direction for subsequent research in hardware security.

Disclosure of Invention

In view of the problems in the prior art, the invention provides a hardware Trojan horse detection system based on unsupervised learning and an information data processing method.

The invention is realized as follows: a hardware Trojan horse detection system and information data processing method based on unsupervised learning, comprising:

first, proposing the features required for Trojan detection by analyzing the circuit structure and the operating logic of Trojans, in combination with the circuit features used in traditional machine learning; then, analyzing the importance of the features by combining a random forest and a parallel coordinate plot, and screening the features to obtain an optimal feature set; next, reducing the dimensionality of the high-dimensional feature data by principal component analysis (PCA); finally, training an unsupervised Isolation Forest model on the dimension-reduced data to obtain an optimal trained model; and testing the model with test data, calculating accuracy parameters from the test results, and evaluating the model.

Further, the hardware Trojan horse detection system based on unsupervised learning and the information data processing method comprise the following steps:

step one, analyzing hardware Trojan characteristics from the perspective of circuit structure, Trojan trigger circuits and payload circuit functions; linking the key property that a Trojan circuit has a low trigger probability to static circuit features, in combination with the circuit features used in traditional machine learning and the Trojan structure; and proposing the circuit features required for Trojan detection;

step two, preprocessing the gate-level netlist under test and extracting the features required for hardware Trojan detection;

step three, analyzing, by combining a random forest, a correlation matrix and a parallel coordinate plot, how much each circuit feature contributes to distinguishing Trojan nets from normal nets, screening the features, and selecting an optimal feature set;

step four, normalizing the data set obtained in step three;

step five, reducing the dimensionality of the data set obtained in step four with a feature dimension reduction method;

step six, constructing a classifier based on unsupervised learning, training it on the dimension-reduced data, and optimizing the model according to the training results to obtain an optimal trained model;

step seven, dividing the data sets under test into a training set and a test set by cross validation;

step eight, inputting the test data into the trained model for detection, calculating the TPR, TNR, Precision, Recall, F1-score and Accuracy metrics from the detection results, and evaluating the detection capability of the model.
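Step four does not name a normalization scheme. As one possibility, a minimal sketch of column-wise min-max scaling with NumPy is shown below; the choice of min-max scaling and the function name `min_max_normalize` are assumptions made for illustration, not details taken from the text.

```python
import numpy as np


def min_max_normalize(X):
    """Scale every feature column to [0, 1] (assumed scheme, not from the source)."""
    X = np.asarray(X, dtype=float)
    lo = X.min(axis=0)
    span = X.max(axis=0) - lo
    span[span == 0] = 1.0  # constant columns are mapped to 0 instead of dividing by zero
    return (X - lo) / span
```

Scaling matters here because the features mix counts and logic-level distances with different ranges, and PCA in step five is sensitive to the scale of each column.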

Further, in step one, the selected Trojan features include:

(1) the number of logic gates x levels away from the input end or output end of the net;

(2) the fan-in count of the logic gates x levels away from the input end of the net;

(3) the number of flip-flops x levels away from the input end or output end of the net;

(4) the number of multiplexers x levels away from the input end or output end of the net;

(5) the number of logic levels to the multiplexer nearest to the input end or output end of the net;

(6) the number of logic levels to the flip-flop nearest to the input end or output end of the net;

(7) the number of x-level loops at the input end or output end of the net;

(8) the number of logic levels to the primary input or primary output nearest to the net;

(9) the number of constant terms x levels away from the input end or output end of the net;

(10) the number of logic levels to the inverter nearest to the input end or output end of the net;

(11) the maximum number of logic gates of the same type x levels away from the input end of the net;

wherein x takes the values 1, 2, 3, 4, 5.
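To illustrate how count-type features of this kind might be extracted from a gate-level netlist, the sketch below walks the fan-in cone of a net breadth-first. It rests on several assumptions that are not in the text: the netlist is a hypothetical toy dictionary, "x levels away" is read as "at most x levels upstream" with the driving gate at level 1, and inverters are counted among the basic gates.

```python
from collections import deque

# Hypothetical toy netlist: gate name -> (gate type, input nets, output net).
NETLIST = {
    "g1": ("AND", ["a", "b"], "n1"),
    "g2": ("OR", ["c", "d"], "n2"),
    "g3": ("NAND", ["n1", "n2"], "n3"),
    "ff1": ("DFF", ["n3"], "q"),
    "g4": ("INV", ["q"], "n4"),
}

# Assumed definition of "basic logic gates" for this sketch.
BASIC_GATES = {"AND", "OR", "NAND", "NOR", "XOR", "XNOR", "INV", "BUF"}


def fanin_gates_within(netlist, net, x):
    """Return {gate: level} for all gates at most x levels upstream of `net`."""
    driver = {out: name for name, (_, _, out) in netlist.items()}
    levels = {}
    queue = deque([(net, 1)])
    while queue:
        current, level = queue.popleft()
        gate = driver.get(current)
        if gate is None or level > x or gate in levels:
            continue  # primary input, beyond x levels, or already visited
        levels[gate] = level
        for source_net in netlist[gate][1]:
            queue.append((source_net, level + 1))
    return levels


def count_types(netlist, gates, types):
    """Count how many of `gates` have a gate type contained in `types`."""
    return sum(1 for gate in gates if netlist[gate][0] in types)
```

For example, `count_types(NETLIST, fanin_gates_within(NETLIST, "n4", 5), {"DFF"})` counts the flip-flops in the 5-level fan-in cone of net `n4`. The output-side features would use the same traversal over a map from nets to the gates they feed.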

Further, in step three, screening the features by combining the random forest, the correlation matrix and the parallel coordinate plot comprises:

the parallel coordinate plot visualizes the high-dimensional data, so that the contribution of each feature to distinguishing Trojan circuits from normal circuits can be seen directly.

Then the random forest and the correlation matrix are used to obtain quantitative values for feature importance and for the correlation between features; the results of the three methods are compared and analyzed, and an optimal feature set is finally selected, which comprises:

(1) the number of logic levels to the primary output nearest to the net;

(2) the number of logic levels to the primary input nearest to the net;

(3) the number of logic levels to the flip-flop nearest to the output end of the net;

(4) the number of logic levels to the multiplexer nearest to the output end of the net;

(5) the number of basic logic gates 5 levels away from the output end of the net;

(6) the number of flip-flops 5 levels away from the output end of the net;

(7) the number of multiplexers 5 levels away from the output end of the net;

(8) the number of flip-flops 5 levels away from the input end of the net;

(9) the number of logic levels to the flip-flop nearest to the input end of the net;

(10) the number of 4-level loops at the output end of the net;

(11) the number of 5-level loops at the input end of the net;

(12) the maximum number of logic gates of the same type 5 levels away from the input end of the net;

(13) the number of multiplexers 5 levels away from the input end of the net;

(14) the number of logic levels to the inverter nearest to the output end of the net;

(15) the number of basic logic gates 5 levels away from the input end of the net.
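The text does not state how the random forest importances and the correlation matrix are combined into a selection rule. One plausible reading, sketched below with scikit-learn and NumPy, ranks features by importance and drops a feature when it is unimportant or strongly correlated with a more important feature that has already been kept. The thresholds `corr_threshold` and `min_importance`, the function name, and the use of labeled samples for this screening stage are all assumptions.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier


def screen_features(X, y, names, corr_threshold=0.9, min_importance=0.05, seed=0):
    """Keep important features that are not redundant with a better-ranked one."""
    forest = RandomForestClassifier(n_estimators=100, random_state=seed).fit(X, y)
    importance = forest.feature_importances_
    corr = np.abs(np.corrcoef(X, rowvar=False))  # feature-by-feature correlation

    kept = []
    for idx in np.argsort(importance)[::-1]:  # most important first
        if importance[idx] < min_importance:
            continue  # contributes too little to the Trojan/normal split
        if any(corr[idx, j] > corr_threshold for j in kept):
            continue  # nearly duplicates a feature that is already kept
        kept.append(idx)
    return [names[i] for i in kept], importance, corr
```

The parallel coordinate plot is a visual check and is left out of this sketch; in practice it would be inspected alongside the two numeric outputs returned here.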

Further, in step five, the dimensionality reduction is performed by principal component analysis (PCA), which comprises:

subtracting from each feature dimension its mean value; calculating the covariance matrix; calculating the eigenvalues and eigenvectors of the covariance matrix by SVD; sorting the eigenvalues in descending order, selecting the k largest, and taking the corresponding k eigenvectors as column vectors to form the eigenvector matrix.

The value of k is chosen as follows: different values of k are tried, the following formula is evaluated for each, and the smallest k that satisfies the condition is selected:

wherein t is the threshold in the formula, meaning that the PCA algorithm retains (1-t) of the information in the original data.
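The PCA procedure described above can be sketched with NumPy as follows. Because the selection formula itself is not reproduced in the text, the criterion used here (the smallest k whose leading singular values account for at least 1-t of their total) is an assumption that matches the stated meaning of t; the function name `pca_reduce` is likewise hypothetical.

```python
import numpy as np


def pca_reduce(X, t=0.01):
    """Project X onto the fewest principal components retaining at least 1 - t."""
    X = np.asarray(X, dtype=float)
    centered = X - X.mean(axis=0)            # subtract each feature's mean
    cov = centered.T @ centered / X.shape[0]  # covariance matrix
    U, S, _ = np.linalg.svd(cov)              # S is sorted in descending order
    retained = np.cumsum(S) / S.sum()         # fraction retained by the first k values
    k = min(int(np.searchsorted(retained, 1.0 - t)) + 1, S.size)
    return centered @ U[:, :k], k
```

With t = 0.01 the projection keeps at least 99% of the variance, which is in line with the "more than 99% of data information" figure mentioned in the description.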

Further, in step six, constructing the classifier based on unsupervised learning, training it on the dimension-reduced data, and optimizing the model according to the training results to obtain the optimal trained model comprises:

(1) selecting a model according to the data distribution and the characteristics of each algorithm model;

(2) setting the model parameters and continuously tuning the model according to the training results.

Further, in step six, the selected unsupervised learning model is an Isolation Forest model, which is trained on the dimension-reduced data obtained in step five and continuously optimized according to the training results, comprising:

(1) setting the contamination rate, contamination, to [0.01, 0.02, 0.05, 0.08, 0.1];

(2) setting bootstrap, which controls whether sampling is done with replacement, to False, and setting warm_start, which controls whether the previously trained classifier is reused for the next round of training, to True;

(3) setting the number of classifiers in the ensemble, i.e. the number of trees in the isolation forest, n_estimators, to [120, 130, 140, 150, 160, 170, 180]; setting the proportion of features drawn to train each tree, max_features, to [0.01, 0.02, 0.05, 0.08, 0.1]; and setting the proportion of samples drawn to train each tree, max_samples, to [0.01, 0.02, 0.05, 0.08, 0.1];

(4) setting the number of processes running in parallel, n_jobs, to 4;

(5) using default values for all other parameters.

The parameters in (1) and (3) are selected by grid search, and the combination giving the best result is taken as the parameters of the optimal model.
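The parameter names above match scikit-learn's `IsolationForest`, so the grid search can be sketched as below. The text does not say which score defines "the best result"; ranking candidates by F1-score on a labeled validation set is an assumption of this sketch, as are the function name and the `seed` argument.

```python
import numpy as np
from sklearn.ensemble import IsolationForest
from sklearn.metrics import f1_score
from sklearn.model_selection import ParameterGrid

# Search space as listed in items (1) and (3).
PATENT_GRID = {
    "contamination": [0.01, 0.02, 0.05, 0.08, 0.1],
    "n_estimators": [120, 130, 140, 150, 160, 170, 180],
    "max_features": [0.01, 0.02, 0.05, 0.08, 0.1],
    "max_samples": [0.01, 0.02, 0.05, 0.08, 0.1],
}


def grid_search_iforest(X_train, X_val, y_val, grid=PATENT_GRID, n_jobs=4, seed=0):
    """Fit one Isolation Forest per grid point and return the best (params, F1)."""
    best_params, best_score = None, -1.0
    for params in ParameterGrid(grid):
        model = IsolationForest(
            bootstrap=False,   # item (2): sample without replacement
            warm_start=True,   # item (2)
            n_jobs=n_jobs,     # item (4)
            random_state=seed,
            **params,
        )
        model.fit(X_train)  # unsupervised: no labels are used for training
        predicted = (model.predict(X_val) == -1).astype(int)  # -1 marks an outlier
        score = f1_score(y_val, predicted, zero_division=0)   # assumed criterion
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score
```

The full grid contains 5 x 7 x 5 x 5 = 875 combinations, so a reduced `grid` argument is convenient for quick experiments. Note that very small `max_samples` fractions only make sense when the training set is large enough for each tree to receive at least a few samples.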

Further, in step seven, the cross validation method comprises:

given N circuits under test, one circuit is used as the test set each time and the remaining N-1 circuits are used as the training set; this process is repeated N times so that every circuit is used for both training and testing and the data set is exploited to the fullest, allowing the model to learn all the information in the data.
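This leave-one-circuit-out scheme can be sketched in a few lines of plain Python. The generator name and the idea of representing each circuit by an arbitrary item in a list are assumptions for illustration.

```python
def leave_one_circuit_out(circuits):
    """Yield (training circuits, held-out test circuit) for each of the N circuits."""
    circuits = list(circuits)
    for i, held_out in enumerate(circuits):
        training = circuits[:i] + circuits[i + 1:]
        yield training, held_out
```

Splitting by circuit rather than by individual net keeps every net of the test circuit out of training, so each of the N rounds measures how the model behaves on a design it has not seen.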

Further, in step eight, the TPR, TNR, Precision, Recall, F1-score and Accuracy metrics are calculated as follows:

TNR=TN/(TN+FP);

TPR=TP/(TP+FN);

Precision=TP/(TP+FP);

Recall=TP/(TP+FN);

F1-score=2*Precision*Recall/(Precision+Recall);

Accuracy=(TP+TN)/total.
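As a worked illustration, the sketch below computes these metrics from confusion-matrix counts, treating Trojan nets as the positive class. It takes Recall in its standard sense, TP/(TP+FN), which makes it equal to TPR, and takes total as TP+TN+FP+FN. The function name and the convention of returning 0.0 for an empty denominator are assumptions.

```python
def detection_metrics(tp, tn, fp, fn):
    """Compute the evaluation metrics from confusion-matrix counts."""

    def ratio(numerator, denominator):
        # Assumed convention: an undefined ratio is reported as 0.0.
        return numerator / denominator if denominator else 0.0

    tpr = ratio(tp, tp + fn)
    tnr = ratio(tn, tn + fp)
    precision = ratio(tp, tp + fp)
    recall = tpr  # standard definition, identical to TPR
    f1 = ratio(2 * precision * recall, precision + recall)
    accuracy = ratio(tp + tn, tp + tn + fp + fn)
    return {
        "TPR": tpr,
        "TNR": tnr,
        "Precision": precision,
        "Recall": recall,
        "F1-score": f1,
        "Accuracy": accuracy,
    }
```

With 8 true positives, 82 true negatives, 8 false positives and 2 false negatives, this gives TPR = 0.8, Precision = 0.5 and Accuracy = 0.9. The gap between Accuracy and Precision shows why accuracy alone is a weak indicator when Trojan nets are a small minority.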

Another object of the present invention is to provide a hardware Trojan horse detection system that applies the unsupervised learning-based hardware Trojan horse detection system and information data processing method, wherein the hardware Trojan horse detection system comprises:

a circuit feature acquisition module, used for analyzing hardware Trojan characteristics from the perspective of circuit structure, Trojan trigger circuits and payload circuit functions; linking the key property that a Trojan circuit has a low trigger probability to static circuit features, in combination with the circuit features used in traditional machine learning and the Trojan structure; and proposing the circuit features required for Trojan detection;

a Trojan feature extraction module, used for preprocessing the gate-level netlist under test and extracting the features required for hardware Trojan detection;

an optimal feature set selection module, used for analyzing, by combining the random forest and the parallel coordinate plot, how much each circuit feature contributes to distinguishing Trojan nets from normal nets, screening the features, and selecting the optimal feature set;

a feature set processing module, used for normalizing the obtained feature set and reducing its dimensionality with a feature dimension reduction method;

a trained model acquisition module, used for constructing a classifier based on unsupervised learning, training it on the dimension-reduced data, and optimizing the model according to the training results to obtain an optimal trained model;

a data set division module, used for dividing the data sets under test into a training set and a test set by cross validation;

and a hardware Trojan detection module, used for inputting the test data into the trained model for detection, calculating the TPR, TNR, Precision, Recall, F1-score and Accuracy metrics from the detection results, and evaluating the detection capability of the model.

Another object of the present invention is to provide an information data processing terminal for implementing the hardware Trojan horse detection system.

By combining all the technical schemes, the invention has the advantages and positive effects that:

the hardware Trojan horse detection system based on unsupervised learning and the information data processing method provided by the invention firstly provide the characteristics required by Trojan horse detection by analyzing the circuit structure and the Trojan horse operation logic and combining the circuit characteristics of the traditional machine learning. And then, analyzing the importance degree of the features by combining the random forest and the parallel coordinate graph, and screening the features to obtain an optimal feature set. Then, the dimensionality reduction is carried out on the high-dimensional data features by adopting a PCA (principal component analysis) method. And finally, training the Isolation Forest unsupervised model by adopting the data subjected to the dimensionality reduction to obtain an optimal training model. And testing by adopting the test data, calculating parameters such as accuracy and the like according to the test result, and evaluating the model.

The advantages and positive effects achieved are as follows:

1. The method improves the features selected in prior-art hardware Trojan detection based on static features, associates the key attribute of low Trojan trigger probability with the static features of the circuit, and offers a new idea for subsequent related research.

2. The method for analyzing and extracting the features is provided, the features which are most effective for solving the problems are extracted from a large number of features, and reference is provided for relevant research for researching the correlation and similarity analysis of the circuit features.

3. The feature dimension reduction method is provided, the feature dimension is effectively reduced, more than 99% of data information is reserved, and the problems of high algorithm complexity, long detection time, poor detection precision and the like caused by overhigh circuit feature dimension in the conventional machine learning method are solved.

4. The unsupervised model is applied to Trojan horse detection, a large amount of label information and balanced data are not needed, the problems that the current Trojan horse information and normal circuit information are seriously unbalanced and the label value in the hardware Trojan horse detection field is difficult to obtain or even cannot be obtained are solved, and a new direction is provided for the subsequent related research in the hardware safety field.

The unsupervised hardware Trojan detection method combining Principal Component Analysis (PCA) and Isolation Forest (Isolation Forest) algorithm provided by the invention effectively solves the problems encountered by hardware Trojan detection and also provides a new direction for related research in the field of hardware safety.

Drawings

In order to illustrate the technical solutions of the embodiments of the present invention more clearly, the drawings used in the embodiments are briefly described below. The drawings described below show only some embodiments of the present invention, and those skilled in the art can obtain other drawings from them without creative effort.

Fig. 1 is a flowchart of a hardware trojan detection system and an information data processing method based on unsupervised learning according to an embodiment of the present invention.

Fig. 2 is a schematic diagram of a hardware Trojan horse detection system and an information data processing method based on unsupervised learning according to an embodiment of the present invention.

Fig. 3 is a block diagram of a hardware Trojan horse detection system according to an embodiment of the present invention;

in the figure: 1. a circuit characteristic acquisition module; 2. a Trojan horse feature extraction module; 3. an optimal feature set selection module; 4. a feature set processing module; 5. a training model acquisition module; 6. a data set classification module; 7. hardware Trojan horse detection module.

Fig. 4 is a schematic diagram of random forest classification according to an embodiment of the present invention.

Fig. 5 is a correlation matrix diagram provided by an embodiment of the invention.

Fig. 6 is a schematic diagram of an isolated forest algorithm provided by the embodiment of the present invention.

FIG. 7 is a schematic diagram of a Trojan horse circuit of a test circuit RS232-T1200 according to an embodiment of the present invention.

FIG. 8 is a schematic diagram of a Trojan horse circuit of the test circuit s15850-T100 according to an embodiment of the present invention.

FIG. 9 is a schematic diagram of a Trojan horse circuit of a test circuit s38417-T300 according to an embodiment of the present invention.

Detailed Description

In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is further described in detail with reference to the following embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.

Aiming at the problems in the prior art, the invention provides a hardware Trojan horse detection system based on unsupervised learning and an information data processing method, and the invention is described in detail below with reference to the accompanying drawings.

As shown in fig. 1, the hardware trojan detection system and the information data processing method based on unsupervised learning according to the embodiment of the present invention include the following steps:

S101, analyzing hardware Trojan characteristics in terms of circuit structure, Trojan trigger circuits, and payload circuit functions, combining the circuit features used in conventional machine learning with the Trojan structure, associating the key attribute of low Trojan trigger probability with static circuit features, and proposing the circuit features required for Trojan detection;

S102, preprocessing the gate-level netlist under test and extracting the features required for hardware Trojan detection;

S103, analyzing the contribution of the circuit features to distinguishing Trojan nets from normal nets by combining the random forest, the correlation matrix, and the parallel coordinate plot, screening the features, and selecting the optimal feature set;

S104, normalizing the feature set obtained in S103, and then reducing the dimension of the normalized data set with a feature dimension reduction method;

S105, constructing an unsupervised-learning-based classifier, training it with the dimension-reduced data, and optimizing the model according to the training results to obtain the optimal training model;

S106, dividing the data sets under test into a training set and a test set by cross validation;

S107, inputting the test data into the trained model for detection, calculating the TPR, TNR, Precision, Recall, F1-score, and Accuracy indexes from the detection results, and evaluating the detection capability of the model.

A schematic diagram of a hardware Trojan horse detection system and an information data processing method based on unsupervised learning according to an embodiment of the present invention is shown in fig. 2.

As shown in fig. 3, a hardware trojan detection system provided in an embodiment of the present invention includes:

the circuit feature acquisition module 1 is used for analyzing hardware Trojan characteristics in terms of circuit structure, Trojan trigger circuits, and payload circuit functions, combining the circuit features used in conventional machine learning with the Trojan structure, associating the key attribute of low Trojan trigger probability with static circuit features, and proposing the circuit features required for Trojan detection;

the Trojan horse feature extraction module 2 is used for preprocessing a gate-level netlist to be detected and extracting features required by hardware Trojan horse detection;

the optimal feature set selection module 3 is used for analyzing the contribution degree of circuit features to distinguishing the Trojan horse network from the normal network by combining a random forest, a correlation matrix and a parallel coordinate graph, screening the features and selecting an optimal feature set;

the feature set processing module 4 is used for carrying out normalization processing on the obtained feature set; meanwhile, a feature dimension reduction method is adopted to perform dimension reduction processing on the obtained feature set;

the training model acquisition module 5 is used for constructing a classifier based on unsupervised learning, training by using the data after dimensionality reduction, and optimizing a model according to a training result to obtain an optimal training model;

the data set classification module 6 is used for classifying a plurality of data sets to be tested into a training set and a test set by adopting a cross validation method;

and the hardware Trojan horse detection module 7 is used for inputting the test data into the trained model for detection, calculating TPR, TNR, Precision, Recall, F1-score and Accuracy indexes according to the detection result, and evaluating the detection capability of the model.

The technical solution of the present invention will be further described with reference to the following examples.

Example 1

The hardware Trojan detection method based on machine learning provided by the embodiment of the invention comprises the following steps. The circuit structure and the Trojan operating logic are analyzed and, together with the circuit features used in conventional machine learning, the required circuit features are proposed. The gate-level netlist of the circuit under test is processed, and the features required for hardware Trojan detection are extracted. The contribution of each circuit feature to distinguishing normal circuits from Trojan circuits is analyzed by combining a random forest, a correlation matrix, and a parallel coordinate plot, and the optimal feature set is selected. The selected features are normalized, and the data are reduced in dimension by principal component analysis (PCA). An Isolation Forest classifier is trained on the dimension-reduced data to obtain the optimal training model. Test data are input into the trained model for detection, the TPR, TNR, Precision, Recall, F1-score, and Accuracy indexes are calculated from the results, and an ROC curve is drawn to evaluate the detection capability of the model.

The circuit features proposed by the present invention include features used in conventional machine learning and features newly proposed by the present invention.

The parallel coordinate plot used by the invention is drawn in Python. The last coordinate axis is the label value, where 1 represents a Trojan net and 0 represents a normal net; the other coordinate axes are feature values and are used to analyze how much each feature contributes to distinguishing Trojan nets from normal nets.

The random forest method and the correlation matrix method are implemented in Python. The random forest gives a quantitative importance value for each feature, and the correlation matrix gives the degree of correlation between features. By combining the results of the three methods, the contribution of each feature to distinguishing Trojan circuits from normal circuits can be obtained, and the optimal feature set can be selected.
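The importance ranking and correlation check described above can be sketched as follows. This is a minimal illustration on synthetic data, not the implementation of the invention: the feature names `feat_a`, `feat_b`, and `feat_noise`, the correlation threshold of 0.9, and the helper names `rank_features` and `correlated_pairs` are all assumptions made for the example.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier


def rank_features(X, y, names, n_trees=100, seed=0):
    """Return (name, importance) pairs sorted by decreasing importance."""
    forest = RandomForestClassifier(n_estimators=n_trees, random_state=seed)
    forest.fit(X, y)
    # feature_importances_ is normalized so that the values sum to 1.
    order = np.argsort(forest.feature_importances_)[::-1]
    return [(names[i], float(forest.feature_importances_[i])) for i in order]


def correlated_pairs(X, names, threshold=0.9):
    """List feature pairs whose absolute Pearson correlation exceeds threshold."""
    corr = np.corrcoef(X, rowvar=False)
    pairs = []
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            if abs(corr[i, j]) > threshold:
                pairs.append((names[i], names[j], float(corr[i, j])))
    return pairs


# Synthetic stand-in data: three hypothetical features for 400 nets.
rng = np.random.default_rng(0)
y = (rng.random(400) < 0.1).astype(int)                 # 1 = Trojan net, 0 = normal net
informative = y * 4.0 + rng.normal(size=400)            # separates the two classes
duplicate = informative + 0.01 * rng.normal(size=400)   # nearly redundant copy
noise = rng.normal(size=400)                            # carries no label information
X = np.column_stack([informative, duplicate, noise])
names = ["feat_a", "feat_b", "feat_noise"]

ranking = rank_features(X, y, names)
redundant = correlated_pairs(X, names)
```

A feature that ranks low in `ranking`, or that appears in `redundant` next to a more important feature, would be a candidate for removal during screening.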

The invention adopts a characteristic dimension reduction method to reduce the dimension of the data characteristic, reduces the calculation complexity of the algorithm and simultaneously reserves most information of the data.

The selected characteristic dimension reduction method is PCA (principal component analysis), and the specific steps are as follows:

Step S1: mean removal (centering), that is, subtracting from each feature its own mean.

Step S2: compute the covariance matrix of the centered data.

Step S3: compute the eigenvalues and eigenvectors of the covariance matrix by SVD.

Step S4: sort the eigenvalues from large to small and select the k largest. The corresponding k eigenvectors are then used as column vectors to form the eigenvector matrix.

The value of k is selected as follows: candidate k values are tried in turn, and the smallest k for which the retained information is at least (1-t) is selected.

Here t is the fraction of information that may be discarded, so the PCA algorithm retains (1-t) of the information in the original data. For example, when t is 0.01, the PCA algorithm retains 99% of the main information of the original data.
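Written with the eigenvalues of the covariance matrix, the selection rule for k can be expressed as below. The symbols \lambda_i (eigenvalues sorted in descending order) and n (number of features) are notation assumed here, and the inequality is one form consistent with the stated meaning of t, not a quotation of the invention's formula.

```latex
\frac{\sum_{i=1}^{k} \lambda_i}{\sum_{i=1}^{n} \lambda_i} \ge 1 - t
```

With t = 0.01, the smallest k whose leading k eigenvalues account for at least 99% of the total variance is chosen.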

The invention trains an unsupervised-learning-based classifier on the processed features. In contrast to supervised learning models, unsupervised learning models do not require a large amount of known label information. In addition, the unsupervised learning method takes little time to train and does not require a large amount of balanced training data.

The unsupervised learning model selected by the invention is Isolation Forest, using the IsolationForest implementation in sklearn, with the following settings: the contamination rate contamination is set to [0.01, 0.02, 0.05, 0.08, 0.1]; bootstrap (sampling with replacement) is set to False, and warm_start (whether the next training run reuses the previously trained estimators) is set to True; the number of base estimators in the ensemble, namely the number of trees n_estimators in the isolation forest, is set to [120, 130, 140, 150, 160, 170, 180]; the proportion of features drawn to train each tree, max_features, is set to [0.01, 0.02, 0.05, 0.08, 0.1]; the proportion of samples drawn to train each tree, max_samples, is set to [0.01, 0.02, 0.05, 0.08, 0.1]; the number of parallel jobs n_jobs is set to 4; the other parameters keep their default values. The list-valued parameters (contamination, n_estimators, max_features, and max_samples) are selected by grid search, and the combination that gives the best result is used for the optimal model.

Example 2

The invention aims to provide a hardware Trojan horse detection method based on machine learning, and aims to solve the problems of high algorithm complexity, long detection time, poor detection precision and the like caused by overhigh circuit characteristic dimension of the conventional machine learning method.

In order to achieve the purpose, the invention adopts the following technical scheme:

Step S1: hardware Trojan characteristics are analyzed in terms of circuit structure, Trojan trigger circuits, and payload circuit functions; the circuit features used in conventional machine learning are combined with the Trojan structure, the key attribute of low Trojan trigger probability is associated with static circuit features, and the circuit features required for Trojan detection are proposed.

Step S2: the gate-level netlist under test is preprocessed, and the features required for hardware Trojan detection are extracted from it.

Step S3: the contribution of the circuit features to distinguishing Trojan nets from normal nets is analyzed by combining the random forest, the correlation matrix, and the parallel coordinate plot; the features are screened, and the optimal feature set is selected.

Step S4: the data obtained in step S3 are normalized.

Step S5: the data obtained in step S4 are reduced in dimension with a feature dimension reduction method, which lowers the computational complexity of the algorithm while keeping most of the information in the data.

Step S6: an unsupervised-learning-based classifier is constructed and trained with the dimension-reduced data, and the model is optimized according to the training results to obtain the optimal training model.

Step S61: the model is selected according to the data distribution and the characteristics of each algorithm model.

Step S62: the model parameters are set, and the model is tuned continuously according to the training results.

Step S7: a plurality of data sets to be tested are divided into a training set and a testing set by adopting a cross validation method, so that the data sets can be utilized to the maximum extent, and incomplete prediction results caused by the fact that a classifier does not learn the characteristics of the testing set are prevented. The method of cross validation comprises the following steps: if N circuits to be tested are provided, one circuit to be tested is taken as a test set each time, the remaining N-1 circuits to be tested are taken as training sets, the process is repeated for N times, all the circuits to be tested are ensured to be trained and tested, and the data set is utilized to the maximum extent, so that the model can learn all information of the data.

Step S8: and inputting the test data into the trained model for detection, and calculating indexes such as TPR, TNR, Precision, Recall, F1-score, Accuracy and the like according to the detection result to evaluate the detection capability of the model.

Step S9: further, in step S1, the selected Trojan features are: the number of logic gates x levels away from the input or output of the net (x = 1, 2, 3, 4, 5); the fan-in number of the logic gates x levels away from the net (x = 1, 2, 3, 4, 5); the number of flip-flops x levels away from the input or output of the net (x = 1, 2, 3, 4, 5); the number of multiplexers x levels away from the input or output of the net (x = 1, 2, 3, 4, 5); the number of logic levels to the multiplexer nearest to the input or output of the net; the number of logic levels to the flip-flop nearest to the input or output of the net; the number of logic levels to the inverter nearest to the input or output of the net; the number of x-level loops at the input or output of the net (x = 1, 2, 3, 4, 5); the number of logic levels to the primary input or primary output nearest to the net; the number of constants x levels away from the input or output of the net (x = 1, 2, 3, 4, 5); and the maximum number of logic gates of the same type x levels away from the input of the net (x = 1, 2, 3, 4, 5).

Step S10: further, in step S5, the dimension reduction uses PCA (principal component analysis). First, the mean of each feature is subtracted from that feature; then the covariance matrix is calculated, and its eigenvalues and eigenvectors are computed by SVD. The eigenvalues are sorted from large to small, the k largest are selected, and the corresponding k eigenvectors are used as column vectors to form the eigenvector matrix. The value of k is selected as follows: candidate k values are tried in turn, and the smallest k for which the retained information is at least (1-t) is selected.

Here t is the fraction of information that may be discarded, so the PCA algorithm retains (1-t) of the information in the original data. For example, when t is 0.01, the PCA algorithm retains 99% of the main information of the original data.

Step S11: further, in step S6, the unsupervised learning model is selected as the Isolation Forest model, the dimensionality-reduced data obtained in step S5 is used for training, and the model is continuously optimized according to the training result.

S111: the contamination rate contamination is set to [0.01, 0.02, 0.05, 0.08, 0.1].

S112: bootstrap (sampling with replacement) is set to False, and warm_start (whether the next training run reuses the previously trained estimators) is set to True.

S113: the number of base estimators in the ensemble, namely the number of trees n_estimators in the isolation forest, is set to [120, 130, 140, 150, 160, 170, 180]; the proportion of features drawn to train each tree, max_features, is set to [0.01, 0.02, 0.05, 0.08, 0.1]; and the proportion of samples drawn to train each tree, max_samples, is set to [0.01, 0.02, 0.05, 0.08, 0.1].

S114: the number of parallel jobs n_jobs is set to 4.

S115: the other parameters keep their default values.

The parameters in steps S111 and S113 are selected by grid search, and the combination that gives the best result is used for the optimal model.

Step S12: further, in step S8, the TPR, TNR, Precision, Recall, F1-score and Accuracy indexes are calculated as follows:

TNR=TN/(TN+FP), TPR=TP/(TP+FN), Precision=TP/(TP+FP), Recall=TP/(TP+FN), F1-score=2*Precision*Recall/(Precision+Recall), Accuracy=(TP+TN)/total.
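These indexes can be computed from the four confusion-matrix counts as in the sketch below, where a Trojan net is the positive class. Recall is computed with its standard definition TP/(TP+FN), which coincides with TPR. The function name `evaluate`, the zero-denominator handling, and the example counts are assumptions made for illustration and are not results from the invention.

```python
def evaluate(tp, tn, fp, fn):
    """Compute detection indexes from confusion-matrix counts (Trojan = positive)."""
    total = tp + tn + fp + fn
    tpr = tp / (tp + fn) if tp + fn else 0.0        # true positive rate
    tnr = tn / (tn + fp) if tn + fp else 0.0        # true negative rate
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tpr                                    # standard recall equals TPR
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    accuracy = (tp + tn) / total if total else 0.0
    return {
        "TPR": tpr,
        "TNR": tnr,
        "Precision": precision,
        "Recall": recall,
        "F1-score": f1,
        "Accuracy": accuracy,
    }


# Hypothetical counts for one circuit, chosen only to exercise the formulas.
metrics = evaluate(tp=30, tn=280, fp=9, fn=6)
```

Because normal nets vastly outnumber Trojan nets, Accuracy alone can look high even when few Trojan nets are found, which is why TPR, Precision, and F1-score are reported alongside it.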

Example 3

Referring to fig. 2, the hardware trojan detection system and the information data processing method based on unsupervised learning according to the embodiment of the present invention include the following steps:

Step S1: hardware Trojan characteristics are analyzed in terms of circuit structure, Trojan trigger circuits, and payload circuit functions; the differences between Trojan circuits and normal circuits are analyzed, the low trigger probability of Trojan circuits is associated with static features, and circuit features that can efficiently detect hardware Trojan circuits are designed.

The selected Trojan features are as follows: the number of logic gates x levels away from the output or input of the net, out_logic_gate_x and in_logic_gate_x (x = 1, 2, 3, 4, 5); the fan-in number of the logic gates x levels away from the net, fan_in_x (x = 1, 2, 3, 4, 5); the number of flip-flops x levels away from the input or output of the net, in_dff_x and out_dff_x (x = 1, 2, 3, 4, 5); the number of multiplexers x levels away from the input or output of the net, in_mux_x and out_mux_x (x = 1, 2, 3, 4, 5); the number of logic levels to the multiplexer nearest to the output or input of the net, out_nearest_mux and in_nearest_mux; the number of logic levels to the flip-flop nearest to the output or input of the net, out_nearest_dff and in_nearest_dff; the number of logic levels to the inverter nearest to the input or output of the net, in_nearest_inv and out_nearest_inv; the number of x-level loops at the input or output of the net, in_loop_x and out_loop_x (x = 1, 2, 3, 4, 5); the number of logic levels to the primary input or primary output nearest to the net, nearest_pin and nearest_pout; the number of constants x levels away from the input or output of the net, in_const_x and out_const_x (x = 1, 2, 3, 4, 5); and the maximum number of logic gates of the same type x levels away from the input of the net, in_same_gate_x (x = 1, 2, 3, 4, 5).

Step S2: preprocess the gate-level netlist under test, parse its text with a Python script, and extract the required circuit features.
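As a rough illustration of how one such feature could be computed once the netlist has been parsed, the sketch below counts logic gates on the output side of a net by breadth-first traversal. The dictionary-based gate format, the gate names, the set `LOGIC_TYPES`, and the choice to count all gates within x levels (rather than exactly at level x) are assumptions of this example; the parsing of a real gate-level netlist file is not shown.

```python
from collections import defaultdict

# Gate types treated as basic logic gates in this sketch (an assumption).
LOGIC_TYPES = {"and", "or", "nand", "nor", "xor", "xnor", "inv", "buf"}

# Hypothetical parsed netlist: each gate has a name, a type, input nets, and one output net.
GATES = [
    {"name": "U1", "type": "and", "inputs": ["a", "b"], "output": "n1"},
    {"name": "U2", "type": "inv", "inputs": ["n1"], "output": "n2"},
    {"name": "U3", "type": "dff", "inputs": ["n2"], "output": "n3"},
    {"name": "U4", "type": "or", "inputs": ["n1", "c"], "output": "n4"},
    {"name": "U5", "type": "nand", "inputs": ["n3", "n4"], "output": "n5"},
]


def build_fanout(gates):
    """Map each net to the list of gates that it drives."""
    fanout = defaultdict(list)
    for gate in gates:
        for net in gate["inputs"]:
            fanout[net].append(gate)
    return fanout


def count_out_logic_gates(net, gates, x):
    """Count distinct logic gates reached within x levels on the output side of net."""
    fanout = build_fanout(gates)
    frontier, seen = {net}, {}
    for _ in range(x):
        next_frontier = set()
        for current in frontier:
            for gate in fanout[current]:
                if gate["name"] not in seen:
                    seen[gate["name"]] = gate["type"]
                    next_frontier.add(gate["output"])
        frontier = next_frontier
    # Non-logic cells (for example flip-flops) are traversed but not counted.
    return sum(1 for gate_type in seen.values() if gate_type in LOGIC_TYPES)


feature_row = {f"out_logic_gate_{x}": count_out_logic_gates("n1", GATES, x)
               for x in range(1, 6)}
```

The input-side features would follow the same pattern with a fan-in map (output net to driving gate) in place of the fan-out map.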

Step S3: because there are more than sixty circuit features, the feature dimensionality is very high and cannot be visualized by conventional means. A parallel coordinate plot is therefore used to show how normal nets and Trojan nets are distributed over the different circuit features, which gives a visual impression of how much each feature contributes to distinguishing Trojan circuits from normal circuits. At the same time, the random forest and correlation matrix methods are used to obtain a quantitative importance value for each feature and the degree of correlation between features. The results of the three methods are analyzed together, and the features are screened to obtain the optimal feature set. The random forest classification schematic diagram is shown in fig. 4, and the correlation matrix diagram is shown in fig. 5.

As shown in fig. 4, the features are sorted by importance, and part of them are listed in Table 1 (importance values are normalized).

Table 1 Feature importance ranking (normalized)

Feature Importance Feature Importance
nearest_pout 0.043760 in_same_gate_3 0.014762
out_nearest_dff 0.042356 fan_in_1 0.014324
out_logic_gate_5 0.042214 in_mux_5 0.013926
out_logic_gate_4 0.038704 out_dff_3 0.013620
out_logic_gate_1 0.037122 out_dff_2 0.013165
out_logic_gate_3 0.033397 in_nearest_inv 0.012988
out_logic_gate_2 0.031972 in_dff_3 0.012708
out_dff_5 0.030106 in_same_gate_2 0.012640
in_logic_gate_5 0.027495 out_mux_1 0.012253
in_nearest_dff 0.027337 in_nearest_mux 0.011942
fan_in_5 0.026090 in_mux_4 0.011192
in_dff_5 0.025914 out_loop_5 0.010715
fan_in_4 0.025871 in_mux_3 0.010151
in_same_gate_5 0.025151 out_dff_1 0.009487
out_nearest_mux 0.024698 out_loop_3 0.009433
in_logic_gate_4 0.023731 out_const_0 0.007988
nearest_pin 0.023598 in_dff_2 0.007923
out_dff_4 0.022216 in_mux_2 0.006962
out_mux_5 0.022125 out_loop_2 0.004094
fan_in_3 0.022014 in_dff_1 0.004074
out_mux_4 0.019873 in_loop_4 0.004018
out_nearest_inv 0.018421 in_loop_5 0.003258
out_mux_2 0.017752 in_mux_1 0.002561
fan_in_2 0.017723 in_loop_3 0.002271
out_mux_3 0.017393 in_const_3 0.001182
in_logic_gate_3 0.017074 in_const_1 0.001066
in_dff_4 0.016860 in_const_2 0.000834
in_same_gate_4 0.016557 in_const_4 0.000671
out_loop_4 0.016339 in_const_5 0.000426
in_logic_gate_2 0.015102 in_logic_gate_1 0.000399

Step S4: the features are screened by combining the feature importance in the table with the structural characteristics of Trojan circuits, and the optimal feature set is selected. The selected optimal feature set includes:

(1) the number of logic levels to the primary output nearest to the net;

(2) the number of logic levels to the primary input nearest to the net;

(3) the number of logic levels to the flip-flop nearest to the output of the net;

(4) the number of logic levels to the multiplexer nearest to the output of the net;

(5) the number of basic logic gates 5 levels away from the output of the net;

(6) the number of flip-flops 5 levels away from the output of the net;

(7) the number of multiplexers 5 levels away from the output of the net;

(8) the number of flip-flops 5 levels away from the input of the net;

(9) the number of logic levels to the flip-flop nearest to the input of the net;

(10) the number of 4-level loops at the output of the net;

(11) the number of 5-level loops at the input of the net;

(12) the maximum number of logic gates of the same type 5 levels away from the input of the net;

(13) the number of multiplexers 5 levels away from the input of the net;

(14) the number of logic levels to the inverter nearest to the output of the net;

(15) the number of basic logic gates 5 levels away from the input of the net.

Step S5: the feature data set obtained in step S4 is subjected to normalization processing.

The normalization formula is x' = (x - min)/(max - min), where x is a sample value, min is the minimum value of all sample data, and max is the maximum value of all sample data.
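If the normalization is taken to be min-max scaling, which the definitions of min and max suggest, it can be sketched as follows. Applying it separately to each feature column and mapping constant columns to 0 are assumptions of this sketch, as is the function name `min_max_normalize`.

```python
import numpy as np


def min_max_normalize(X):
    """Scale each feature column to [0, 1]; constant columns are mapped to 0."""
    X = np.asarray(X, dtype=float)
    col_min = X.min(axis=0)
    col_range = X.max(axis=0) - col_min
    col_range[col_range == 0] = 1.0   # avoid division by zero for constant columns
    return (X - col_min) / col_range


# Three nets described by three hypothetical feature values each.
raw = [[1, 10, 5],
       [3, 20, 5],
       [5, 40, 5]]
scaled = min_max_normalize(raw)
```

Scaling the features to a common range keeps features with large numeric values, such as gate counts, from dominating the principal components computed in the next step.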

Step S6: and (4) performing dimensionality reduction on the data set obtained in the step S5 by adopting a Principal Component Analysis (PCA) method, reducing the computational complexity of the algorithm and simultaneously keeping most information of the data.

The method comprises the following specific steps:

Step S61: mean removal (centering), that is, subtracting from each feature its own mean.

Step S62: compute the covariance matrix of the centered data.

Step S63: compute the eigenvalues and eigenvectors of the covariance matrix by SVD.

Step S64: sort the eigenvalues from large to small and select the k largest. The corresponding k eigenvectors are then used as column vectors to form the eigenvector matrix.

The value of k is selected as follows: candidate k values are tried in turn, and the smallest k for which the retained information is at least (1-t) is selected.

Here t is the fraction of information that may be discarded, so the PCA algorithm retains (1-t) of the information in the original data. For example, when t is 0.01, the PCA algorithm retains 99% of the main information of the original data.
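The PCA steps above can be sketched with NumPy as follows. The 1/m scaling of the covariance matrix, the use of the cumulative eigenvalue ratio as the measure of retained information, and the function name `pca_reduce` are assumptions of this sketch; the synthetic data merely stands in for a normalized feature matrix.

```python
import numpy as np


def pca_reduce(X, t=0.01):
    """Project X onto the fewest principal components that keep (1 - t) of the variance."""
    X = np.asarray(X, dtype=float)
    centered = X - X.mean(axis=0)                 # subtract each feature's mean
    cov = centered.T @ centered / X.shape[0]      # covariance matrix (1/m scaling assumed)
    # For a symmetric positive semidefinite matrix, the singular values are the
    # eigenvalues (in descending order) and the columns of U are the eigenvectors.
    U, S, _ = np.linalg.svd(cov)
    retained = np.cumsum(S) / S.sum()             # variance fraction kept by the first k components
    k = int(np.searchsorted(retained, 1.0 - t)) + 1   # smallest k with retained[k-1] >= 1 - t
    k = min(k, S.size)                            # guard against rounding at the upper end
    return centered @ U[:, :k], k


# Synthetic data: 5 features driven by 2 underlying directions plus tiny noise.
rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 2))
mixing = np.array([[1.0, 0.0, 1.0, 0.0, 1.0],
                   [0.0, 1.0, 0.0, 1.0, 0.0]])
X = latent @ mixing + 1e-3 * rng.normal(size=(200, 5))

Z, k = pca_reduce(X, t=0.01)
```

Here two components are enough to keep 99% of the variance, so the five original columns are reduced to two.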

Step S7: and (4) constructing an Isolation Forest unsupervised model, and training by using the dimension-reduced data obtained in the step S5 to obtain an optimal training model.

Step S71: Set the candidate values of the contamination rate (contamination) to [0.01, 0.02, 0.05, 0.08, 0.1];

Step S72: Set bootstrap, which controls whether samples are drawn with replacement, to False, and set warm_start, which controls whether the next round of training reuses the previously trained classifiers, to True;

Step S73: Set the number of classifiers in the ensemble, i.e., the number of trees in the isolation forest, n_estimators, to [120, 130, 140, 150, 160, 170, 180]; set the proportion of features drawn to train each tree, max_features, to [0.01, 0.02, 0.05, 0.08, 0.1]; and set the proportion of samples drawn to train each tree, max_samples, to [0.01, 0.02, 0.05, 0.08, 0.1];

Step S74: Set the number of processes run in parallel, n_jobs, to 4;

Step S75: Use the default values for all other parameters.

The parameters in steps S71 and S73 are chosen by grid search, and the combination that gives the best result is used for the optimal model. A schematic diagram of the isolation forest algorithm is shown in FIG. 6.
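The search space described in steps S71 to S75 can be sketched as below. The parameter names match those of scikit-learn's `IsolationForest`, which the step descriptions appear to follow; that mapping (including reading the pollution rate as `contamination`) and the helper names `iter_param_grid` and `build_model` are assumptions, not details stated in the text.

```python
from itertools import product

# Candidate values from steps S71 and S73.
GRID = {
    "contamination": [0.01, 0.02, 0.05, 0.08, 0.1],
    "n_estimators": [120, 130, 140, 150, 160, 170, 180],
    "max_features": [0.01, 0.02, 0.05, 0.08, 0.1],
    "max_samples": [0.01, 0.02, 0.05, 0.08, 0.1],
}
# Fixed settings from steps S72 and S74; all other parameters keep their defaults (S75).
FIXED = {"bootstrap": False, "warm_start": True, "n_jobs": 4}

def iter_param_grid(grid=GRID, fixed=FIXED):
    """Yield one keyword-argument dict per point of the grid."""
    names = list(grid)
    for values in product(*(grid[name] for name in names)):
        yield {**dict(zip(names, values)), **fixed}

def build_model(params):
    """Create an untrained model; scikit-learn is imported only when this is called."""
    from sklearn.ensemble import IsolationForest
    return IsolationForest(**params)

candidates = list(iter_param_grid())
print(len(candidates))   # 5 * 7 * 5 * 5 = 875 combinations
```

In such a setup, each candidate would be fitted on the training circuits and scored on the held-out circuit. The `predict` method of `IsolationForest` returns -1 for samples flagged as anomalies and 1 otherwise, so -1 would mark suspected Trojan nets.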

Step S8: the data set to be tested is divided into a training set and a testing set by adopting a cross validation method, so that the data set can be utilized to the maximum extent, and incomplete prediction results caused by the fact that a classifier does not learn the characteristics of the testing set are prevented.

The cross-validation procedure is as follows: given N circuits under test, one circuit is taken as the test set each time and the remaining N-1 circuits form the training set. This is repeated N times, so that every circuit is used for both training and testing and the data set is used to the fullest extent, allowing the model to learn from all of the data.
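The leave-one-circuit-out split described above can be sketched as follows. The function name `leave_one_circuit_out` is hypothetical, and the four circuit names are simply taken from Table 2 for illustration.

```python
def leave_one_circuit_out(circuits):
    """Yield (training_circuits, test_circuit) pairs, one pair per circuit."""
    for i, test_circuit in enumerate(circuits):
        # Every circuit except the i-th one goes into the training set.
        training = circuits[:i] + circuits[i + 1:]
        yield training, test_circuit

circuits = ["RS232-T1000", "RS232-T1100", "s15850-T100", "s35932-T100"]
folds = list(leave_one_circuit_out(circuits))
for training, test_circuit in folds:
    print(test_circuit, "<-", training)
```

With N circuits this yields N folds, and each circuit appears exactly once as the test set and N-1 times in a training set.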

The Trojan circuit of test circuit RS232-T1200 is shown in FIG. 7, that of test circuit s15850-T100 in FIG. 8, and that of test circuit s38417-T300 in FIG. 9.

The circuits used in training are gate-level netlists from Trusthub, as listed in Table 2.

Table 2. Circuits under test

Circuit name    Number of normal nets    Number of Trojan nets
RS232-T1000     283                      36
RS232-T1100     284                      36
RS232-T1200     289                      34
RS232-T1300     287                      29
RS232-T1400     273                      45
RS232-T1500     283                      39
RS232-T1600     292                      29
s15850-T100     2429                     27
s35932-T100     6407                     15
s35932-T200     6405                     12
s35932-T300     6405                     37
s38417-T100     5798                     12
s38417-T200     5798                     15
s38417-T300     5801                     44
s38584-T100     7343                     19
s38584-T200     7373                     97
s38584-T300     7614                     874

Step S9: Input the test data into the trained model for detection, calculate indexes such as TPR, TNR, Precision, Recall, F1-score, and Accuracy from the detection results, and plot the ROC curve to evaluate the detection capability of the model.

The indexes are calculated as follows, where TP, TN, FP, and FN are the numbers of true positives, true negatives, false positives, and false negatives, and total = TP + TN + FP + FN: TPR = TP/(TP + FN), TNR = TN/(TN + FP), Precision = TP/(TP + FP), Recall = TP/(TP + FN), F1-score = 2 × Precision × Recall/(Precision + Recall), Accuracy = (TP + TN)/total.
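These indexes can be computed from the four confusion-matrix counts, as sketched below. The sketch uses the standard definitions, in which Recall equals the TPR, and assumes every denominator is non-zero. The function name `detection_metrics` and the counts in the example are hypothetical and are not results reported for the invention.

```python
def detection_metrics(tp, tn, fp, fn):
    """Compute the evaluation indexes from confusion-matrix counts.

    Assumes every denominator is non-zero.
    """
    total = tp + tn + fp + fn
    tpr = tp / (tp + fn)              # true positive rate
    tnr = tn / (tn + fp)              # true negative rate
    precision = tp / (tp + fp)
    recall = tpr                      # recall is the same quantity as the TPR
    f1 = 2 * precision * recall / (precision + recall)
    accuracy = (tp + tn) / total
    return {
        "TPR": tpr, "TNR": tnr, "Precision": precision,
        "Recall": recall, "F1-score": f1, "Accuracy": accuracy,
    }

# Made-up counts for illustration only.
m = detection_metrics(tp=30, tn=950, fp=10, fn=10)
for name, value in m.items():
    print(f"{name}: {value:.4f}")
```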

In the ROC curve, the abscissa is the FPR and the ordinate is the TPR, and the area under the curve is the AUC. A larger AUC indicates better detection capability; the ideal AUC value is 1.
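A minimal sketch of how the ROC points and the AUC can be obtained from anomaly scores is given below. It assumes that label 1 marks a Trojan sample and that a higher score means more suspicious. The function names `roc_curve_points` and `auc` and the toy labels and scores are hypothetical.

```python
def roc_curve_points(labels, scores):
    """Return (FPR, TPR) points obtained by lowering the decision threshold.

    labels: 1 for Trojan (positive), 0 for normal; higher score = more suspicious.
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    ranked = sorted(zip(scores, labels), reverse=True)
    points = [(0.0, 0.0)]
    tp = fp = 0
    for i, (score, label) in enumerate(ranked):
        tp += label
        fp += 1 - label
        # Emit a point only after the last sample in a group of tied scores.
        if i + 1 == len(ranked) or ranked[i + 1][0] != score:
            points.append((fp / negatives, tp / positives))
    return points

def auc(points):
    """Area under the curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))

labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.3, 0.1]
print(auc(roc_curve_points(labels, scores)))
```

If the scores come from scikit-learn's `IsolationForest.score_samples`, lower values are the more anomalous ones, so they would need to be negated before being passed to a function like this.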

The above embodiments may be implemented wholly or partly in software, hardware, firmware, or any combination thereof. When implemented wholly or partly in software, they may take the form of a computer program product that includes one or more computer instructions. When the computer instructions are loaded or executed on a computer, the flows or functions according to the embodiments of the invention are produced in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable device. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another; for example, they may be transmitted from one website, computer, server, or data center to another by wired means (e.g., coaxial cable, optical fiber, or digital subscriber line (DSL)) or wireless means (e.g., infrared, radio, or microwave). The computer-readable storage medium may be any available medium that a computer can access, or a data storage device, such as a server or data center, that integrates one or more available media. The available medium may be a magnetic medium (e.g., floppy disk, hard disk, or magnetic tape), an optical medium (e.g., DVD), or a semiconductor medium (e.g., solid-state disk (SSD)).

The above description is intended only to illustrate the present invention and is not to be construed as limiting its scope; the invention covers all modifications, equivalents, and improvements that fall within the spirit and scope of the invention as defined by the appended claims.
