Power grid CPS network attack identification method and system

Document No.: 172560 Publication date: 2021-10-29

Reading note: This technology, "Power grid CPS network attack identification method and system" (一种电网cps网络攻击辨识方法及系统), was designed by 罗伟峰 and 蒋屹新 on 2021-07-26. Main content: The invention provides a method and a system for identifying network attacks on a power grid CPS, comprising: step S1, extracting a plurality of samples from pre-trained power grid measurement data, classifying the samples, and forming a sample set from the classified samples; step S2, selecting a plurality of accident scene categories from the sample set, and selecting two groups of samples from each accident scene category to obtain a sample training set and a sample test set; step S3, inputting the sample training set into a preset network attack identification model for training to obtain network attack identification model parameters; inputting the sample test set into a preset test model for training to obtain test parameters; comparing the network attack identification model parameters with the test parameters to obtain a comparison result; and step S4, when the comparison result is consistent, identifying the network attack through the preset network attack identification model to obtain a network attack identification result. The invention can fully mine the characteristics of power grid data and effectively improve identification accuracy and speed.

1. A power grid CPS network attack identification method is characterized by comprising the following steps:

step S1, extracting a plurality of samples from pre-trained power grid measurement data, classifying the extracted samples according to accident scene categories, and forming a sample set by the classified samples;

step S2, selecting a plurality of accident scene categories from the sample set, selecting two groups of samples from each accident scene category, and respectively forming a sample training set and a sample testing set;

step S3, inputting the sample training set into a preset network attack identification model for training, and obtaining network attack identification model parameters; inputting the sample test set into a preset test model for training to obtain test parameters; comparing the network attack identification model parameters with the test parameters to obtain a comparison result; wherein the comparison result is either consistent or inconsistent;

and step S4, when the comparison result is consistent, identifying the network attack through a preset network attack identification model, and acquiring a network attack identification result.

2. The method of claim 1, wherein in step S1, the process of pre-training the power grid measurement data comprises:

step S101, acquiring a target data set to be trained, randomly sampling a plurality of accident scenes from the target data set, and selecting a plurality of real-time measurement data as a meta-task data set in the accident scenes;

step S102, randomly sampling a plurality of real-time measurement data from each type of real-time measurement data in the meta-task data set as a support set, and taking the remaining real-time measurement data of the other scenes as a query set;

step S103, randomly selecting one real-time measurement data from each type of accident scene in the support set, forming the selected real-time measurement data from all the accident scenes into a group of training data, inputting the training data into a preset training model for training, and obtaining a first training result;

step S104, extracting real-time measurement data from the query set, and judging the accident scene category to which the real-time measurement data belongs by using a preset training model to obtain a second training result;

step S105, calculating the accuracy of a preset training model according to the first training result and the second training result, repeating the step S101 to the step S105, updating the preset training model and a target data set according to the obtained plurality of accuracies, and obtaining an updated training model;

and step S106, acquiring the power grid measurement data in the historical record, inputting the updated training model for training, and acquiring pre-trained power grid measurement data.

3. The method of claim 2, wherein in step S3, the preset network attack identification model comprises:

scanning the input sample training set with a sliding window to extract the time characteristics of the sample set, and sorting the sample training set according to the time characteristics of the data in the sample set.

4. The method of claim 3, wherein in step S3, the preset network attack identification model further comprises:

dividing the sorted sample training set with a sliding window and outputting it as a power grid time sequence feature vector; the power grid time sequence feature vector is processed by each weak learner in the first layer of the cascade forest, each weak learner outputs the probability of a network attack, and the results of the weak learners are spliced into a vector and then output.

5. The method of claim 4, wherein in step S3, the preset network attack identification model further comprises:

a plurality of base learners connected in parallel to increase the diversity of the algorithm; the base learners comprise at least logistic regression and a classification and regression decision tree; and weak classifiers, cascaded according to the weight of each classifier, are arranged in the base learners.

6. The method according to claim 5, wherein in step S3, obtaining the comparison result specifically includes:

comparing the difference between the network attack identification model parameters and the test parameters with a preset threshold; when the difference is greater than the preset threshold, generating a comparison result of inconsistent; and when the difference is not greater than the preset threshold, generating a comparison result of consistent.

7. A power grid CPS network attack recognition system for implementing the method as claimed in any one of claims 1-6, comprising:

a sample set module, used for extracting a plurality of samples from pre-trained power grid measurement data, classifying the extracted samples according to accident scene categories, and forming the classified samples into a sample set; the sample set module is also used for selecting a plurality of accident scene categories from the sample set, and selecting two groups of samples from each accident scene category to respectively form a sample training set and a sample test set;

a training module, used for inputting the sample training set into a preset network attack identification model for training to obtain network attack identification model parameters; inputting the sample test set into a preset test model for training to obtain test parameters; and comparing the network attack identification model parameters with the test parameters to obtain a comparison result; wherein the comparison result is either consistent or inconsistent;

and the identification module is used for identifying the network attack through a preset network attack identification model when the comparison results are consistent, and acquiring a network attack identification result.

8. The system of claim 7, further comprising:

the power grid measurement data module is used for acquiring a target data set to be trained, randomly sampling a plurality of accident scenes from the target data set, and selecting a plurality of real-time measurement data as a meta-task data set in the accident scenes;

randomly sampling a plurality of real-time measurement data from each type of real-time measurement data in the meta-task data set as a support set, and taking the remaining real-time measurement data of the other scenes as a query set;

randomly selecting one real-time measurement data from each type of accident scene in the support set, forming the selected real-time measurement data from all the accident scenes into a group of training data, inputting the training data into a preset training model for training, and obtaining a first training result;

extracting real-time measurement data from the query set, and judging the accident scene category to which the real-time measurement data belongs by using a preset training model to obtain a second training result;

calculating the accuracy of a preset training model according to the first training result and the second training result, repeating iteration, updating the preset training model and a target data set according to the obtained multiple accuracies, and obtaining an updated training model;

and acquiring power grid measurement data in the historical record, inputting the updated training model for training, and acquiring pre-trained power grid measurement data.

9. The system of claim 8, wherein the training module is further configured to scan the input sample training set with a sliding window to extract the time characteristics of the sample set, and to sort the input sample training set according to the time characteristics of the data in the sample set; to divide the input sample training set with the sliding window into a power grid time sequence feature vector; the power grid time sequence feature vector is processed by each weak learner in the first layer of the cascade forest, each weak learner outputs the probability of a network attack, and the results of the weak learners are spliced into a vector and then output; the training module also comprises a plurality of base learners connected in parallel to increase the diversity of the algorithm; the base learners comprise at least logistic regression and a classification and regression decision tree; and weak classifiers, cascaded according to the weight of each classifier, are arranged in the base learners.

10. The system of claim 9, wherein the training module is further configured to compare the difference between the cyber attack recognition model parameter and the test parameter with a preset threshold, and when the difference between the cyber attack recognition model parameter and the test parameter is greater than the preset threshold, generate a comparison result as inconsistent; and when the difference value between the network attack identification model parameter and the test parameter is not more than a preset threshold value, generating a comparison result to be consistent.

Technical Field

The invention relates to the technical field of power grid network attack identification, in particular to a method and a system for identifying a power grid CPS network attack.

Background

With the development of information systems and physical systems, power systems increasingly exhibit the characteristics of cyber-physical power systems. As an important carrier of the measurement, communication, computation and control functions of the system, the information side is the cornerstone of optimized power grid operation; it therefore attracts wide attention from attackers and carries many security risks. As infrastructure vital to the nation and its people, the power system has become one of the primary targets of attacks by malicious organizations or hostile states. The serious impact of network attacks, and even of "power war", therefore demands attention and vigilance, and research on targeted security defense theory and methods is urgently needed.

In an actual system, large-scale attack cases are few and the probability of large-scale faults is extremely low, so the network attack data of the power CPS (Cyber-Physical System) is highly imbalanced. Under this condition, when a machine learning algorithm is used for data mining, the large difference in quantity between attack data and normal data means that the classifier pays insufficient attention to minority-class samples, cannot learn effective features, and struggles to meet the identification requirement.

Disclosure of Invention

The invention aims to provide a method and a system for identifying power grid CPS network attacks, solving the technical problem that the classifier in existing methods pays insufficient attention to minority-class samples, cannot learn effective features, and struggles to meet the identification requirement.

In one aspect, a method for identifying power grid CPS network attacks is provided, comprising the following steps:

step S1, extracting a plurality of samples from pre-trained power grid measurement data, classifying the extracted samples according to accident scene categories, and forming a sample set by the classified samples;

step S2, selecting a plurality of accident scene categories from the sample set, selecting two groups of samples from each accident scene category, and respectively forming a sample training set and a sample testing set;

step S3, inputting the sample training set into a preset network attack identification model for training, and obtaining network attack identification model parameters; inputting the sample test set into a preset test model for training to obtain test parameters; comparing the network attack identification model parameters with the test parameters to obtain a comparison result; wherein the comparison result is either consistent or inconsistent;

and step S4, when the comparison result is consistent, identifying the network attack through a preset network attack identification model, and acquiring a network attack identification result.

Preferably, in step S1, the process of pre-training the power grid measurement data includes:

step S101, acquiring a target data set to be trained, randomly sampling a plurality of accident scenes from the target data set, and selecting a plurality of real-time measurement data as a meta-task data set in the accident scenes;

step S102, randomly sampling a plurality of real-time measurement data from each type of real-time measurement data in the meta-task data set as a support set, and taking the remaining real-time measurement data of the other scenes as a query set;

step S103, randomly selecting one real-time measurement data from each type of accident scene in the support set, forming the selected real-time measurement data from all the accident scenes into a group of training data, inputting the training data into a preset training model for training, and obtaining a first training result;

step S104, extracting real-time measurement data from the query set, and judging the accident scene category to which the real-time measurement data belongs by using a preset training model to obtain a second training result;

step S105, calculating the accuracy of a preset training model according to the first training result and the second training result, repeating the step S101 to the step S105, updating the preset training model and a target data set according to the obtained plurality of accuracies, and obtaining an updated training model;

and step S106, acquiring the power grid measurement data in the historical record, inputting the updated training model for training, and acquiring pre-trained power grid measurement data.

Preferably, in step S3, the preset network attack identification model includes:

scanning the input sample training set with a sliding window to extract the time characteristics of the sample set, and sorting the sample training set according to the time characteristics of the data in the sample set.

Preferably, in step S3, the preset network attack identification model further includes:

dividing the sorted sample training set with a sliding window and outputting it as a power grid time sequence feature vector; the power grid time sequence feature vector is processed by each weak learner in the first layer of the cascade forest, each weak learner outputs the probability of a network attack, and the results of the weak learners are spliced into a vector and then output.

Preferably, in step S3, the preset network attack identification model further includes:

a plurality of base learners connected in parallel to increase the diversity of the algorithm; the base learners comprise at least logistic regression and a classification and regression decision tree; and weak classifiers, cascaded according to the weight of each classifier, are arranged in the base learners.

Preferably, in step S3, obtaining the comparison result specifically includes:

comparing the difference between the network attack identification model parameters and the test parameters with a preset threshold; when the difference is greater than the preset threshold, generating a comparison result of inconsistent; and when the difference is not greater than the preset threshold, generating a comparison result of consistent.

In another aspect, a system for identifying power grid CPS network attacks is also provided, which is used for implementing the above method for identifying power grid CPS network attacks and comprises:

a sample set module, used for extracting a plurality of samples from pre-trained power grid measurement data, classifying the extracted samples according to accident scene categories, and forming the classified samples into a sample set; the sample set module is also used for selecting a plurality of accident scene categories from the sample set, and selecting two groups of samples from each accident scene category to respectively form a sample training set and a sample test set;

a training module, used for inputting the sample training set into a preset network attack identification model for training to obtain network attack identification model parameters; inputting the sample test set into a preset test model for training to obtain test parameters; and comparing the network attack identification model parameters with the test parameters to obtain a comparison result; wherein the comparison result is either consistent or inconsistent;

and the identification module is used for identifying the network attack through a preset network attack identification model when the comparison results are consistent, and acquiring a network attack identification result.

Preferably, the system further comprises: a power grid measurement data module, used for acquiring a target data set to be trained, randomly sampling a plurality of accident scenes from the target data set, and selecting a plurality of real-time measurement data as a meta-task data set in the accident scenes;

randomly sampling a plurality of real-time measurement data from each type of real-time measurement data in the meta-task data set as a support set, and taking the remaining real-time measurement data of the other scenes as a query set;

randomly selecting one real-time measurement data from each type of accident scene in the support set, forming the selected real-time measurement data from all the accident scenes into a group of training data, inputting the training data into a preset training model for training, and obtaining a first training result;

extracting real-time measurement data from the query set, and judging the accident scene category to which the real-time measurement data belongs by using the preset training model to obtain a second training result;

calculating the accuracy of a preset training model according to the first training result and the second training result, repeating iteration, updating the preset training model and a target data set according to the obtained multiple accuracies, and obtaining an updated training model;

and acquiring power grid measurement data in the historical record, inputting the updated training model for training, and acquiring pre-trained power grid measurement data.

Preferably, the training module is further configured to scan the input sample training set with a sliding window to extract the time characteristics of the sample set, and to sort the input sample training set according to the time characteristics of the data in the sample set; to divide the input sample training set with the sliding window into a power grid time sequence feature vector; the power grid time sequence feature vector is processed by each weak learner in the first layer of the cascade forest, each weak learner outputs the probability of a network attack, and the results of the weak learners are spliced into a vector and then output; the training module also comprises a plurality of base learners connected in parallel to increase the diversity of the algorithm; the base learners comprise at least logistic regression and a classification and regression decision tree; and weak classifiers, cascaded according to the weight of each classifier, are arranged in the base learners.

Preferably, the training module is further configured to compare a difference between the network attack identification model parameter and the test parameter with a preset threshold, and when the difference between the network attack identification model parameter and the test parameter is greater than the preset threshold, generate a comparison result that is inconsistent; and when the difference value between the network attack identification model parameter and the test parameter is not more than a preset threshold value, generating a comparison result to be consistent.

In summary, the embodiment of the invention has the following beneficial effects:

According to the method and system for identifying power grid CPS network attacks of the invention, the deep forest network attack identification method is improved by incorporating meta-learning. Compared with the traditional sampling methods of the prior art, the N-way K-shot sampling method is better suited to the small-sample problem, so that over-fitting is avoided. Furthermore, the method improves on the deep forest anomaly detection algorithm by extending the cascade forest structure, so that the characteristics of power grid data can be fully mined and identification accuracy and speed are effectively improved.

Drawings

To illustrate the embodiments of the present invention or the technical solutions of the prior art more clearly, the drawings used in describing them are briefly introduced below. The drawings described below show only some embodiments of the present invention; other drawings obtained from them by those skilled in the art without inventive effort also fall within the scope of the present invention.

Fig. 1 is a main process diagram of a method for identifying a CPS network attack on a power grid according to an embodiment of the present invention.

Fig. 2 is a logic diagram of a method for identifying a CPS network attack in a power grid according to an embodiment of the present invention.

Fig. 3 is a schematic diagram of a power grid CPS network attack identification system in an embodiment of the present invention.

Fig. 4 is a diagram illustrating a network attack identification model according to an embodiment of the present invention.

Fig. 5 is a diagram illustrating a network attack identification model according to an embodiment of the present invention.

Detailed Description

In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention will be described in further detail with reference to the accompanying drawings.

Fig. 1 and Fig. 2 show an embodiment of the method for identifying power grid CPS network attacks according to the present invention. In this embodiment, the method comprises the following steps:

Step S1, extracting a plurality of samples from pre-trained power grid measurement data, classifying the extracted samples according to accident scene categories, and forming a sample set from the classified samples. It can be understood that, following the N-way K-shot method of meta-learning, each time a classification task is constructed, N classes of data are extracted from the power grid measurement data training set, each class consisting of K samples, to form the data set of one small classification task.

In a specific embodiment, the process of pre-training the grid measurement data includes:

Step S101, acquiring a target data set to be trained, randomly sampling a plurality of accident scenes from the target data set, and selecting a plurality of real-time measurement data as a meta-task data set in the accident scenes. The brand-new power grid measurement data to be identified is assumed to include N accident scenes, such as false data injection, remote tripping command attack, single-phase grounding short circuit, and steady-state operation. The target task is to randomly extract K real-time measurement data from the N accident scenes for learning, so that brand-new data belonging to the N accident scenes can be correctly identified from the learning result.

Step S102, randomly sampling a plurality of real-time measurement data from each type of real-time measurement data in the meta-task data set as a support set, and taking the remaining real-time measurement data of the other scenes as a query set. Specifically, N accident scenes are randomly sampled from the data set, and K real-time measurement data in each accident scene are used as the meta-task data set.

Step S103, randomly selecting one real-time measurement data from each type of accident scene in the support set, forming the selected real-time measurement data from all the accident scenes into a group of training data, inputting the training data into a preset training model for training, and obtaining a first training result. Specifically, M real-time measurement data are randomly sampled from each type of real-time measurement data in the meta-task data set to serve as the support set S, and the N-M real-time measurement data in the other scenes serve as the query set Q.

Step S104, extracting real-time measurement data from the query set, and judging the accident scene category to which the real-time measurement data belongs by using the preset training model to obtain a second training result. Specifically, one instance is randomly selected from each accident scene in the support set S, and these instances together form a group of training data that is input into the model for training.

Step S105, calculating the accuracy of the preset training model according to the first training result and the second training result, repeating step S101 to step S105, updating the preset training model and the target data set according to the plurality of accuracies obtained, and obtaining an updated training model. Specifically, system fault instance data is extracted from the query set Q, and the model is used to judge which scene the data belongs to. These steps are repeated iteratively, and the accuracy of the task model is finally calculated. This accuracy is in effect the loss, on the task, of the model determined by the meta-learning parameters; the loss gradient is back-propagated to the meta-learning parameters, which are then updated. This is the meta-learning process.

And step S106, acquiring the power grid measurement data in the historical record, inputting the updated training model for training, and acquiring pre-trained power grid measurement data.
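By way of illustration only, one meta-task of steps S101 to S105 might be sketched as follows. The function names, the synthetic two-dimensional measurements and the nearest-centroid stand-in for the "preset training model" are all assumptions made for this sketch and are not taken from the embodiment; the gradient-based update of the meta-learning parameters is not shown.

```python
import random
from collections import defaultdict


def sample_episode(dataset, n_way, k_shot, m_support, rng):
    """Sample N scenes, K measurements per scene, and split them M / K-M."""
    by_scene = defaultdict(list)
    for features, scene in dataset:
        by_scene[scene].append(features)
    scenes = rng.sample(sorted(by_scene), n_way)
    support, query = [], []
    for scene in scenes:
        picked = rng.sample(by_scene[scene], k_shot)
        support += [(x, scene) for x in picked[:m_support]]
        query += [(x, scene) for x in picked[m_support:]]
    return support, query


def fit_centroids(support):
    """Stand-in model (assumption): the mean support vector of each scene."""
    grouped = defaultdict(list)
    for x, scene in support:
        grouped[scene].append(x)
    return {scene: [sum(col) / len(col) for col in zip(*rows)]
            for scene, rows in grouped.items()}


def predict(centroids, x):
    """Assign x to the scene whose centroid is closest (squared distance)."""
    def dist(c):
        return sum((a - b) ** 2 for a, b in zip(x, c))
    return min(centroids, key=lambda scene: dist(centroids[scene]))


def episode_accuracy(support, query):
    """Train on the support set, then score the query set."""
    centroids = fit_centroids(support)
    hits = sum(predict(centroids, x) == scene for x, scene in query)
    return hits / len(query)


# Synthetic, well-separated data; the scene names follow the examples in S101.
rng = random.Random(0)
SCENES = ["false_data_injection", "remote_trip_attack",
          "single_phase_ground_fault", "steady_state"]
dataset = [([idx * 10 + rng.uniform(-1, 1), rng.uniform(-1, 1)], scene)
           for idx, scene in enumerate(SCENES) for _ in range(10)]

support, query = sample_episode(dataset, n_way=3, k_shot=5, m_support=2, rng=rng)
accuracy = episode_accuracy(support, query)
```

In a full meta-learning loop, the accuracy (or loss) of many such episodes would drive the update of the shared model; here each episode is evaluated in isolation.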

Step S2, selecting a plurality of accident scene categories from the sample set, selecting two groups of samples from each accident scene category, and respectively forming a sample training set and a sample test set. It can be understood that N accident scenes are selected from the sample set, K + M real-time measurement data are randomly extracted from each accident scene for learning, the K real-time measurement data serve as the training set, and the M real-time measurement data serve as the test set.
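As a minimal sketch of the split described in step S2, assuming the samples are already grouped by accident scene, K + M samples can be drawn per scene and divided as follows. The function name `split_train_test` and the integer placeholders standing in for measurement records are hypothetical.

```python
import random


def split_train_test(samples_by_scene, k_train, m_test, seed=0):
    """Draw K + M samples per scene; the first K train, the last M test."""
    rng = random.Random(seed)
    train, test = [], []
    for scene in sorted(samples_by_scene):
        drawn = rng.sample(samples_by_scene[scene], k_train + m_test)
        train += [(x, scene) for x in drawn[:k_train]]
        test += [(x, scene) for x in drawn[k_train:]]
    return train, test


# Integer placeholders stand in for real-time measurement records.
samples_by_scene = {
    "cyber_attack": list(range(0, 20)),
    "single_phase_ground_fault": list(range(100, 120)),
    "normal_operation": list(range(200, 220)),
}
train_set, test_set = split_train_test(samples_by_scene, k_train=5, m_test=3)
```

Because both groups come from a single draw without replacement, no sample appears in both the training set and the test set.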

Step S3, inputting the sample training set into a preset network attack identification model for training, and obtaining network attack identification model parameters; inputting the sample test set into a preset test model for training to obtain test parameters; comparing the network attack identification model parameters with the test parameters to obtain a comparison result; wherein the comparison result is either consistent or inconsistent. It can be understood that, on the basis of the original Bagging learner, random forest and LightGBM base learners are introduced to enrich the model. The identification accuracy of a single learner is compared with that of combined learners, and the best combination is sought, in order to construct the optimal composition of base learners for the improved cascade structure of the deep forest.

In an embodiment, as shown in Fig. 4, the preset network attack identification model includes a plurality of base learners connected in parallel to increase the diversity of the algorithm; the base learners comprise at least logistic regression and a classification and regression decision tree; and weak classifiers, cascaded according to the weight of each classifier, are arranged in the base learners.

As shown in Fig. 5, during data processing the input sample training set is scanned with a sliding window to extract the time characteristics of the sample set, and is sorted according to the time characteristics of the data in the sample set. It can be understood that different sliding windows are used to scan the original power grid measurement data features; extracting features through the sliding windows captures the relationship between earlier and later points of the power grid measurement data sequence and improves the feature extraction capability. In power grid sample training, the sliding window size is determined experimentally, with an initial value of one fourth or one eighth of the feature dimension of the power grid measurement data. The sorted sample training set is divided by the sliding window and output as a power grid time sequence feature vector; this vector is processed by each weak learner in the first layer of the cascade forest, each weak learner outputs the probability of a network attack, and the results of the weak learners are spliced into a vector and then output.

It can be understood that, for learners based on the Boosting idea, the first-layer input of the cascade forest is the power grid time sequence feature vector divided by the sliding window; after passing through each weak learner in the first layer, the probability of a network attack is output, and the results of the weak learners are spliced into a vector that is input to the next layer of the cascade structure. On the basis of the original Bagging learner, several kinds of base learners, such as logistic regression and classification and regression decision trees, are introduced to increase the diversity of the algorithm. Among Boosting algorithms, AdaBoost, XGBoost and LightGBM are introduced; AdaBoost cascades the weak classifiers by taking the weight of each classifier into account.
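The sliding-window scan and the splicing of first-layer outputs can be sketched as follows. This is an illustrative assumption rather than the embodiment itself: the two weak learners are hand-written toy functions standing in for trained forests or boosted models, the measurement values are synthetic, and the window size simply uses the "one fourth of the feature dimension" initial value mentioned above.

```python
def sliding_windows(vector, window, stride=1):
    """Cut one measurement vector into overlapping windows of equal length."""
    return [vector[i:i + window]
            for i in range(0, len(vector) - window + 1, stride)]


def jump_learner(windows):
    """Toy weak learner: largest change inside any window, capped at 1."""
    jump = max(abs(w[-1] - w[0]) for w in windows)
    return min(1.0, jump)


def level_learner(windows):
    """Toy weak learner: highest window mean, clipped to [0, 1]."""
    level = max(sum(w) / len(w) for w in windows)
    return min(1.0, max(0.0, level))


def cascade_layer(windows, learners):
    """Run every weak learner and splice the attack probabilities into a vector."""
    return [learner(windows) for learner in learners]


measurement = [0.1, 0.1, 0.1, 0.9, 0.1, 0.1, 0.1, 0.1]  # synthetic values
window_size = len(measurement) // 4  # initial value: one fourth of the dimension
windows = sliding_windows(measurement, window_size)
layer_output = cascade_layer(windows, [jump_learner, level_learner])
```

The list `layer_output` holds one probability per weak learner and would serve as the input to the next cascade layer.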

Specifically, the difference between the network attack identification model parameters and the test parameters is compared with a preset threshold: when the difference is greater than the preset threshold, the comparison result is inconsistent; when the difference is not greater than the preset threshold, the comparison result is consistent. It can be understood that the cascade structure of the original deep forest is improved by comparing the identification accuracy of the cascade structure under a single learner with that under a combined learner and searching for the optimal combination. First, the model identification accuracy under each single learner is compared, and the better type of learner is determined from the results; second, the learners are combined at random and the model identification accuracy under each combined learner is compared. The metrics of identification accuracy are accuracy, F1 score and AUC.
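The threshold comparison above can be written as a small helper. This is a sketch under stated assumptions: the text does not say whether the parameters are scalars or vectors, so each is treated here as a single scalar value (for example an identification accuracy), and the difference is taken as an absolute value. The function name and the returned strings are illustrative.

```python
def compare_with_threshold(model_param: float, test_param: float, threshold: float) -> str:
    """Return 'consistent' when |model_param - test_param| <= threshold, else 'inconsistent'.

    Assumption: both parameters are scalars and the difference is absolute.
    """
    if threshold < 0:
        raise ValueError("threshold must be non-negative")
    difference = abs(model_param - test_param)
    return "inconsistent" if difference > threshold else "consistent"
```

A difference exactly equal to the threshold counts as consistent, matching the "not greater than" wording of the text.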

Step S4: when the comparison result is consistent, the network attack is identified through the preset network attack identification model, and a network attack identification result is obtained. It can be understood that once the identification accuracy is determined to be consistent with that of the original model, the improved model, namely the network attack identification model, is used to identify subsequent network attacks; this preserves efficiency while increasing the attention paid to minority-class samples and the effective features learned, so that the identification requirement is met.

In this embodiment, the method is applied to a power cyber-physical co-simulation platform, from which the relevant transient fault data are obtained; simulations are run automatically under random load levels, fault positions and attack threats. The test scenarios used in the embodiment fall into three categories: 1) network attack scenarios; 2) single-phase grounding short circuits of the power system; 3) normal operating conditions. A total of 24 test scenarios were selected:

The protection scheme in the single-phase grounding short-circuit scenarios (Q1-Q3) is two-stage protection; in addition, automatic reclosing is performed one cycle after the fault occurs to observe whether the fault has cleared.

The transmission line maintenance scenarios (Q4-Q6) simulate planned maintenance of a certain line, for which an operator remotely sends a trip command in advance to trip the circuit breakers at both ends of the line. The corresponding record and time are then left in the control log on the information side.

The network attack scenarios involve two types of attack: 1) relay trip command injection; 2) modification of the relay action threshold. Trip command injection attacks are further divided into two categories: 1) Q7-Q12: a single relay is attacked, simulating a misoperation of the relay protection device; 2) Q13-Q15: the relays at both ends of the same line are attacked simultaneously, simulating a transmission line maintenance scenario.

False data injection attacks (Q16-Q19) steal the authority of internal personnel, send commands to access the internal registers of a relay, and modify the relevant relay settings, so that when a real fault occurs and the relay protection must operate correctly, the protection device refuses to operate, causing more serious consequences. These attacks simulate the scenario of a mechanical failure of the relay protection device during a natural fault (Q20-Q23).

In the steady-state scenario (Q24), the load varies randomly within a certain range (80%-120%), but no attack events, disturbances or control operations occur.
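The 24 scenarios listed above can be collected into a lookup table, which is convenient when labelling simulation samples. The sketch below is only one reading of the text: the ID ranges are taken from the scenario descriptions, while the label strings, the function names and the treatment of Q20-Q23 as a separate natural-fault group are assumptions.

```python
from typing import Dict, Tuple

# (first id, last id) -> label. Ranges follow the scenario descriptions;
# the label strings themselves are illustrative, not from the source.
SCENARIO_RANGES: Dict[Tuple[int, int], str] = {
    (1, 3): "single-phase grounding short circuit",
    (4, 6): "transmission line maintenance",
    (7, 12): "attack: trip command injection on one relay",
    (13, 15): "attack: trip command injection on both line ends",
    (16, 19): "attack: relay setting modification",
    (20, 23): "relay mechanical failure during natural fault",
    (24, 24): "steady state",
}


def scenario_label(scenario_id: str) -> str:
    """Map an id such as 'Q7' to its scenario label."""
    number = int(scenario_id.lstrip("Qq"))
    for (first, last), label in SCENARIO_RANGES.items():
        if first <= number <= last:
            return label
    raise KeyError(f"unknown scenario id: {scenario_id}")


def is_attack(scenario_id: str) -> bool:
    """True when the scenario is one of the network attack scenarios."""
    return scenario_label(scenario_id).startswith("attack:")
```

Keeping the attack scenarios next to the natural scenarios they imitate (Q13-Q15 against Q4-Q6, Q16-Q19 against Q20-Q23) makes it clear which pairs the identification model has to tell apart.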

In the simulation process, each scenario begins when the system reaches a steady state and ends when it reaches a steady state again. The system collects synchronized measurement data 200 times per second, and the attack duration is set slightly shorter than the sampling interval of 0.005 s; otherwise, the system could make a judgment directly from the fault data. The co-simulation produced 916 groups of simulation samples, each containing 39-dimensional data from both the physical and information sides together with a timestamp. The 39 dimensions comprise 9 dimensions of physical-side measurements, 6 dimensions of relay status, and 24 dimensions of event logs, the latter covering the action and modification logs of each relay and the maintenance plans. The physical-side synchrophasor data include discrete phase voltages and phase currents. The log of each event contains data for about 4000 time points, corresponding to a simulation time of about 20 seconds. With N-way K-shot sampling, the best identification result is obtained when N is 20 and K is 21.

When the cascade structure is a single learner: (a) the classification and regression decision tree model reaches an accuracy of 76%, an F1 score of 0.71 and an AUC of 0.74, all better than the logistic regression model's accuracy of 71%, F1 score of 0.60 and AUC of 0.68. When the cascade structure uses a single type of model, tree models perform better overall than non-tree models. Compared with tree models, non-tree models such as logistic regression are sensitive to the data and strongly affected by outliers, so their predictions are worse and their robustness is lower. (b) The learner based on the Bagging idea is the random forest; as a base learner it reaches an accuracy of 85%, an F1 score of 0.80 and an AUC of 0.82. The Boosting-based learners are AdaBoost, XGBoost and LightGBM. The results show that, compared with AdaBoost, the random forest as base learner improves accuracy by 4%, F1 score by 0.02 and AUC by 0.02, but is inferior overall to XGBoost and LightGBM. Possible reasons are that the number of AdaBoost classifiers is difficult to set and that data imbalance reduces its classification accuracy. (c) Among the Boosting algorithms, XGBoost and LightGBM perform better. The XGBoost and LightGBM models have the same AUC of 0.90, while LightGBM achieves the best result on both accuracy and F1 score, so its overall result is slightly better than that of XGBoost.
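For reference, the three metrics used in this comparison (accuracy, F1 score and AUC) can be computed as in the following sketch. It assumes a binary setting in which label 1 means attack and 0 means no attack; how the embodiment handles the multi-class scenario labels is not stated in the text. The function names are illustrative, and the AUC is computed from its pairwise-ranking definition rather than by any particular library.

```python
from typing import Sequence


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of samples whose predicted label equals the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def f1_score(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Harmonic mean of precision and recall for the positive (attack) class."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    denominator = 2 * tp + fp + fn
    return 2 * tp / denominator if denominator else 0.0


def auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Probability that a random positive is scored above a random negative (ties count 0.5)."""
    positives = [s for t, s in zip(y_true, scores) if t == 1]
    negatives = [s for t, s in zip(y_true, scores) if t == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in positives for n in negatives)
    return wins / (len(positives) * len(negatives))
```

The pairwise AUC is quadratic in the number of samples, which is acceptable for a data set of this size (916 samples) but would be replaced by a rank-based formula for larger sets.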

When the cascade structure is a combined learner: (a) the overall identification accuracy of the combined learner is generally better than that of a single learner. Model diversity strengthens the classification capability of the cascade structure: in the training stage, each base learner outputs the class label of the data according to its own model characteristics in the sample space, so that the respective strengths of the different classifiers are fully exploited and multiple kinds of feature information of the samples are learned comprehensively; when processing differing feature information, the structural differences between base learners add up across the whole cascade structure and complement each other. (b) The CART and RF combination has the lowest accuracy at 0.77, the lowest F1 score at 0.73 and the lowest AUC at 0.75, which shows that when both base learners are ensemble learning models, the prediction result is better than that of other combinations. The accuracy, F1 score and AUC of the AdaBoost and XGBoost combination, the AdaBoost and LightGBM combination, and the XGBoost and LightGBM combination are generally superior to those of the other combinations. The best accuracy is 0.92, the best F1 score is 0.89 and the best AUC is 0.91. This shows that when the model consists of Bagging and Boosting, the effect is better than that of a combination of two Boosting algorithms. Therefore, the combination of the random forest model and the LightGBM model is finally selected to improve the deep forest model, giving an average accuracy of 90.8%, an average recall of 89.0% and an average AUC of 0.912.

The results show that the method, which fuses meta-learning with an improved deep forest, effectively addresses the small-sample problem and prevents the model from overfitting; compared with the original network attack identification method, it effectively improves the accuracy and speed of network attack identification and meets practical engineering requirements.

Fig. 3 is a schematic diagram of an embodiment of a power grid CPS network attack identification system provided by the present invention. In this embodiment, the system is used to implement the method for identifying CPS network attacks on the power grid, and includes:

a sample set module, a data acquisition module and a data analysis module, wherein the sample set module is used for extracting a plurality of samples from pre-trained power grid measurement data, classifying the extracted samples according to accident scene categories, and forming the classified samples into a sample set; and is also used for selecting a plurality of accident scene categories from the sample set and selecting two groups of samples from each accident scene category to form a sample training set and a sample test set respectively;

the training module is used for inputting the sample training set into the preset network attack identification model for training to obtain network attack identification model parameters; inputting the sample test set into a preset test model for training to obtain test parameters; and comparing the network attack identification model parameters with the test parameters to obtain a comparison result, wherein the comparison result is either consistent or inconsistent;

the identification module is used for identifying the network attack through the preset network attack identification model when the comparison result is consistent, and obtaining a network attack identification result;

the power grid measurement data module is used for acquiring a target data set to be trained, randomly sampling a plurality of accident scenes from the target data set, and selecting a plurality of real-time measurement data in those accident scenes as a meta-task data set; randomly sampling a plurality of real-time measurement data from each class of real-time measurement data in the meta-task data set as a support set, and taking the remaining real-time measurement data of the scenes as a query set; forming a group of training data from the real-time measurement data selected in all accident scenes, and inputting the training data into a preset training model for training to obtain a first training result; extracting real-time measurement data from the query set, and determining the accident scene category to which they belong by using the preset training model to obtain a second training result; calculating the accuracy of the preset training model from the first training result and the second training result, iterating repeatedly, and updating the preset training model and the target data set according to the accuracies obtained, to obtain an updated training model; and acquiring power grid measurement data from the historical record and inputting them into the updated training model for training, to obtain the pre-trained power grid measurement data.
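The construction of one meta-task (a support set and a query set) described for the power grid measurement data module corresponds to N-way K-shot episode sampling, which can be sketched as follows. The function name, the dictionary layout of the data set and the use of an explicit random generator are assumptions for illustration; in particular, the query set here is taken to be the remaining samples of the sampled scenes, which is one reading of the text.

```python
import random
from typing import Dict, Hashable, List, Sequence, Tuple

Sample = Sequence[float]
Labelled = Tuple[Hashable, Sample]


def sample_episode(dataset: Dict[Hashable, List[Sample]], n_way: int, k_shot: int,
                   rng: random.Random) -> Tuple[List[Labelled], List[Labelled]]:
    """Draw one N-way K-shot meta-task: a support set and a query set.

    dataset maps an accident scene label to its measurement samples.
    """
    # Only scenes with more than k_shot samples can supply both sets.
    eligible = [label for label, items in dataset.items() if len(items) > k_shot]
    if len(eligible) < n_way:
        raise ValueError("not enough scenes with more than k_shot samples")
    chosen = rng.sample(eligible, n_way)
    support: List[Labelled] = []
    query: List[Labelled] = []
    for label in chosen:
        items = list(dataset[label])
        rng.shuffle(items)
        support.extend((label, s) for s in items[:k_shot])  # K samples per scene
        query.extend((label, s) for s in items[k_shot:])    # remaining samples
    return support, query
```

With the values reported in the embodiment, a call would use `n_way=20` and `k_shot=21`; passing a seeded `random.Random` makes the sampled episodes reproducible across training runs.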

In a specific embodiment, the training module is further configured to scan the time characteristics of the input sample training set through a sliding window, extract the time characteristics using the sliding window, and sort the samples according to the time characteristics of their data; divide the input sample training set into power grid time sequence feature vectors through the sliding window; and process each power grid time sequence feature vector with every weak learner in the first layer of the cascade forest, output the probability of a network attack, and splice the results of the weak learners into one vector for output. The training module also comprises a plurality of base learners connected in parallel to increase the diversity of the algorithm; the base learners include at least logistic regression and a classification and regression decision tree; and weak classifiers, cascaded according to the weight of each classifier, are arranged in the base learners.

More specifically, the training module is further configured to compare the difference between the network attack identification model parameters and the test parameters with a preset threshold; when the difference is greater than the preset threshold, the comparison result is inconsistent; when the difference is not greater than the preset threshold, the comparison result is consistent.

Regarding the implementation process of the power grid CPS network attack identification system, reference may be made to the process of the power grid CPS network attack identification method, which is not described herein again.

In summary, the embodiment of the invention has the following beneficial effects:

According to the power grid CPS network attack identification method of the invention, the deep forest network attack identification method is improved by fusing meta-learning; compared with the traditional sampling methods of the prior art, the N-way K-shot sampling method is better suited to small-sample problems and avoids overfitting. Furthermore, the method improves on the deep forest anomaly detection algorithm by extending the cascade forest structure, so that the characteristics of power grid data can be fully mined and the identification accuracy and speed are effectively improved.

The above disclosure describes only preferred embodiments of the present invention and is not intended to limit the scope of the claims of the present invention.
