Database abnormal access detection method, system and equipment based on distance measurement

Document No.: 1904846  Published: 2021-11-30

Abstract: This technology, "Database abnormal access detection method, system and equipment based on distance measurement", was designed by 宋美艳, 陈锋, 沈正华, 郑卫东, 李晓燕, 周波, 贾泽冰, 刘畅 and 李亚都 on 2021-11-02. The invention provides a distance-metric-based method, system and device for detecting abnormal database access. A distance-based KNN algorithm is trained on low-dimensional user access vectors to obtain an anomaly detection model, and an abnormal access responder is constructed. Because the anomaly detection model is built with a distance-based KNN algorithm, it can detect in real time whether a user's database operation is normal, which addresses the inability of database auditing tools to judge ongoing database operations in real time. The abnormal access responder holds response policies for abnormal database operations that are defined in advance; it responds to the abnormal user operations predicted by the anomaly detection model and records the related information, thereby providing active defense against abnormal operations. Together, these features give the method real-time monitoring of, and active defense against, abnormal user operations on the database.

1. A distance-metric-based method for detecting abnormal database access, characterized in that the method comprises the following steps:

extracting database access information;

the extracted database access information enters a model training stage to obtain a low-dimensional user access vector in a data dimension reduction mode;

training a low-dimensional user access vector by using a distance-based algorithm to obtain an anomaly detection model;

and using the training result obtained in the model training stage as the anomaly detection model in the model testing stage, in which the low-dimensional user access vector is likewise obtained by data dimension reduction and is input into the anomaly detection model to obtain a detection result, thereby achieving real-time monitoring of and active defense against abnormal database access.

2. The method for detecting abnormal database access based on distance measurement as claimed in claim 1, wherein: in the model training stage and the model testing stage, data dimensionality reduction is realized by adopting an LDA algorithm, a low-dimensional user access vector is constructed, and detection of abnormal access of the database is realized by adopting a KNN model as an abnormal detection model.

3. The method for detecting abnormal database access based on distance measurement as claimed in claim 2, wherein: the distance metric of the KNN model is the Euclidean distance, with the specific expression:

$$d(x_i, x_j) = \sqrt{\sum_{l=1}^{n} \left(x_i^{(l)} - x_j^{(l)}\right)^2}$$

where $x_i$ and $x_j$ are sample points with $n$ components each, and $y$ denotes the class corresponding to a sample point.

4. The method for detecting abnormal database access based on distance measurement as claimed in claim 2, wherein: the K value of the KNN model can be adjusted according to the calculation results, so that normal access and abnormal access are separated to different degrees.

5. The method for detecting abnormal database access based on distance measurement as claimed in claim 2, wherein: the specific steps of the model training phase include the following:

performing data preprocessing operation on the extracted database history log to obtain text data;

extracting user operation characteristics from the text data, and constructing an initial database user access characteristic portrait based on the user attribute characteristics and the user operation characteristics, wherein the initial database user access characteristic portrait is a high-dimensional matrix;

performing a dimensionality reduction operation on the high-dimensional matrix of the initial database user access characteristic portrait through the LDA algorithm to obtain a low-dimensional user access vector;

and taking the low-dimensional user access vector as the input of the KNN model, calculating the parameters to be trained in the KNN model, and continuously adjusting the given K value to obtain the optimal classification result, namely the model training result.

6. The method for detecting abnormal database access based on distance measurement as claimed in claim 5, wherein: the data preprocessing operation consists of removing system log entries to obtain the text data.

7. The method for detecting abnormal database access based on distance measurement as claimed in claim 2, wherein: the specific steps of the model testing stage comprise the following steps:

carrying out data preprocessing on user data in a model training stage, and extracting effective access data statements;

constructing a database user access characteristic portrait on the basis of the user attribute characteristics and the user operation characteristics for the effective access data sentences to obtain a high-dimensional database user access characteristic portrait;

performing a dimensionality reduction operation on the high-dimensional database user access characteristic portrait through the LDA algorithm to obtain a low-dimensional user access vector;

taking the low-dimensional user access vector as the input of the KNN model, computing the parameters to be trained in the KNN model, and continuously adjusting the given K value to obtain the optimal detection result, which distinguishes normal access from abnormal access;

and inputting the abnormal access detection result into an abnormal access responder, outputting different abnormal access levels, and executing different operations on the access.

8. The method for detecting abnormal database access based on distance measurement as claimed in claim 7, wherein: the data preprocessing consists of deleting invalid database access statements and extracting the core valid access statements.

9. A distance metric based database abnormal access detection system, comprising:

the acquisition module is used for extracting database access information;

the first processing module is used for obtaining a low-dimensional user access vector in a data dimension reduction mode when the extracted database access information enters a model training stage;

the second processing module is used for training the low-dimensional user access vector by using a distance-based algorithm to obtain an anomaly detection model;

and the third processing module is used for taking a training result obtained in the model training stage as an abnormal detection model in the model testing stage, obtaining a low-dimensional user access vector in a data dimension reduction mode in the model testing stage, inputting the user access vector subjected to dimension reduction into the abnormal detection model to obtain a detection result, and realizing real-time monitoring and active defense on abnormal access of the database.

10. A distance-metric-based database abnormal access detection apparatus comprising a memory, a processor and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the computer program, implements the steps of the distance-metric-based database abnormal access detection method according to any one of claims 1 to 8.

Technical Field

The invention belongs to the field of DCS transmission data processing. It relates mainly to database abnormal access detection, and in particular to a distance-metric-based method, system and device for detecting abnormal database access.

Background

In the upper-computer part of a Distributed Control System (DCS), each upper-computer subsystem frequently accesses and queries the database, and database administrators also frequently modify and maintain it. Malicious access to and operation of the database generally fall into two categories: external and internal. Good defense strategies usually exist for external malicious access, whereas malicious access or misoperation by authorized internal personnel is usually hard to prevent. In electrical digital data processing, the prior art generally detects malicious access or misoperation by internal personnel with a database auditing tool, which records all database access and operation records in the background, including the operating IP, user, operation statement, time and operation result; the system security officer then examines the behavior of internal users by analyzing these records. Because a database auditing tool is an after-the-fact investigation measure and cannot stop an ongoing abnormal database operation in real time, it provides neither active defense nor real-time defense.

Disclosure of Invention

The invention provides a distance-metric-based method, system and device for detecting abnormal database access, aimed at the problem that prior-art database auditing tools cannot prevent abnormal database operations by internal users in real time. The invention constructs a low-dimensional user access vector through the LDA algorithm and inputs it into a distance-based KNN model for training to obtain an anomaly detection model. The anomaly detection model checks whether each operation a user performs on the database is normal, and an abnormal access response strategy is constructed alongside it, so that abnormal access by internal users is monitored in real time and actively defended against.

The invention is realized by the following technical scheme:

a database abnormal access detection method based on distance measurement comprises the following steps:

extracting database access information;

the extracted database access information enters a model training stage to obtain a low-dimensional user access vector in a data dimension reduction mode;

training a low-dimensional user access vector by using a distance-based algorithm to obtain an anomaly detection model;

and using the training result obtained in the model training stage as the anomaly detection model in the model testing stage, in which the low-dimensional user access vector is likewise obtained by data dimension reduction and is input into the anomaly detection model to obtain a detection result, thereby achieving real-time monitoring of and active defense against abnormal database access.

Preferably, the data dimensionality reduction is realized by adopting an LDA algorithm in the model training stage and the model testing stage, a low-dimensional user access vector is constructed, and the detection of the abnormal access of the database is realized by adopting a KNN model as an abnormal detection model.

Further, the distance metric of the KNN model is the Euclidean distance, with the specific expression:

$$d(x_i, x_j) = \sqrt{\sum_{l=1}^{n} \left(x_i^{(l)} - x_j^{(l)}\right)^2}$$

where $x_i$ and $x_j$ are sample points with $n$ components each, and $y$ denotes the class corresponding to a sample point.

Further, the K value of the KNN model can be adjusted according to the calculation results, so that normal access and abnormal access are separated to different degrees.

Further, the specific steps of the model training phase include the following:

performing data preprocessing operation on the extracted database history log to obtain text data;

extracting user operation characteristics from the text data, and constructing an initial database user access characteristic portrait based on the user attribute characteristics and the user operation characteristics, wherein the initial database user access characteristic portrait is a high-dimensional matrix;

performing a dimensionality reduction operation on the high-dimensional matrix of the initial database user access characteristic portrait through the LDA algorithm to obtain a low-dimensional user access vector;

and taking the low-dimensional user access vector as the input of the KNN model, calculating the parameters to be trained in the KNN model, and continuously adjusting the given K value to obtain the optimal classification result, namely the model training result.

Furthermore, the data preprocessing operation is to remove the system log to obtain the text data.

Further, the specific steps of the model testing stage comprise the following steps:

carrying out data preprocessing on user data in a model training stage, and extracting effective access data statements;

constructing a database user access characteristic portrait on the basis of the user attribute characteristics and the user operation characteristics for the effective access data sentences to obtain a high-dimensional database user access characteristic portrait;

performing a dimensionality reduction operation on the high-dimensional database user access characteristic portrait through the LDA algorithm to obtain a low-dimensional user access vector;

taking the low-dimensional user access vector as the input of the KNN model, computing the parameters to be trained in the KNN model, and continuously adjusting the given K value to obtain the optimal detection result, which distinguishes normal access from abnormal access;

and inputting the abnormal access detection result into an abnormal access responder, outputting different abnormal access levels, and executing different operations on the access.

Furthermore, the data preprocessing is to delete the invalid statements of the database access and extract the valid access statements of the core.

A distance-metric-based database abnormal access detection system comprises:

the acquisition module is used for extracting database access information;

the first processing module is used for obtaining a low-dimensional user access vector in a data dimension reduction mode when the extracted database access information enters a model training stage;

the second processing module is used for training the low-dimensional user access vector by using a distance-based algorithm to obtain an anomaly detection model;

and the third processing module is used for taking a training result obtained in the model training stage as an abnormal detection model in the model testing stage, obtaining a low-dimensional user access vector in a data dimension reduction mode in the model testing stage, inputting the user access vector subjected to dimension reduction into the abnormal detection model to obtain a detection result, and realizing real-time monitoring and active defense on abnormal access of the database.

A distance-metric-based database abnormal access detection apparatus comprising a memory, a processor and a computer program stored in the memory and executable on the processor, the processor implementing the steps of the distance-metric-based database abnormal access detection method as described above when executing the computer program.

Compared with the prior art, the invention has the following beneficial technical effects:

The invention provides a distance-metric-based method for detecting abnormal database access, in which a distance-based KNN algorithm is trained on low-dimensional user access vectors to obtain an anomaly detection model, and an abnormal access responder is constructed. Because the anomaly detection model is built with a distance-based KNN algorithm, it can detect in real time whether a user's database operation is normal, which addresses the inability of database auditing tools to judge ongoing database operations in real time. The abnormal access responder holds response policies for abnormal database operations that are defined in advance; it responds to the abnormal user operations predicted by the anomaly detection model and records the related information, thereby providing active defense against abnormal operations. Together, these features give the method real-time monitoring of, and active defense against, abnormal user operations on the database.

Drawings

FIG. 1 is a flow chart illustrating the steps of a method for detecting abnormal database access based on distance measurement according to the present invention;

FIG. 2 is a schematic diagram of a distance metric-based database abnormal access detection system according to the present invention.

Detailed Description

The present invention will now be described in further detail with reference to specific examples, which are intended to be illustrative, but not limiting, of the invention.

The invention provides a database abnormal access detection method based on distance measurement, which comprises the following steps as shown in figure 1:

extracting database access information;

the extracted database access information enters a model training stage to obtain a low-dimensional user access vector in a data dimension reduction mode;

training a low-dimensional user access vector by using a distance-based algorithm to obtain an anomaly detection model;

and using the training result obtained in the model training stage as the anomaly detection model in the model testing stage, in which the low-dimensional user access vector is likewise obtained by data dimension reduction and is input into the anomaly detection model to obtain a detection result, thereby achieving real-time monitoring of and active defense against abnormal database access.

In both the model training stage and the model testing stage, data dimensionality reduction is performed with the LDA algorithm to construct a low-dimensional user access vector, and a KNN model serves as the anomaly detection model for detecting abnormal database access. In both stages, the database user access characteristic portraits built from the user attribute characteristics and the user operation characteristics are high-dimensional matrices.

And the user access vectors obtained by the LDA algorithm in the model training stage and the model testing stage are low-dimensional vectors.

Referring to fig. 1, the specific steps of the model training phase in the present invention include the following:

performing data preprocessing operation on the extracted database history log to obtain text data;

extracting user operation characteristics from the text data, and constructing an initial database user access characteristic portrait based on the user attribute characteristics and the user operation characteristics, wherein the initial database user access characteristic portrait is a high-dimensional matrix;

performing a dimensionality reduction operation on the high-dimensional matrix of the initial database user access characteristic portrait through the LDA algorithm to obtain a low-dimensional user access vector;

and taking the low-dimensional user access vector as the input of the KNN model, calculating the parameters to be trained in the KNN model, and continuously adjusting the given K value to obtain the optimal classification result of the model.

The data preprocessing operation consists of removing system log entries to obtain the text data.
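The step of "continuously adjusting the given K value" can be sketched as a small search over candidate values. The snippet below is only an illustration under stated assumptions: the toy vectors, the candidate list and the leave-one-out accuracy criterion are hypothetical choices, since the source does not say how the optimal K is judged.

```python
import math
from collections import Counter

def knn_predict(train, query, k):
    """Majority vote among the k training points nearest to `query`."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

def select_k(train, candidates):
    """Return (best K, accuracy) using leave-one-out accuracy (assumed criterion)."""
    best_k, best_acc = None, -1.0
    for k in candidates:
        correct = 0
        for i, (x, y) in enumerate(train):
            rest = train[:i] + train[i + 1:]  # hold out sample i
            if knn_predict(rest, x, k) == y:
                correct += 1
        acc = correct / len(train)
        if acc > best_acc:  # strict '>' keeps the smallest K on ties
            best_k, best_acc = k, acc
    return best_k, best_acc

# Hypothetical low-dimensional access vectors: label 0 = normal, 1 = abnormal.
TRAIN = [
    ((0.1, 0.2), 0), ((0.2, 0.1), 0), ((0.0, 0.3), 0), ((0.3, 0.3), 0),
    ((2.0, 2.1), 1), ((2.2, 1.9), 1), ((1.9, 2.2), 1), ((2.1, 2.0), 1),
]

best_k, best_acc = select_k(TRAIN, candidates=[1, 3, 5])
```

Odd candidate values are used so that a two-class vote cannot tie; any other validation scheme could replace the leave-one-out loop.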

Referring to fig. 1, the specific steps of the model testing phase of the present invention include the following:

carrying out data preprocessing on user data in a model training stage, and extracting effective access data statements;

constructing a database user access characteristic portrait on the basis of the user attribute characteristics and the user operation characteristics for the effective access data sentences to obtain a high-dimensional database user access characteristic portrait;

performing a dimensionality reduction operation on the high-dimensional database user access characteristic portrait through the LDA algorithm to obtain a low-dimensional user access vector;

taking the low-dimensional user access vector as the input of the KNN model, computing the parameters to be trained in the KNN model, and continuously adjusting the given K value to obtain the optimal detection result, which distinguishes normal access from abnormal access;

and inputting the abnormal access detection result into an abnormal access responder, outputting different abnormal access levels, and executing different operations on the access.

The data preprocessing consists of deleting invalid database access statements and extracting the core valid access statements.
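As a rough illustration of the preprocessing and portrait-construction steps, the sketch below filters log lines down to valid access statements and counts operation types as user operation characteristics. The operation list, the "missing WHERE clause" feature and the sample log lines are all hypothetical; the source does not specify which statements count as valid or which features make up the portrait.

```python
import re
from collections import Counter

# Hypothetical set of statement types treated as valid core access statements.
OPERATIONS = ("select", "insert", "update", "delete", "drop")

def extract_valid_statements(log_lines):
    """Drop empty lines, comments and anything not starting with a known operation."""
    valid = []
    for line in log_lines:
        text = line.strip().lower()
        if text and text.split()[0] in OPERATIONS:
            valid.append(text)
    return valid

def operation_features(statements):
    """Count each operation type, plus update/delete statements lacking WHERE."""
    counts = Counter(s.split()[0] for s in statements)
    no_where = sum(
        1
        for s in statements
        if s.split()[0] in ("update", "delete") and not re.search(r"\bwhere\b", s)
    )
    return [counts[op] for op in OPERATIONS] + [no_where]

# Hypothetical log excerpt for one user session.
LOG = [
    "SELECT * FROM sensors WHERE id = 7",
    "-- heartbeat",
    "",
    "DELETE FROM alarms",
    "update tags set v = 1 where id = 2",
    "SHOW STATUS",
]

valid = extract_valid_statements(LOG)
features = operation_features(valid)
```

In a fuller portrait, user attribute characteristics (for example role or source IP, again assumptions) would be appended to this vector before the LDA step.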

In the distance measurement-based database abnormal access detection method, the LDA algorithm is adopted to realize data dimension reduction in both the model training stage and the model testing stage, and the specific mode of the LDA algorithm is as follows:

Let the data set be $D = \{(x_1, y_1), (x_2, y_2), \ldots, (x_m, y_m)\}$, where every sample $x_i$ is an n-dimensional vector and $y_i \in \{0, 1\}$ is the class of the sample.

Define $N_j$ ($j = 0, 1$) as the number of samples in class $j$, $X_j$ ($j = 0, 1$) as the set of class-$j$ samples, $\mu_j$ ($j = 0, 1$) as the mean vector of the class-$j$ samples, and $\Sigma_j$ ($j = 0, 1$) as the covariance matrix of the class-$j$ samples.

The expression of $\mu_j$ is:

$$\mu_j = \frac{1}{N_j} \sum_{x \in X_j} x \quad (j = 0, 1)$$

The expression of $\Sigma_j$ is:

$$\Sigma_j = \sum_{x \in X_j} (x - \mu_j)(x - \mu_j)^T \quad (j = 0, 1)$$

The LDA algorithm projects the data onto a straight line so that the projection points of data from the same class lie as close together as possible, while the class centers of data from different classes lie as far apart as possible.

Let the projection line be the vector $w$. For any sample $x_i$, its projection onto the line $w$ is $w^T x_i$, where $w^T$ is the transpose of $w$. Let the center points of the two classes be $\mu_0$ and $\mu_1$; their projections onto the line $w$ are $w^T \mu_0$ and $w^T \mu_1$.

Because the LDA algorithm makes the distance between the class centers of different classes as large as possible, it maximizes $\lVert w^T \mu_0 - w^T \mu_1 \rVert_2^2$. At the same time, the projection points of data from the same class should be as close as possible, that is, the covariances $w^T \Sigma_0 w$ and $w^T \Sigma_1 w$ of the projected points of each class should be as small as possible, so it minimizes $w^T \Sigma_0 w + w^T \Sigma_1 w$.

In summary, the optimization goal is:

$$\arg\max_{w} J(w) = \frac{\lVert w^T \mu_0 - w^T \mu_1 \rVert_2^2}{w^T \Sigma_0 w + w^T \Sigma_1 w} = \frac{w^T (\mu_0 - \mu_1)(\mu_0 - \mu_1)^T w}{w^T (\Sigma_0 + \Sigma_1) w}$$

where $J(w)$ is the objective function. When $J(w)$ attains its maximum, projecting the samples onto the resulting $w$ gives the user access feature matrix after LDA dimension reduction.
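The two-class LDA projection described above can be illustrated in plain Python for two-dimensional samples. This is a minimal sketch, not the patent's implementation: it relies on the standard closed-form result that the maximizer of $J(w)$ is proportional to $(\Sigma_0 + \Sigma_1)^{-1}(\mu_0 - \mu_1)$, which the source does not state, and the sample points are hypothetical.

```python
def mean(points):
    """Mean vector mu_j of a list of 2-D points."""
    n = len(points)
    return [sum(p[d] for p in points) / n for d in range(2)]

def scatter(points, mu):
    """Sum of (x - mu)(x - mu)^T over one class, as a 2x2 matrix."""
    s = [[0.0, 0.0], [0.0, 0.0]]
    for p in points:
        d = [p[0] - mu[0], p[1] - mu[1]]
        for r in range(2):
            for c in range(2):
                s[r][c] += d[r] * d[c]
    return s

def within_scatter(x0, x1):
    """Sigma_0 + Sigma_1 for the two classes."""
    s0, s1 = scatter(x0, mean(x0)), scatter(x1, mean(x1))
    return [[s0[r][c] + s1[r][c] for c in range(2)] for r in range(2)]

def lda_direction(x0, x1):
    """w proportional to (Sigma_0 + Sigma_1)^-1 (mu_0 - mu_1); assumes invertibility."""
    mu0, mu1 = mean(x0), mean(x1)
    sw = within_scatter(x0, x1)
    det = sw[0][0] * sw[1][1] - sw[0][1] * sw[1][0]
    inv = [[sw[1][1] / det, -sw[0][1] / det],
           [-sw[1][0] / det, sw[0][0] / det]]
    diff = [mu0[0] - mu1[0], mu0[1] - mu1[1]]
    return [inv[0][0] * diff[0] + inv[0][1] * diff[1],
            inv[1][0] * diff[0] + inv[1][1] * diff[1]]

def project(w, x):
    """Scalar projection w^T x."""
    return w[0] * x[0] + w[1] * x[1]

def fisher_criterion(w, x0, x1):
    """J(w) = (w^T mu_0 - w^T mu_1)^2 / (w^T (Sigma_0 + Sigma_1) w)."""
    mu0, mu1 = mean(x0), mean(x1)
    sw = within_scatter(x0, x1)
    num = (project(w, mu0) - project(w, mu1)) ** 2
    den = sum(w[r] * sw[r][c] * w[c] for r in range(2) for c in range(2))
    return num / den

# Hypothetical samples: X0 = class 0, X1 = class 1.
X0 = [(1.0, 2.0), (2.0, 3.0), (3.0, 3.0), (2.0, 1.0)]
X1 = [(6.0, 5.0), (7.0, 8.0), (8.0, 6.0), (7.0, 6.0)]

w = lda_direction(X0, X1)
proj0 = [project(w, x) for x in X0]
proj1 = [project(w, x) for x in X1]
```

After projection, each sample is a single number, and the two classes occupy disjoint ranges on the line for this toy data.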

The KNN model is a distance-based machine learning method whose principle can be understood as majority voting: a prediction sample is assigned the class that is most common among the K training samples closest to it in feature space. In this KNN model, the K samples closest to the prediction sample represent normal access data, while samples that lie far from the prediction sample's features represent abnormal database access.

The distance metric of the KNN model is the Euclidean distance, with the specific expression:

$$d(x_i, x_j) = \sqrt{\sum_{l=1}^{n} \left(x_i^{(l)} - x_j^{(l)}\right)^2}$$

where $x_i$ and $x_j$ are sample points with $n$ components each, and $y$ denotes the class corresponding to a sample point.
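The Euclidean distance between two sample points can be computed directly from its definition. The function name below is an illustrative choice, and the comparison with `math.dist` (Python 3.8+) is only a sanity check.

```python
import math

def euclidean_distance(a, b):
    """Square root of the summed squared component differences."""
    if len(a) != len(b):
        raise ValueError("sample points must have the same dimension")
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

# A 3-4-5 right triangle gives an easy hand check.
d = euclidean_distance((0.0, 0.0), (3.0, 4.0))
```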

Once the distance and the K value are defined, any new sample is assigned the class that occurs most often among the K samples closest to it.

Take the classification of points on a two-dimensional plane as an example. Let the samples be $S = \{(x_1, y_1), (x_2, y_2), \ldots, (x_N, y_N)\}$, where $x_i$ is a point on the two-dimensional plane and $y_i$ is the class corresponding to the point $x_i$. For a new sample $x$, the class $y$ corresponding to the sample point is given by:

$$y = \arg\max_{c_j} \sum_{x_i \in N_k(x)} f(y_i = c_j)$$

where $c_j$ is a class of the samples, $N_k(x)$ is the set of the $k$ samples nearest to the sample $x$, and $f$ is the indicator function on $y_i$, expressed as:

$$f(y_i = c_j) = \begin{cases} 1, & y_i = c_j \\ 0, & y_i \neq c_j \end{cases}$$
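The voting rule can be written so that each piece maps onto the description: the neighbour set $N_k(x)$, the indicator function, and the arg max over classes. The sample points and the function names below are hypothetical, chosen only to make the sketch runnable.

```python
import math

def indicator(condition):
    """Indicator function: 1 when the condition holds, otherwise 0."""
    return 1 if condition else 0

def knn_classify(samples, x, k, classes=(0, 1)):
    """Pick the class with the most members among the k nearest samples."""
    # N_k(x): the k samples closest to x under the Euclidean distance.
    neighbours = sorted(samples, key=lambda s: math.dist(s[0], x))[:k]
    # Per-class vote count: sum of indicator(y_i == c_j) over N_k(x).
    scores = {c: sum(indicator(y == c) for _, y in neighbours) for c in classes}
    return max(classes, key=lambda c: scores[c])

# Hypothetical 2-D points: class 0 = normal access, class 1 = abnormal access.
SAMPLES = [
    ((1.0, 1.0), 0), ((1.2, 0.8), 0), ((0.9, 1.1), 0),
    ((4.0, 4.0), 1), ((4.2, 3.9), 1),
]

near_normal = knn_classify(SAMPLES, (1.1, 1.0), k=3)
near_abnormal = knn_classify(SAMPLES, (4.1, 4.0), k=3)
```

With two classes, an odd `k` avoids tied votes; on a tie, this sketch returns the first class listed in `classes`.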

Example

Suppose a user issues an operation that deletes the whole data table from the database, "delete from table_name". The corresponding low-dimensional user access vector is input into the trained anomaly detection model, which outputs a prediction, assumed here to be "abnormal operation". The prediction is then input into the abnormal access responder, which outputs a delete-rejection instruction and rejects the user's delete operation on the data table.
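The responder's behavior in this example can be sketched as a lookup from the model's prediction to a predefined response, with abnormal cases recorded. The policy table, level names, user name and record fields below are hypothetical; the source only says that response policies are defined in advance, that different access levels lead to different operations, and that abnormal operations are recorded.

```python
# Hypothetical response policy: prediction -> (abnormal access level, action).
RESPONSE_POLICY = {
    "normal": ("none", "allow"),
    "suspicious": ("low", "alert"),
    "abnormal": ("high", "reject"),
}

def respond(user, statement, prediction, audit_log):
    """Return (level, action) and record every access that is not allowed."""
    # Unknown predictions fall back to the strictest response (an assumption).
    level, action = RESPONSE_POLICY.get(prediction, ("high", "reject"))
    if action != "allow":
        audit_log.append({
            "user": user,
            "statement": statement,
            "prediction": prediction,
            "level": level,
            "action": action,
        })
    return level, action

audit_log = []
level, action = respond("operator_01", "delete from table_name", "abnormal", audit_log)
```

Here the delete statement is rejected and one audit record is written, while a "normal" prediction would pass through without being logged.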

The following are embodiments of the apparatus of the present invention that may be used to perform embodiments of the method of the present invention.

As shown in fig. 2, an embodiment of the present invention provides a distance-metric-based database abnormal access detection system, which is used to implement the distance-metric-based database abnormal access detection method described above, where the database abnormal access detection system includes:

the acquisition module is used for extracting database access information;

the first processing module is used for obtaining a low-dimensional user access vector in a data dimension reduction mode when the extracted database access information enters a model training stage;

the second processing module is used for training the low-dimensional user access vector by using a distance-based algorithm to obtain an anomaly detection model;

and the third processing module is used for taking a training result obtained in the model training stage as an abnormal detection model in the model testing stage, obtaining a low-dimensional user access vector in a data dimension reduction mode in the model testing stage, inputting the user access vector subjected to dimension reduction into the abnormal detection model to obtain a detection result, and realizing real-time monitoring and active defense on abnormal access of the database.

The first processing module, the second processing module and the third processing module each comprise an anomaly detection module and a human-computer interaction module;

the anomaly detection module is used for carrying out data access detection on the database information;

and the human-computer interaction module is used for displaying the abnormal access detection data.

In still another embodiment of the present invention, a distance-metric-based database abnormal access detection apparatus is provided, which includes a memory, a processor, and a computer program stored in the memory and executable on the processor, and when the processor executes the computer program, the distance-metric-based database abnormal access detection apparatus implements the distance-metric-based database abnormal access detection method described above.

As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects.

Finally, it should be noted that: the above embodiments are only for illustrating the technical solutions of the present invention and not for limiting the same, and although the present invention is described in detail with reference to the above embodiments, those of ordinary skill in the art should understand that: modifications and equivalents may be made to the embodiments of the invention without departing from the spirit and scope of the invention, which is to be covered by the claims.
