Container abnormity detection method and system

Document No.: 1952572  Published: 2021-12-10

Reading note: This technology, "Container abnormity detection method and system", was designed and created by Xie Yulai (谢雨来), Yu Daisong (余岱松), Zhou Pan (周潘) and Feng Dan (冯丹) on 2021-08-12. Its main contents are as follows. The invention discloses a container anomaly detection method and system in the technical field of cloud computing security. The method first collects log data, then filters, simplifies and completes it, and numbers the fields column by column to obtain new log data. Next, it calculates the support of the elements in each column of the new log data, removes from each column the elements whose support is below the minimum support, stores the remaining elements of each column separately and appends a zero element at the end. It then selects one element from each column in turn, combines them and checks the support of each combination, so as to find all element combinations whose support is greater than or equal to the minimum support and whose length equals the number of fields in a log entry. Finally, the element combinations with the largest weights are fed back to the system administrator for container anomaly detection. In this way, the method and system help the system administrator locate anomalies quickly and accurately and reduce the burden on system maintenance personnel.

1. A container anomaly detection method, comprising the steps of:

S1, collecting log data from the log files generated by each container;

S2, removing the log data containing error information, extracting the fields of analytical value from the remaining log data by using a regular expression, and completing the log entries that are missing some fields;

S3, assigning a number to each distinct field in each column of the completed log data to form new log data;

S4, calculating the support of the elements in each column of the new log data, removing from each column the elements whose support is smaller than the minimum support, storing the remaining elements of each column separately and appending a zero element at the end;

S5, selecting one element from each column in turn and combining them; if the support of a combination is greater than or equal to the minimum support, splicing the combination with an element of the next column, until all element combinations whose support is greater than or equal to the minimum support and whose length equals the number of log entry fields are found;

S6, calculating the weight of each element combination, selecting the M element combinations with the largest weights and feeding them back to a system administrator for container anomaly detection.

2. The container anomaly detection method according to claim 1, wherein S5 includes:

S51, selecting an element from the first column and judging whether its support is greater than or equal to the minimum support; if so, splicing it with an element of the next column and continuing to judge the support, until either an element combination whose support is greater than or equal to the minimum support and whose length equals the number of log entry fields is found, or the spliced combination is discarded; otherwise, going to S52;

S52, selecting the next element from the first column and repeating S51, until all element combinations whose support is greater than or equal to the minimum support and whose length equals the number of log entry fields are found.

3. The container anomaly detection method according to claim 1 or 2, wherein in S6 the weight of an element combination is the product of the number of non-zero elements in the element combination and the support of the element combination.

4. The container anomaly detection method according to claim 1 or 2, wherein in S6, selecting the M element combinations with the largest weights and feeding them back to a system administrator includes:

comparing the M element combinations with the largest weights against an anomaly rule base, and feeding back to the system administrator those element combinations that contain rule items of the anomaly rule base.

5. The container anomaly detection method according to claim 1, wherein S1 includes: reading the newly generated portion from the tail of the log file generated by each container, and recording in memory the position up to which the log file was read this time, as the starting point for the next collection of log data.

6. The container anomaly detection method according to claim 1, wherein in S2, completing the log entries that are missing some fields comprises:

filling in the missing fields with a default identifier so that all entries of the completed log data have the same number of fields.

7. A container anomaly detection system, comprising:

a collection module, configured to collect log data from the log files generated by each container;

a processing module, configured to remove the log data containing error information, extract the fields of analytical value from the remaining log data by using a regular expression, and complete the log entries that are missing some fields;

a numbering module, configured to assign a number to each distinct field in each column of the completed log data to form new log data;

a rule mining module, configured to calculate the support of the elements in each column of the new log data, remove from each column the elements whose support is smaller than the minimum support, store the remaining elements of each column separately and append a zero element at the end; select one element from each column in turn and combine them, and, if the support of a combination is greater than or equal to the minimum support, splice the combination with an element of the next column, until all element combinations whose support is greater than or equal to the minimum support and whose length equals the number of log entry fields are found; and calculate the weight of each element combination and select the M element combinations with the largest weights to feed back to a system administrator for container anomaly detection.

Technical Field

The invention belongs to the technical field of cloud computing security, and particularly relates to a container abnormity detection method and system.

Background

Cloud computing, a service model different from conventional ones, lets users access high-quality, highly secure infrastructure services at a lower price, and more and more users now rely on it for large-scale data processing. Virtualization is the foundation of cloud technology, and the resource management and development of cloud computing depend on it. However, because traditional virtualization technology suffers from problems such as long boot times, low utilization of system resources and differences in the software stack environment between virtual machines, Docker container technology emerged as an alternative.

Docker is a runtime environment designed for system administrators and developers; its containers share the kernel of the host system and are used to build, publish and run distributed applications. Docker containers use control groups (cgroups) to limit hardware resources and kernel-level namespaces to isolate various resources.

When an anomaly occurs in a container, the container daemon can only store the log information or records of the relevant time period in log files, and handing massive raw logs directly to a system administrator would greatly increase the burden on maintenance personnel. Some anomalies directly produce error messages that help maintenance personnel locate the cause, but some malicious attacks produce only logs that look the same as those of normal operation, and finding the anomalous entries among the normal ones is difficult by manual analysis alone.

Disclosure of Invention

In view of the above defects or improvement requirements of the prior art, the invention provides a container anomaly detection method and system that aim to mine, from log data, frequent patterns reflecting container anomalies, help a system administrator locate anomalies quickly and accurately, and reduce the burden on system maintenance personnel.

To achieve the above object, the present invention provides a container anomaly detection method, including the following steps:

S1, collecting log data from the log files generated by each container;

S2, removing the log data containing error information, extracting the fields of analytical value from the remaining log data by using a regular expression, and completing the log entries that are missing some fields;

S3, assigning a number to each distinct field in each column of the completed log data to form new log data;

S4, calculating the support of the elements in each column of the new log data, removing from each column the elements whose support is smaller than the minimum support, storing the remaining elements of each column separately and appending a zero element at the end;

S5, selecting one element from each column in turn and combining them; if the support of a combination is greater than or equal to the minimum support, splicing the combination with an element of the next column, until all element combinations whose support is greater than or equal to the minimum support and whose length equals the number of log entry fields are found;

S6, calculating the weight of each element combination, selecting the M element combinations with the largest weights and feeding them back to a system administrator for container anomaly detection.

Further, S5 includes:

S51, selecting an element from the first column and judging whether its support is greater than or equal to the minimum support; if so, splicing it with an element of the next column and continuing to judge the support, until either an element combination whose support is greater than or equal to the minimum support and whose length equals the number of log entry fields is found, or the spliced combination is discarded; otherwise, going to S52;

S52, selecting the next element from the first column and repeating S51, until all element combinations whose support is greater than or equal to the minimum support and whose length equals the number of log entry fields are found.

Further, in S6, the weight of an element combination is the product of the number of non-zero elements in the element combination and the support of the element combination.

Further, in S6, selecting the M element combinations with the largest weights and feeding them back to the system administrator includes: comparing the M element combinations with the largest weights against an anomaly rule base, and feeding back to the system administrator those element combinations that contain rule items of the anomaly rule base.

Further, S1 includes: reading the newly generated portion from the tail of the log file generated by each container, and recording in memory the position up to which the log file was read this time, as the starting point for the next collection of log data.

Further, in S2, completing the log entries that are missing some fields includes: filling in the missing fields with a default identifier so that all entries of the completed log data have the same number of fields.

To achieve the above object, the present invention also provides a container anomaly detection system, including:

a collection module, configured to collect log data from the log files generated by each container;

a processing module, configured to remove the log data containing error information, extract the fields of analytical value from the remaining log data by using a regular expression, and complete the log entries that are missing some fields;

a numbering module, configured to assign a number to each distinct field in each column of the completed log data to form new log data;

a rule mining module, configured to calculate the support of the elements in each column of the new log data, remove from each column the elements whose support is smaller than the minimum support, store the remaining elements of each column separately and append a zero element at the end; select one element from each column in turn and combine them, and, if the support of a combination is greater than or equal to the minimum support, splice the combination with an element of the next column, until all element combinations whose support is greater than or equal to the minimum support and whose length equals the number of log entry fields are found; and calculate the weight of each element combination and select the M element combinations with the largest weights to feed back to a system administrator for container anomaly detection.

In general, the above technical solution conceived by the present invention achieves the following beneficial effects:

(1) The invention provides a container anomaly detection method that first collects log data, filters, simplifies and completes it, and numbers the fields column by column to obtain new log data; then calculates the support of the elements in each column of the new log data, removes from each column the elements whose support is below the minimum support, stores the remaining elements of each column separately and appends a zero element at the end; then selects one element from each column in turn, combines them and checks the support of each combination, to find all element combinations whose support is greater than or equal to the minimum support and whose length equals the number of fields in a log entry; and finally feeds back the element combinations with the largest weights to a system administrator for container anomaly detection. The method and system therefore help the system administrator locate anomalies quickly and accurately and reduce the burden on system maintenance personnel.

(2) Instead of combining elements through a self-join, as traditional rule mining algorithms such as Apriori do, the invention selects one element from each column and combines elements column by column, which significantly reduces the number of element combinations and makes it more efficient than traditional rule mining algorithms.

Drawings

FIG. 1 is a first flowchart of a container anomaly detection method according to an embodiment of the present invention;

FIG. 2 is a combination tree diagram obtained with the conventional Apriori algorithm;

FIG. 3 is a combination tree diagram obtained with the rule mining algorithm proposed by the present invention;

FIG. 4 is a second flowchart of a container anomaly detection method according to an embodiment of the present invention;

FIG. 5 is a block diagram of a container anomaly detection system according to an embodiment of the present invention.

Detailed Description

In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention. In addition, the technical features involved in the embodiments of the present invention described below may be combined with each other as long as they do not conflict with each other.

In the present application, the terms "first," "second," and the like (if any) in the description and the drawings are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order.

FIG. 1 is a flowchart of a container anomaly detection method according to an embodiment of the present invention; the detection method includes operations S1 to S6.

In operation S1, log data is collected from the log files generated by the respective containers. Specifically, the method comprises the following steps:

Taking Docker containers as an example, the log data generated by each container is stored in a log folder in the container's space, and a given number of log entries are collected from the log file at regular intervals according to the system administrator's settings. Specifically: enter the container space of the target container, find the file containing the most recently generated logs in the log folder, look up in memory the offset at which this file was last read, read the newly generated log data from that offset to the end of the file, and finally update the stored offset to the current end of the file. When the system administrator no longer needs the application corresponding to a Docker container, or the running time set for a Docker container has elapsed, the container is stopped and deleted.

Take an Nginx server under a DoS attack as an example. If the Nginx server in the system is attacked, the file storing the recently generated logs is periodically looked up in the container corresponding to the Nginx server; generally, the path of this file is /var/lib/docker/containers/{container id}/{container id}-json. Once the container's log file is found, the most recently generated logs are read from it as log data. To determine where the most recent logs begin in the log file, a pointer is kept in memory that stores the position of the end of the log file at the previous read. Using the pointer's position as the start point and the end of the file as the end point, the most recently generated log data can then be read from the log file.
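A minimal Python sketch of this offset-based collection follows. The class name `LogTailer` and its method are hypothetical, and the log file path is supplied by the caller. Two behaviours go beyond the text above and are assumptions: a file that has shrunk (truncated or rotated) is read again from the start, and a partially written last line is left for the next read instead of being consumed.

```python
import os


class LogTailer:
    """Sketch of incremental log collection (operation S1), not the patent's code.

    Keeps an in-memory byte offset per log file, playing the role of the
    "pointer" described above.
    """

    def __init__(self):
        self._offsets = {}  # log file path -> byte offset where the last read stopped

    def read_new(self, path):
        """Return the complete lines appended to `path` since the previous call."""
        start = self._offsets.get(path, 0)
        if os.path.getsize(path) < start:
            # Assumption: a shrunken file was truncated or rotated, so restart at 0.
            start = 0
        with open(path, "rb") as f:
            f.seek(start)
            data = f.read()
        # Consume only complete lines; a partially written last line is kept
        # for the next call so that one entry is never split in two.
        end = data.rfind(b"\n") + 1
        self._offsets[path] = start + end
        return data[:end].decode("utf-8", errors="replace").splitlines()
```

A scheduler would call `read_new` for each container's log file at the interval set by the administrator; each call returns only the entries written since the previous one.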

In operation S2, the log data containing error information is removed, the fields of analytical value are extracted from the remaining log data by using a regular expression, and the log entries that are missing some fields are completed. Specifically:

The log data collected from a Docker container cannot be used directly as input to the rule mining algorithm. First, the log data contains a considerable proportion of error information, which has no analytical value for the rule mining algorithm and reduces its efficiency. Second, a log entry consists of many fields, some of which contribute little to anomaly detection but increase the execution load of the algorithm. Finally, some log entries have missing or duplicated fields and need to be completed. The log data collected from the Docker container therefore needs further processing, consisting of the following three parts:

(1) Error information filtering: remove the log entries containing error information from the log data. Error information has many causes; common ones are web page access failures because the target server refuses to respond, request timeouts caused by network congestion, and interrupted file downloads caused by failed data transfer sessions. Whether a log entry contains error information is judged by counting its fields and analyzing its important fields, such as the status code; if it does, the entry is deleted from the log data.
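The field-count and status-code checks can be sketched in Python as below. The thresholds are assumptions for illustration, not values from the text: a well-formed access-log line is assumed to have at least 10 whitespace-separated tokens, and status codes of 400 and above are treated as failed requests. The sample lines in the test follow Nginx's default `combined` format and are made up.

```python
import re

# Assumed thresholds (hypothetical): minimum token count of a well-formed line,
# and the status code read right after the quoted request line.
MIN_TOKENS = 10
STATUS_RE = re.compile(r'" (\d{3}) ')


def is_error_entry(line, min_tokens=MIN_TOKENS):
    """Return True if the log line looks like an error entry and should be dropped."""
    if len(line.split()) < min_tokens:
        return True  # too few fields: truncated or malformed entry
    match = STATUS_RE.search(line)
    if match is None:
        return True  # no recognizable status code
    return int(match.group(1)) >= 400  # assumption: 4xx/5xx mark failures


def filter_errors(lines):
    """Keep only the lines that do not look like error entries."""
    return [line for line in lines if not is_error_entry(line)]
```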

(2) Regular expression filtering: extract the valuable fields from the log data. The regular expression is set by the system administrator and can extract the fields of analytical value in different scenarios. Typically, fields of analytical value include the IP address, time, request method, file path, protocol type, status code, file size, download tool, process ID, user name, and server and client identifiers. The extracted fields form new log data, which is essentially a subset of the original log data.

Taking the log data of an Nginx server as an example, an Nginx server log entry contains 11 fields in total, of which only eight (IP address, time, request method, file path, protocol type, status code, file size and download tool) have analytical value, so a regular expression that extracts these eight fields from a log entry needs to be designed.
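One possible regular expression for this is sketched below in Python, assuming the log uses Nginx's default `combined` format (`$remote_addr - $remote_user [$time_local] "$request" $status $body_bytes_sent "$http_referer" "$http_user_agent"`), whose request line splits into method, path and protocol. The names `NGINX_RE`, `FIELDS` and `extract_fields` are hypothetical. A malformed request line such as `"-"` still parses, with the request fields simply left out so that the completion step can fill them in.

```python
import re

# Assumes Nginx's default "combined" log format; keeps only the eight fields of interest.
NGINX_RE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ '
    r'\[[^:\]]*:(?P<time>\d{2}:\d{2}:\d{2})[^\]]*\] '
    r'"(?:(?P<method>[A-Z]+) (?P<path>\S+)(?: (?P<protocol>HTTP/[\d.]+))?|-)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-) '
    r'"[^"]*" "(?P<agent>[^"]*)"'
)

FIELDS = ("ip", "time", "method", "path", "protocol", "status", "size", "agent")


def extract_fields(line):
    """Return {field: value} for the fields present, or None if the line does not parse."""
    match = NGINX_RE.match(line)
    if match is None:
        return None
    # Optional groups that did not participate in the match are omitted.
    return {name: value for name, value in match.groupdict().items() if value is not None}
```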

(3) Log completion: complete the entries in the log data that have missing fields. Because the format of the original log entries is not fixed, after regular expression filtering some entries are shorter than others because fields are missing; the log completion mechanism fills in the missing fields with an identifier such as None, so that all log entries have the same number of fields.
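A minimal completion step might look like this in Python. It assumes each extracted entry is a dict keyed by field name (for example, from a regular expression with named groups) and uses the string "None" as the default identifier; the field order and the helper names `complete` and `complete_all` are hypothetical.

```python
# Assumed field order of the completed log data (hypothetical).
FIELDS = ("ip", "time", "method", "path", "protocol", "status", "size", "agent")
FILLER = "None"  # default identifier for a missing field


def complete(record, fields=FIELDS, filler=FILLER):
    """Turn a partial {field: value} record into a fixed-length list, filling gaps."""
    return [record.get(name, filler) for name in fields]


def complete_all(records, fields=FIELDS, filler=FILLER):
    """Complete every record so that all entries have len(fields) columns."""
    return [complete(record, fields, filler) for record in records]
```

Keying on field names rather than padding at the end keeps each value in its own column even when a field in the middle of the entry is missing.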

In operation S3, a number is assigned to each of the different fields in each column of the completed log data to form new log data. Specifically, the method comprises the following steps:

Taking the log data of the Nginx server as an example, Table 1 shows log records of the Nginx server.

Table 1. Log records of the Nginx server

No.  IP address       Time      Method  File path            Protocol  Status  Size   Download tool
1    115.156.141.244  20:14:43  GET     /text/hello.txt      HTTP/1.0  206     130    Axel/2.15
2    115.156.141.244  20:14:43  GET     /compress/rwimg.zip  HTTP/1.0  206     78790  Axel/2.15
3    115.156.141.244  20:14:43  GET     /text/hello.txt      HTTP/1.0  206     130    Axel/2.15
4    115.156.140.125  20:14:43  GET     /compress/rwimg.zip  HTTP/1.1  200     46734  Wget/1.20.3
5    115.156.141.244  20:14:43  GET     /img/funkyfaced.jpg  HTTP/1.0  206     66096  Axel/2.15

The invention numbers the log data by column: each column is numbered independently, and each distinct field value in a column is given its own number. Table 2 shows the result of numbering the Nginx server log data by column.

Table 2. Results of numbering the Nginx server log data by column

Column          ID 1             ID 2                 ID 3                 ID 4
IP address      115.156.141.244  115.156.140.125
Time            20:14:43
Request method  GET
File path       /text/hello.txt  /compress/rwimg.zip  /img/funkyfaced.jpg
Protocol type   HTTP/1.0         HTTP/1.1
Status code     206              200
File size       130              78790                46734                66096
Download tool   Axel/2.15        Wget/1.20.3
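Column-wise numbering can be sketched in Python as follows. The function name `number_by_column` is hypothetical, and the numbering order is an assumption: IDs are assigned in order of first appearance, starting at 1 so that 0 stays free for the wildcard zero element used later. With the five entries of Table 1 as input, this reproduces the IDs in Table 2.

```python
def number_by_column(rows):
    """Sketch of operation S3: number each column's distinct values independently.

    Returns the numbered rows and, per column, the value -> ID codebook
    (needed later to translate mined patterns back into field values).
    """
    n_cols = len(rows[0])
    codebooks = [{} for _ in range(n_cols)]  # one value -> ID map per column
    numbered = []
    for row in rows:
        numbered_row = []
        for col, value in enumerate(row):
            book = codebooks[col]
            if value not in book:
                book[value] = len(book) + 1  # next unused ID in this column
            numbered_row.append(book[value])
        numbered.append(numbered_row)
    return numbered, codebooks
```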

Taking the numbered log data as input, the rule mining algorithm proposed by the invention mines frequent patterns that reflect container anomalies, through operations S4 to S6:

In operation S4, the support of the elements in each column of the new log data is calculated, the elements whose support is smaller than the minimum support are removed from each column, and the remaining elements of each column are stored separately with a zero element appended at the end. Specifically:

The support of each element in each column, i.e. the number of times each distinct element occurs in that column, is calculated; the elements in each column whose support does not reach the minimum support are then removed, the remaining elements of each column are stored in a linked list, and a zero element is appended to the end of each column's linked list. In the proposed rule mining algorithm the zero element acts as a wildcard: it matches any number in its column.
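The per-column counting and filtering might be sketched in Python like this. Plain lists stand in for the linked lists, the function name is hypothetical, and the minimum support is given as a fraction of the number of rows, matching the percentage used in step (S9) below.

```python
from collections import Counter


def frequent_items_per_column(numbered, min_support_ratio):
    """Sketch of operation S4 on column-numbered log data.

    An ID's support is the number of rows in which it occurs in that column.
    IDs below min_support_ratio * number_of_rows are dropped, and the zero
    element (wildcard) is appended to every column's list.
    """
    threshold = min_support_ratio * len(numbered)
    columns = []
    for col in range(len(numbered[0])):
        counts = Counter(row[col] for row in numbered)
        kept = [item for item, count in counts.items() if count >= threshold]
        kept.append(0)  # wildcard: matches any ID in this column
        columns.append(kept)
    return columns
```

For the numbered Table 1 data and a 60% minimum support (3 of 5 rows), only the file path and file size columns end up with nothing but the wildcard, because none of their values occurs in three or more rows.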

In operation S5, one element is selected from each column in turn and combined; if the support of the combination is greater than or equal to the minimum support, the combination is spliced with an element of the next column, until all element combinations whose support is greater than or equal to the minimum support and whose length equals the number of log entry fields are found. Specifically:

S51, selecting an element from the first column and judging whether its support is greater than or equal to the minimum support; if so, splicing it with an element of the next column and continuing to judge the support, until either an element combination whose support is greater than or equal to the minimum support and whose length equals the number of log entry fields is found, or the spliced combination is discarded; otherwise, going to S52;

S52, selecting the next element from the first column and repeating S51, until all element combinations whose support is greater than or equal to the minimum support and whose length equals the number of log entry fields are found.

Here, splicing elements together is called combining, and discarding a combination is called pruning. The concrete steps are as follows:

(S1) Input: the linked lists storing the elements of each column, the layer number, the current element sequence, and the support of the current element sequence;

(S2) if the layer number is not equal to the number of columns of the log data, continue with (S3); otherwise, jump to (S11);

(S3) if the current layer number is K, take the first element out of the linked list storing the elements of the K-th column, append it to the tail of the current element sequence and delete it from the linked list; set the support of the current element sequence to 0, and initialize the indices i and j to 0;

(S4) set the Boolean value match to True; it records whether the current sequence matches row i of the log data;

(S5) if the j-th element of the current element sequence is not 0 and is not equal to the element in row i, column j of the log data, set match to False and go to (S8); a zero element is a wildcard and matches any value;

(S6) repeat (S5), adding one to the index j each time, until j equals the current layer number;

(S7) repeat (S4) to (S6), followed by (S8), for every row of the log data, adding one to the index i (and resetting j to 0) each time, until i equals the number of rows of the log data;

(S8) if match is True, add one to the support of the current element sequence;

(S9) if the support of the current element sequence is less than the product of the minimum support percentage and the number of rows of the log data, prune: stop and return; otherwise, execute (S10);

(S10) recursively execute (S1), taking as input the linked lists storing the elements of each column, the layer number plus one, the current element sequence, and its support;

(S11) calculate the product of the number of non-zero elements in the current element sequence and its support, record the result as the weight of the current element sequence, output the current element sequence and its weight, and return.
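The recursion above can be condensed into a depth-first Python sketch. It is a simplification under stated assumptions, not the patent's implementation: each column's candidate list is iterated with a loop instead of removing elements from a linked list as in (S3), support is recomputed by a full scan of the log rows as in (S4) to (S8), the threshold is the minimum support fraction times the row count as in (S9), and the weight follows the default definition (non-zero elements times support). The function names are hypothetical. Note that the all-wildcard combination always survives but receives weight 0.

```python
def sequence_support(seq, numbered):
    """Number of rows whose first len(seq) fields match seq; 0 is a wildcard."""
    return sum(
        1 for row in numbered
        if all(item == 0 or item == row[j] for j, item in enumerate(seq))
    )


def mine_patterns(columns, numbered, min_support_ratio):
    """Column-by-column combination with support-based pruning (sketch of S5).

    columns: per-column candidate IDs with the zero element appended (from S4).
    numbered: the column-numbered log data (from S3).
    Returns (pattern, support, weight) for every full-length frequent pattern.
    """
    threshold = min_support_ratio * len(numbered)
    n_cols = len(columns)
    results = []

    def extend(seq, support):
        if len(seq) == n_cols:  # length equals the number of log entry fields
            nonzero = sum(1 for item in seq if item != 0)
            results.append((tuple(seq), support, nonzero * support))
            return
        for item in columns[len(seq)]:  # combine with each element of the next column
            candidate = seq + [item]
            candidate_support = sequence_support(candidate, numbered)
            if candidate_support >= threshold:  # otherwise prune this branch
                extend(candidate, candidate_support)

    extend([], len(numbered))
    return results
```

On the numbered Table 1 data with a 60% minimum support, this sketch yields 64 full-length patterns, and the heaviest is (1, 1, 1, 0, 1, 1, 0, 1) with support 4 and weight 24: IP 115.156.141.244, time 20:14:43, GET, HTTP/1.0, status 206 and Axel/2.15, with the file path and file size left as wildcards.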

In operation S6, the weight of each element combination is calculated, and the M element combinations with the largest weights are fed back to the system administrator for container anomaly detection. Specifically:

The weight of each mined element combination is calculated according to the weight definition, the combinations are sorted by weight, and the element combinations with the largest weights, i.e. the frequent patterns, are submitted to the system administrator. By default, the weight is defined as the product of the number of non-zero elements in the frequent pattern and the support of the frequent pattern. Before being submitted, the frequent patterns are compared against the anomaly rule base: if a frequent pattern contains a rule item of the anomaly rule base, it indicates that an anomaly has been found in the container, and that frequent pattern is submitted to the system administrator.
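A Python sketch of the ranking and rule-base comparison follows. The text does not specify how rule items are stored, so the representation here is hypothetical: each rule item is a dict of field name to raw value, and a pattern "contains" a rule item if, after its IDs are translated back through the per-column codebooks from S3, it has every one of those field values. All function names are illustrative.

```python
import heapq


def weight(pattern, support):
    """Default weight: number of non-zero (non-wildcard) elements times support."""
    return sum(1 for item in pattern if item != 0) * support


def decode(pattern, codebooks, names):
    """Map per-column IDs back to field values, skipping wildcard zeros."""
    fields = {}
    for col, item in enumerate(pattern):
        if item != 0:
            id_to_value = {v: k for k, v in codebooks[col].items()}
            fields[names[col]] = id_to_value[item]
    return fields


def report_anomalies(patterns, m, codebooks, names, rule_base):
    """Keep the M heaviest (pattern, support) pairs and return those containing a rule item.

    rule_base: list of {field_name: value} dicts (hypothetical representation).
    """
    top = heapq.nlargest(m, patterns, key=lambda p: weight(p[0], p[1]))
    alerts = []
    for pattern, support in top:
        fields = decode(pattern, codebooks, names)
        for rule in rule_base:
            if all(fields.get(name) == value for name, value in rule.items()):
                alerts.append((fields, support, rule))
                break  # one matching rule item is enough to report the pattern
    return alerts
```

Using `heapq.nlargest` avoids sorting every mined pattern when only the top M are needed.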

Unlike traditional rule mining algorithms, which combine elements through a self-join, the proposed rule mining algorithm combines elements column by column, which reduces the time complexity of the combination process and improves rule mining efficiency. Comparing FIG. 2 and FIG. 3 shows that column-by-column combination fits the structure of log data better and reduces the time complexity of the combination process to a certain extent.

FIG. 5 is a block diagram of a container anomaly detection system according to an embodiment of the present invention. Referring to FIG. 5, the container anomaly detection system 500 includes a collection module 510, a processing module 520, a numbering module 530 and a rule mining module 540.

The collection module 510 performs, for example, operation S1, and is used to collect log data from the log files generated by each container;

the processing module 520 performs, for example, operation S2, and is used to remove the log data containing error information, extract the fields of analytical value from the remaining log data by using a regular expression, and complete the log entries that are missing some fields;

the numbering module 530 performs, for example, operation S3, and is used to assign a number to each distinct field in each column of the completed log data to form new log data;

the rule mining module 540 performs, for example, operations S4 to S6, and is used to calculate the support of the elements in each column of the new log data, remove from each column the elements whose support is smaller than the minimum support, store the remaining elements of each column separately and append a zero element at the end; select one element from each column in turn and combine them, and, if the support of a combination is greater than or equal to the minimum support, splice the combination with an element of the next column, until all element combinations whose support is greater than or equal to the minimum support and whose length equals the number of log entry fields are found; and calculate the weight of each element combination and select the M element combinations with the largest weights to feed back to a system administrator for container anomaly detection.
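How the four modules hand data to one another can be sketched as a small Python class. This is an illustrative wiring only, with hypothetical names: each module is injected as a plain callable, so any implementation of S1 to S6 (for instance, functions like the sketches in the preceding steps) can be plugged in.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class ContainerAnomalyDetectionSystem:
    """Hypothetical wiring of modules 510-540; each module is an injected callable."""

    collect: Callable[[str], List[str]]                    # collection module 510 (S1)
    process: Callable[[List[str]], List[List[str]]]        # processing module 520 (S2)
    number: Callable[[List[List[str]]], List[List[int]]]   # numbering module 530 (S3)
    mine: Callable[[List[List[int]]], list]                # rule mining module 540 (S4-S6)

    def run(self, log_path: str) -> list:
        """Run S1 to S6 once for one container log file and return the mining output."""
        lines = self.collect(log_path)
        records = self.process(lines)
        if not records:  # nothing new (or only error entries) since the last collection
            return []
        return self.mine(self.number(records))
```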

The container anomaly detection system 500 is used to perform the container anomaly detection method of the embodiment shown in FIG. 1. For details not described in this embodiment, refer to the container anomaly detection method of the embodiment shown in FIG. 1; they are not repeated here.

It will be understood by those skilled in the art that the foregoing is only a preferred embodiment of the present invention, and is not intended to limit the invention, and that any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the scope of the present invention.
