Machine learning based anomaly detection for embedded software applications

Document No. 108386. Publication date: 2021-10-15.

Note: This technology, "Machine learning based anomaly detection for embedded software applications," was created by Y·韦勒 and G·摩西 on 2019-03-05. Abstract: Systems, methods, logic, and devices may support machine learning-based anomaly detection for embedded software applications. In a learning phase, an anomaly model training engine may build an anomaly detection model, the anomaly detection model being configured to determine whether the embedded software application exhibits anomalous behavior based on activity metric inputs and application parameter inputs. In a runtime phase, an anomaly detection engine may sample the embedded software application to obtain activity metrics and application parameters during runtime execution, and provide the activity metrics and application parameters sampled during runtime execution as inputs to the anomaly detection model. The anomaly detection engine may also determine whether the embedded software application exhibits anomalous behavior based on output from the anomaly detection model for the provided input.

1. A system, comprising:

an anomaly model training engine configured to:

sample the embedded software application at a given sampling point to obtain:

an activity metric of the embedded software application since a previous sampling point; and

application parameters for the embedded software application at the given sampling point;

generate training data based on the activity metrics and the application parameters obtained for the given sampling point; and

build an anomaly detection model using the training data, the anomaly detection model configured to determine whether the embedded software application exhibits anomalous behavior based on activity metric inputs and application parameter inputs; and

an anomaly detection engine configured to:

sample the embedded software application at the given sampling point during runtime execution of the embedded software application to obtain activity metrics and application parameters during the runtime execution;

provide the activity metrics and the application parameters sampled during the runtime execution as inputs to the anomaly detection model; and

determine whether the embedded software application exhibits anomalous behavior based on output from the anomaly detection model for the provided input.

2. The system of claim 1, wherein,

the anomaly model training engine is configured to sample the embedded software application during a learning phase, wherein execution of the embedded software application is carried out by a simulator during the learning phase; and

the anomaly detection engine is configured to sample the embedded software application during the runtime execution, wherein execution of the embedded software application is carried out by a hardware component in which the embedded software application is embedded during the runtime execution.

3. The system of claim 1, wherein the anomaly model training engine is further configured to:

identify a given application task executed by the embedded software application at the given sampling point; and

construct the anomaly detection model to include a plurality of task-specific anomaly detection models, including a task-specific anomaly detection model for the given application task.

4. The system of claim 3, wherein the anomaly detection engine is further configured to:

sample the embedded software application at a plurality of sampling points during the runtime execution; and

select, from among the plurality of task-specific anomaly detection models, a task-specific anomaly detection model for the activity metrics and the application parameters sampled at the plurality of sampling points, based on a given application task executed by the embedded software application at the plurality of sampling points.

5. The system of claim 1, wherein the activity metric comprises a count of instructions executed since the previous sampling point, an execution time since the previous sampling point, or a combination of both.

6. The system of claim 1, wherein the anomaly model training engine is configured to obtain the application parameters for the embedded software application at the given sampling point from global variables or static variables stored by the embedded software application.

7. The system of claim 1, wherein the anomaly model training engine is further configured to generate the training data by performing a parameter selection process to determine a selected subset of the obtained application parameters to be included in the training data.

8. The system of claim 7, wherein the anomaly model training engine is configured to perform the parameter selection process via statistical correlation, consistency checking, or a combination of both.

9. The system of claim 1, wherein the anomaly detection engine is further configured to:

determine, during the runtime execution, that the embedded software application has entered an inactive execution period; and

provide the input to the anomaly detection model, and determine whether the embedded software application exhibits anomalous behavior utilizing the anomaly detection model, while the embedded software application is in the inactive execution period.

10. A method, comprising:

by an embedded system:

sampling an embedded software application at a given sampling point during runtime execution of the embedded software application to obtain activity metrics and application parameters during the runtime execution;

determining, during the runtime execution, that the embedded software application has entered an inactive execution period, and in response:

accessing an anomaly detection model configured to determine whether the embedded software application exhibits anomalous behavior based on activity metric inputs and application parameter inputs;

providing the activity metrics and the application parameters sampled for the given sampling point during the runtime execution as inputs to the anomaly detection model; and

determining whether the embedded software application exhibits anomalous behavior based on output from the anomaly detection model for the provided input.

11. The method of claim 10, wherein the activity metric comprises a count of instructions executed since a previous sampling point, an execution time since the previous sampling point, or a combination of both.

12. The method of claim 10, wherein the anomaly detection model comprises a plurality of task-specific anomaly detection models including different task-specific anomaly detection models for different application tasks, and further comprising:

sampling the embedded software application at a plurality of sampling points during the runtime execution; and

selecting, from among the plurality of task-specific anomaly detection models, a task-specific anomaly detection model for the activity metrics and the application parameters sampled at the plurality of sampling points, based on a given application task executed by the embedded software application at the plurality of sampling points.

13. The method of claim 10, wherein the sampling comprises obtaining the application parameters for the embedded software application at the given sampling point from a global variable or a static variable stored by the embedded software application.

14. The method of claim 10, further comprising, during a learning phase, training the anomaly detection model by:

sampling the embedded software application at the given sampling point to obtain:

an activity metric of the embedded software application since a previous sampling point; and

application parameters for the embedded software application at the given sampling point;

generating training data based on the activity metrics and the application parameters obtained for the given sampling point; and

utilizing the training data to construct the anomaly detection model.

15. A non-transitory machine-readable medium comprising instructions that, when executed by a processor, cause an embedded system to:

sample an embedded software application at a given sampling point during runtime execution of the embedded software application to obtain activity metrics and application parameters during the runtime execution;

determine, during the runtime execution, that the embedded software application has entered an inactive execution period, and in response:

access an anomaly detection model configured to determine whether the embedded software application exhibits anomalous behavior based on activity metric inputs and application parameter inputs;

provide the activity metrics and the application parameters sampled during the runtime execution as inputs to the anomaly detection model; and

determine whether the embedded software application exhibits anomalous behavior based on output from the anomaly detection model for the provided input.

Technical Field

The present application relates generally to embedded software applications, and more particularly to machine learning-based anomaly detection for embedded software applications.

Background

Embedded software applications are becoming more and more prevalent as technology continues to improve. An embedded software application may control a machine or device in a physical system, such as an automobile, a security system, a home appliance, a toy, a digital watch, a biological device, and so forth. Embedded systems that include embedded software may be targeted by security attacks from malware, viruses, spyware, and the like.

Disclosure of Invention

According to an aspect of the present invention, there is provided a system comprising an anomaly model training engine and an anomaly detection engine. The anomaly model training engine is configured to: sample the embedded software application at a given sampling point to obtain an activity metric of the embedded software application since a previous sampling point and application parameters for the embedded software application at the given sampling point; generate training data based on the activity metrics and the application parameters obtained for the given sampling point; and build an anomaly detection model using the training data, the anomaly detection model configured to determine whether the embedded software application exhibits anomalous behavior based on activity metric inputs and application parameter inputs. The anomaly detection engine is configured to: sample the embedded software application at the given sampling point during runtime execution of the embedded software application to obtain activity metrics and application parameters during the runtime execution; provide the activity metrics and the application parameters sampled during the runtime execution as inputs to the anomaly detection model; and determine whether the embedded software application exhibits anomalous behavior based on output from the anomaly detection model for the provided input.

According to another aspect of the present invention, there is provided a method comprising, by an embedded system: sampling an embedded software application at a given sampling point during runtime execution of the embedded software application to obtain activity metrics and application parameters during the runtime execution; and determining, during the runtime execution, that the embedded software application has entered an inactive execution period, and in response: accessing an anomaly detection model configured to determine whether the embedded software application exhibits anomalous behavior based on activity metric inputs and application parameter inputs; providing the activity metrics and the application parameters sampled for the given sampling point during the runtime execution as inputs to the anomaly detection model; and determining whether the embedded software application exhibits anomalous behavior based on output from the anomaly detection model for the provided input.

According to yet another aspect of the invention, there is provided a non-transitory machine-readable medium comprising instructions that, when executed by a processor, cause an embedded system to: sample an embedded software application at a given sampling point during runtime execution of the embedded software application to obtain activity metrics and application parameters during the runtime execution; and determine, during the runtime execution, that the embedded software application has entered an inactive execution period, and in response: access an anomaly detection model configured to determine whether the embedded software application exhibits anomalous behavior based on activity metric inputs and application parameter inputs; provide the activity metrics and the application parameters sampled during the runtime execution as inputs to the anomaly detection model; and determine whether the embedded software application exhibits anomalous behavior based on output from the anomaly detection model for the provided input.

Drawings

Certain examples are described in the following detailed description and with reference to the accompanying drawings.

FIG. 1 illustrates an example of a system that supports machine learning-based anomaly detection for embedded software applications.

FIG. 2 shows an example of anomaly detection model training performed by an anomaly model training engine via machine learning.

FIG. 3 illustrates exemplary training of a task-specific anomaly detection model by an anomaly model training engine.

FIG. 4 illustrates exemplary runtime characterization of embedded application behavior by the anomaly detection engine.

FIG. 5 illustrates an example of logic that a system may implement to support learning phase training of an anomaly detection model.

FIG. 6 illustrates an example of logic that a system may implement to support anomaly detection during runtime execution of embedded software.

FIG. 7 illustrates an example of a system that supports machine learning-based anomaly detection for embedded software applications.

Detailed Description

The following discussion relates to embedded software applications (applications), which may also be referred to as embedded software or embedded applications. As used herein, an embedded software application may refer to software that executes on a physical system other than a desktop or laptop computer. Such physical systems may also be referred to as embedded systems and are typically limited in computing and memory capabilities. In many cases, the embedded software application may interact with a machine or other physical element of the embedded system, and the embedded application may thus be used to monitor or control machines or devices in vehicles, telephones, modems, robots, electrical devices, security systems, and the like.

The present disclosure may provide systems, methods, devices, and logic that support anomaly detection for embedded software applications via machine learning. As described in more detail below, the machine learning based anomaly detection features disclosed herein may take into account specific application parameters that affect the activity (e.g., execution time) of an embedded software application. The anomaly detection model may be trained with specific consideration of application parameters, and the machine learning model may associate the application parameters with execution activities (e.g., as measured by instruction counts or execution cycles) to characterize normal and abnormal application behaviors. By specifically considering application parameters in model training, the machine learning-based anomaly detection features presented herein can provide a resource-efficient mechanism to track application behavior and identify anomalies, by considering the way application context affects execution activities.

These and other benefits of the disclosed machine learning-based anomaly detection features are described in more detail herein.

FIG. 1 shows an example of a system 100 that supports machine learning-based anomaly detection for embedded software applications. System 100 may take various forms and may include a single or multiple computing devices, such as an application server, a computing node, a desktop or laptop computer, a smart phone or other mobile device, a tablet device, an embedded controller, or any hardware component or physical system that includes embedded software. The system 100 may take any form of system having computing capabilities by which an anomaly detection model for an embedded software application may be trained, used, or otherwise applied.

As described in greater detail herein, the system 100 may support machine learning-based anomaly detection in a learning phase, a runtime phase, or both. In the learning phase, the system 100 can use machine learning to characterize the activities of the embedded software application according to different application parameters that affect the execution activities. Via machine learning and a training set containing sampled application parameters and measured application activities, the system 100 can build an anomaly detection model to detect anomalous behavior of the embedded software application. In the runtime phase, the system 100 may access the trained anomaly detection model to detect anomalies based on measured runtime activity of the embedded software application for the sampled runtime application parameters. Thus, the system 100 can support anomaly detection in embedded software applications by anomaly detection models built via machine learning.

The system 100 may be implemented in various ways to provide any of the machine learning based anomaly detection features described herein. As an exemplary embodiment, the system 100 shown in FIG. 1 includes an anomaly model training engine 110 and an anomaly detection engine 112. The system 100 may implement the engines 110 and 112 (and components thereof) in various ways, such as by hardware and programming. The programming for the engines 110 and 112 may be in the form of processor-executable instructions stored on a non-transitory machine-readable storage medium, and the hardware for the engines 110 and 112 may include a processor for executing these instructions. The processor may take the form of a single-processor or multi-processor system, and in some examples, the system 100 implements multiple engines using the same computing system features or hardware components (e.g., a common processor or a common storage medium).

In operation, the anomaly model training engine 110 can train anomaly detection models using machine learning based on application behavior of the embedded software application. For example, the anomaly model training engine 110 may be configured to sample the embedded software application at a given sampling point to obtain (i) activity metrics of the embedded software application since a previous sampling point, and (ii) application parameters for the embedded software application at the given sampling point. The anomaly model training engine 110 may also be configured to generate training data based on the activity metrics and application parameters obtained for a given sampling point, and to build an anomaly detection model using the training data. The anomaly detection model may be configured to determine whether the embedded software application exhibits anomalous behavior based on the activity metric inputs and the application parameter inputs.

In operation, the anomaly detection engine 112 can access the anomaly detection model to provide real-time anomaly detection capabilities during runtime execution of the embedded software application. Such execution at runtime may refer to or include execution of an embedded software application in a physical system, where the embedded software application is designed to operate in the physical system (e.g., medical device, aircraft controller, anti-lock braking system, etc.). In some implementations, the anomaly detection engine 112 can be configured to sample the embedded software application at sampling points during runtime execution of the embedded software application to obtain activity metrics and application parameters during runtime execution. The anomaly detection engine 112 can also be configured to provide the sampled activity metrics and sampled application parameters during runtime execution as inputs to an anomaly detection model and determine whether the embedded software application exhibits anomalous behavior based on output from the anomaly detection model for the provided inputs.

These and other machine learning based anomaly detection features according to the present disclosure are described in more detail below. In particular, exemplary features related to training an anomaly detection model in a learning phase are described in conjunction with fig. 2 and 3. Exemplary characteristics relating to using an anomaly detection model to detect anomalous application behavior during a runtime phase are described in connection with FIG. 4.

FIG. 2 shows an example of anomaly detection model training performed by the anomaly model training engine 110 via machine learning. During the learning phase, the anomaly model training engine 110 may track the behavior of the embedded software application and process the tracked application behavior into training data to train the anomaly detection model.

As an illustrative example, FIG. 2 depicts an embedded system 210. The embedded system 210 may be any system that implements or includes embedded software, including, for example, the embedded software application 212 shown in FIG. 2. In different embedded system embodiments, execution of the embedded software application 212 may be performed by different computing resources. In some examples, the embedded software application 212 may be implemented as firmware (e.g., as a component of a microcontroller, a system-on-a-chip (SoC), or other hardware) with limited memory or processor capabilities. In other instances, the embedded software application 212 may be executed with an emulation or simulation system, which may be advantageous during the learning phase because execution of the embedded software application 212 during the learning phase need not be constrained by the limited memory or processing power of an actual runtime implementation. In the example shown in FIG. 2, the embedded system 210 includes a simulator 214, which may carry out execution of the embedded software application 212 during a learning phase for training the anomaly detection model.

As described in more detail herein, the anomaly model training engine 110 can build anomaly detection models via machine learning. The anomaly detection model may characterize application behavior as normal or abnormal. To train the anomaly detection model, the anomaly model training engine 110 can collect application data during execution of the embedded software application 212 during a learning phase. The anomaly model training engine 110 may then train the anomaly detection model using training data comprised of application data sampled during execution of the embedded software application 212.

In accordance with the present disclosure, the anomaly model training engine 110 may sample selected types of application data to train the anomaly detection model. In particular, the anomaly model training engine 110 can obtain (i) activity metrics and (ii) application parameters at various sampling points during execution of the embedded software application 212.

An activity metric may refer to a measurable amount of activity, such as an instruction count, for the embedded software application 212. To determine the instruction count, the anomaly model training engine 110 may access tracked instruction execution through system hardware (e.g., a performance monitor unit), system software (e.g., operating system functions, APIs, etc.), or a combination of both. The obtained instruction count for the embedded software application 212 may be a useful activity indicator because the instruction count may be insensitive to memory access costs and cache hit/miss rates that would otherwise introduce random variations into activity metrics and reduce the accuracy of the trained anomaly detection model. Another exemplary activity metric that may be obtained by the anomaly model training engine 110 is application execution time. To measure execution time, the anomaly model training engine 110 may access a cycle counter of the CPU core or utilize a system driver to extract cycle data between different execution points of the embedded software application 212.
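
The following sketch is illustrative only and not part of the patent disclosure: it shows one way an activity metric such as an instruction-count delta between sampling points might be derived. The `read_instruction_counter` callable is a hypothetical stand-in for whatever PMU, operating system, or simulator facility actually exposes the counter.

```python
# Minimal sketch (assumed names): deriving an activity metric between
# sampling points. read_instruction_counter() stands in for a real
# PMU/OS/simulator API and is purely hypothetical.

class ActivitySampler:
    """Tracks instructions executed since the previous sampling point."""

    def __init__(self, read_instruction_counter):
        self._read = read_instruction_counter  # injected counter source
        self._last = self._read()              # baseline reading

    def sample(self):
        """Return the instruction count since the previous sampling point."""
        now = self._read()
        delta = now - self._last
        self._last = now
        return delta

# Demonstration with a fake counter source:
fake_counter = iter(range(0, 1_000_000, 137))
sampler = ActivitySampler(lambda: next(fake_counter))
print(sampler.sample())  # 137 instructions since the baseline
```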

By determining activity metrics at different execution points, the anomaly model training engine 110 can obtain a quantitative measure of application activity of the embedded software application 212 during normal behavior (i.e., execution unaffected by malware intrusion). However, execution time, instruction count, and other merely quantitative activity metrics may paint an incomplete picture of embedded software execution. Execution activity may increase or decrease depending on the execution context of the embedded software application 212, and the same software operation, task, or thread of execution may have (significantly) different execution times based on the application parameters applicable during execution. Exemplary application parameters that may affect a sampled activity metric include memory conditions, input data size, input data content (e.g., high-resolution versus low-resolution data), application control parameters (e.g., high-precision versus low-precision operating modes), system power constraints, and the like.

To account for these variations, the anomaly model training engine 110 may also sample the embedded software application 212 for the application parameters applicable to the sampled activity metrics. Application parameters may refer to any system, application, or global parameters that affect execution of the embedded software application 212. The anomaly model training engine 110 can sample application parameters of the embedded software application 212 in various ways. For example, the anomaly model training engine 110 may access static memory allocated to application tasks or threads to obtain stored parameters for a particular application task or thread of the embedded software application 212. Additionally or alternatively, the anomaly model training engine 110 may access global variables stored in global memory or obtain long-term state values applicable to the embedded system 210, the embedded software application 212, or a combination of both.

In some implementations, the anomaly model training engine 110 can implement a parameter access function by which the embedded software application 212 itself can provide the applicable application parameters during sampling. The implemented parameter access function may take the form of an API to extract application parameters in a non-intrusive or non-destructive manner. To illustrate, the embedded system may store input data or operating parameters (e.g., as specified in an input communication frame received by the embedded software application 212) in a communication controller memory, a system register, or a first-in-first-out (FIFO) queue. Memory read accesses to such memory structures would be destructive operations and/or inaccessible without driver-level privileges. Thus, the parameter access function provided by the anomaly model training engine 110 may provide a non-destructive mechanism to sample relevant application parameters during execution of the embedded software application 212.
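
As a hedged illustration of the parameter access function described above (the names and structure are assumptions, not the patent's API), the application could register a callback that copies its current parameters into an ordinary dictionary, so the sampler never reads destructive structures such as FIFO queues directly:

```python
# Hypothetical sketch of a non-destructive parameter access function.
_parameter_providers = {}

def register_parameter_provider(task_name, provider):
    """provider() must return a dict of parameter name -> current value."""
    _parameter_providers[task_name] = provider

def sample_application_parameters(task_name):
    """Snapshot one task's application parameters without touching
    destructive structures (FIFO queues, communication registers)."""
    provider = _parameter_providers.get(task_name)
    return dict(provider()) if provider else {}

# Example: an image-processing task exposes the parameters that affect
# its execution activity (all values invented for illustration).
register_parameter_provider(
    "image_task",
    lambda: {"input_bytes": 4096, "resolution": "high", "precision": "low"},
)
print(sample_application_parameters("image_task"))
```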

As another implementation feature, the anomaly model training engine 110 may preprocess the input data provided to the embedded software application 212 to extract applicable application parameters. For example, the anomaly model training engine 110 may pre-process the input in the form of an image or video file to determine file indicators, data patterns, or multimedia characteristics that may be obtained by the anomaly model training engine 110 as application parameters for sampling of the embedded software application 212 at selected sampling points.

In the various manners described herein, the anomaly model training engine 110 may sample activity metrics and application parameters during execution of the embedded software application 212. The particular execution points at which the activity metrics and application parameters are sampled may be preselected by the anomaly model training engine 110. To help explain these features, an execution timeline 220 is shown in FIG. 2 to illustrate different sampling points at which the anomaly model training engine 110 may sample application data during simulated execution of the embedded software application 212. As shown in the execution timeline 220, the anomaly model training engine 110 may select sampling points s1, s2, s3, and s4, at which the embedded software application 212 is sampled to obtain activity metrics and application parameters.

At each selected sampling point s1, s2, s3, and s4, the anomaly model training engine 110 may obtain an activity metric (e.g., an indication of application activity since the previous sampling point) and the application parameters applicable at that sampling point. Thus, at sampling point s2, the anomaly model training engine 110 may determine the activity metric since the previous sampling point s1 (e.g., the count of instructions executed by the embedded software application 212) and the application parameters in effect at sampling point s2. In FIG. 2, the anomaly model training engine 110 obtains the set of activity metrics 231 and application parameters 232 sampled from the embedded software application 212 at the depicted sampling points s1, s2, s3, and s4.

In some embodiments, the anomaly model training engine 110 selects sampling points for the embedded software application 212 to cover the active execution periods of the embedded software application 212. The execution timeline 220 in FIG. 2 shows the different times that the embedded software application 212 is active, depicted as the diagonally patterned portions along the execution timeline 220 (which may also be referred to as active execution periods). The embedded software application 212 may be referred to as active when some or all of the computing resources of the embedded system are actively used to execute the embedded software application 212. The embedded software application 212 may be referred to as inactive (or idle) when the computing resources of the embedded system are unused or idle.

In many embedded systems, embedded software is designed to receive input, process the input, and generate output. Embedded software applications are commonly used in physical systems to monitor system components, and monitored inputs may occur during operation of such physical systems (e.g., sensing particular signals, receiving data files to be processed, etc.). An active execution period of the embedded software may span the time from receiving an input, through processing of that input, until the corresponding output is generated. After generating the output, the embedded software may become inactive until a subsequent input is received.

An exemplary sequence of active and inactive execution periods of the embedded software application 212 is illustrated in the execution timeline 220 of FIG. 2. At sampling point s1, the embedded software application 212 may become active (or, in other words, enter an active execution period) in response to receiving an input. The embedded software application 212 may remain active until sampling point s2, when an output (not shown) is generated. From sampling point s2 to sampling point s3, the embedded software application 212 may be in an inactive execution period, and from sampling point s3 to sampling point s4 it may resume active execution to process the input received at sampling point s3.

The anomaly model training engine 110 can determine to sample the embedded software application 212 in response to the embedded software application 212 becoming active, inactive, or both. In other words, the anomaly model training engine 110 can select sampling points such that the embedded software application 212 is sampled in response to received inputs, generated outputs, or a combination of both. Accordingly, the anomaly model training engine 110 may sample the embedded software application 212 in a manner that obtains an activity metric for each given active execution period along with the application parameters applicable to that period. (The anomaly model training engine 110 may also sample the embedded software application 212 on a task-specific basis, as described in more detail below with respect to FIG. 3.) In this manner, the anomaly model training engine 110 may select the sampling points at which to sample the embedded software application 212 for activity metrics and application parameters.

Based on the sampled activity metrics and the sampled application parameters, the anomaly model training engine 110 may construct training data to train the anomaly detection model. In FIG. 2, the anomaly model training engine 110 generates training data 240, which may be constructed to include some or all of the activity metrics 231 and application parameters 232 sampled from the embedded software application 212. In some instances, the anomaly model training engine 110 may filter the sampled application parameters 232 to determine a selected subset of relevant application parameters (e.g., the application parameters that most impact the application activity of the embedded software). In practice, the anomaly model training engine 110 may perform a parameter selection process to select the relevant machine learning features with which to train the anomaly detection model. In performing the parameter selection process, the anomaly model training engine 110 can employ statistical correlation techniques, consistency checks, or a combination of both to determine a particular subset of application parameters that characterize the application activity.
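
One possible realization of the statistical-correlation option named above (a sketch under assumed data, not the disclosed algorithm) is to rank candidate parameters by their absolute Pearson correlation with the sampled activity metric and keep the strongest:

```python
import numpy as np

def select_parameters(activity, parameters, top_k=3):
    """Rank candidate application parameters by absolute Pearson
    correlation with the activity metric; keep the top_k strongest."""
    scores = {}
    for name, values in parameters.items():
        r = np.corrcoef(activity, values)[0, 1]
        scores[name] = 0.0 if np.isnan(r) else abs(r)
    return sorted(scores, key=scores.get, reverse=True)[:top_k]

# Toy data: activity scales with input size; the other column is noise.
activity = np.array([100.0, 210.0, 290.0, 405.0, 500.0])
candidates = {
    "input_bytes": np.array([1e3, 2e3, 3e3, 4e3, 5e3]),
    "unrelated":   np.array([7.0, 7.0, 6.0, 7.0, 7.0]),
}
print(select_parameters(activity, candidates, top_k=1))  # ['input_bytes']
```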

The anomaly model training engine 110 can utilize the prepared training data 240 to build an anomaly detection model. In FIG. 2, the anomaly model training engine 110 provides the training data 240 as a training set to train an anomaly detection model 250. To train the anomaly detection model 250, the anomaly model training engine 110 may utilize any number of machine learning techniques. For example, the anomaly detection model 250 may implement any number of supervised, semi-supervised, unsupervised, or reinforcement learning models to characterize the behavior of embedded software applications based on sampled activity metrics and sampled application parameters. The anomaly detection model 250 may include support vector machines, Markov chains, context trees, neural networks, Bayesian networks, or various other machine learning components.

In particular, the anomaly model training engine 110 can build the anomaly detection model 250 to determine whether the embedded software application exhibits anomalous behavior based on the activity metric inputs and the application parameter inputs. In some embodiments, the anomaly detection model 250 may take the form of a Support Vector Machine (SVM) and provide anomaly determinations for activity metric inputs and application parameter inputs.
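
Since the disclosure names a support vector machine as one possible model form, a minimal sketch using scikit-learn's one-class SVM (trained only on normal-behavior samples; all numbers are invented for illustration) might look as follows:

```python
import numpy as np
from sklearn.svm import OneClassSVM

# Each row pairs an activity metric (instruction count) with a selected
# application parameter (input size) sampled at the same point; all rows
# reflect normal behavior observed during the learning phase.
X_train = np.array([
    # instr_count, input_bytes
    [10_200, 1_000],
    [20_150, 2_000],
    [30_400, 3_000],
    [40_100, 4_000],
])

model = OneClassSVM(kernel="rbf", gamma="scale", nu=0.05).fit(X_train)

# predict() returns +1 for inputs consistent with learned normal
# behavior and -1 for inputs the model deems anomalous.
print(model.predict([[30_500, 3_000]]))  # likely [ 1]: near training data
print(model.predict([[95_000, 1_000]]))  # likely [-1]: activity too high
```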

The output provided by the anomaly detection model 250 can be a binary value indicating whether the anomaly detection model 250 has identified anomalous behavior of the embedded software application 212. In other examples, the anomaly detection model 250 may provide a probability value that the provided activity metric input and application parameter input indicate anomalous application behavior. As yet another example, the anomaly model training engine 110 may build the anomaly detection model 250 to provide a predicted activity metric for a given application parameter input, by which anomalous application behavior may be detected based on a comparison with the runtime activity metric sampled from the embedded software. Any such anomaly detection technique may be implemented via the anomaly detection model 250, as discussed further below with respect to FIG. 4.

As described above, the anomaly model training engine 110 can build the anomaly detection model 250 from application data sampled from the embedded software application 212. In the example of fig. 2, the anomaly model training engine 110 trains the anomaly detection model 250 using activity metrics and application parameters sampled during execution. In some implementations, the anomaly model training engine 110 can sample application data (specifically, activity metrics and application parameters) and train anomaly detection models at a finer granularity than general application behavior. For example, the anomaly model training engine 110 may sample and characterize task-specific behavior as normal or anomalous. These features will be discussed below with respect to fig. 3.

FIG. 3 illustrates exemplary training of a task-specific anomaly detection model by the anomaly model training engine 110. An application task (also referred to as a task, thread of execution, or application thread) may refer to any execution sequence of the embedded software (e.g., an initialization thread) that performs a specific task or other work arising from an instance of the embedded software. An application task may also refer to a programmed sequence of instructions that may be managed by a scheduler or other operating system logic. Execution of embedded software may include multiple active task executions, which may involve context switching, preemption, and other interruptions between the execution sequences of different application tasks. The anomaly model training engine 110 can train anomaly detection models on a task-specific basis, which can support characterization of normal or abnormal embedded application behavior on a task-specific basis.

For illustration, FIG. 3 depicts the embedded system 210 of FIG. 2, including the embedded software application 212 and the simulator 214 for simulated application execution during the learning phase. As also shown in FIG. 3, an example execution timeline 320 depicts a number of different tasks executed as part of the embedded software application 212. In the execution timeline 320, task A is shown as active in the portions patterned with diagonal lines (which may also be referred to as active execution periods for task A). Also shown in the execution timeline 320 are the active execution periods for task B, depicted as the portions patterned with vertical lines.

The anomaly model training engine 110 can sample the embedded software application 212 at sufficient sampling points to determine activity metrics and application parameters for a given application task from the start of the task to the completion of the task, even when execution of the given application task is preempted by other application tasks. To this end, the anomaly model training engine 110 may sample the embedded software application 212 at the execution points where a given application task starts, pauses (e.g., due to preemption or context switching), or completes. In the example shown in FIG. 3, the execution timeline 320 depicts sampling points s1, s2, s3, s4, s5, s6, and s7, at which the anomaly model training engine 110 samples the embedded software application 212 for the activity metrics and application parameters of task A and task B.

In the example shown in FIG. 3, task A begins executing in response to receipt of task A-specific input (marked as input (A) in the execution timeline 320). Also in this example, task B of the embedded software application is higher in priority and preempts the execution of task A. In response to receipt by the embedded software application 212 of task B-specific input (marked as input (B) in the execution timeline 320), task B begins executing. Thus, from sampling point s1 (when task A begins executing in response to input (A)) to sampling point s6 (when execution of task A completes), two executing instances of task B preempt the execution of task A. As shown in FIG. 3, the anomaly model training engine 110 may sample the embedded software application 212 at the execution points when task A begins (sampling points s1 and s7), when task B begins (sampling points s2 and s4), when task A is preempted (also sampling points s2 and s4), when task B completes (sampling points s3 and s5), when task A resumes (also sampling points s3 and s5), and when task A completes (sampling point s6).

By sampling the embedded software application 212 at the different task start, pause, or stop points, the anomaly model training engine 110 can determine an activity metric for the whole of task A, even though the execution of task A is preempted at multiple execution points by executing instances of task B. In FIG. 3, the anomaly model training engine 110 samples activity metrics 331 and application parameters 332 from the embedded software application 212 during the execution timeline 320. The sampled activity metrics 331 may include two activity metrics for task B: one for the task B execution from sampling point s2 to s3 and one for the task B instance executing from sampling point s4 to s5. The sampled application parameters 332 may include at least two sets of application parameters for task B, one set applicable to each instance of task B. In addition, the anomaly model training engine 110 can obtain an activity metric for task A, which begins at sampling point s1 and completes at s6, wherein the anomaly model training engine 110 can determine the activity metric as the sum of the activity metrics sampled from s1 to s2, from s3 to s4, and from s5 to s6. In a similar manner, the anomaly model training engine 110 may determine, for this sum of activity metrics, the application parameters specific to task A.
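
To make the summation just described concrete, the following sketch (invented segment values, hypothetical labels) folds per-segment activity samples into a single start-to-completion metric per task:

```python
from collections import defaultdict

def aggregate_task_activity(segments):
    """Sum the activity sampled for each execution segment into one total
    per task, so a task preempted mid-execution still yields a single
    start-to-completion activity metric (e.g., s1-s2 + s3-s4 + s5-s6)."""
    totals = defaultdict(int)
    for task, instruction_count in segments:
        totals[task] += instruction_count
    return dict(totals)

# Task A runs in three segments, preempted twice by task B.
segments = [
    ("task_A", 1_200),  # s1 -> s2
    ("task_B", 3_400),  # s2 -> s3
    ("task_A",   900),  # s3 -> s4
    ("task_B", 3_100),  # s4 -> s5
    ("task_A", 1_500),  # s5 -> s6
]
print(aggregate_task_activity(segments))
# {'task_A': 3600, 'task_B': 6500}
```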

Thus, the anomaly model training engine 110 can sample the embedded software application 212 at different execution points on a task-specific basis. In doing so, the anomaly model training engine 110 can identify, at a given sampling point, the application task for which the embedded software application 212 is active (e.g., task A is active up until sampling point s2). Identification of the application task "active" during sampling may involve accessing OS system parameters that indicate the current thread, current task, current process, or another system indicator.

The anomaly model training engine 110 may also specifically construct training sets that distinguish between application data sampled for different application tasks. In FIG. 3, the anomaly model training engine 110 prepares training data 340 based on the sampled activity metrics 331 and the sampled application parameters 332. The training data 340 may be generated to include a plurality of different training sets distinguished on a task-specific basis. In this regard, the training data 340 may include different training sets for task A and task B of the embedded software application 212.

The anomaly model training engine 110 can build the anomaly detection model 250 to include multiple task-specific anomaly detection models, such as the task-specific anomaly detection models for task A and task B shown in FIG. 3 as models 351 and 352. In this regard, the anomaly model training engine 110 can provide a given set of task-specific training data to train a given task-specific anomaly detection model, and the training of multiple task-specific anomaly detection models can support characterization of application behavior for task-specific anomalies. In some implementations, the anomaly model training engine 110 can build the anomaly detection model 250 as a plurality of task-specific anomaly detection models, e.g., a different model for each of some or all of the application tasks supported by the embedded software application 212.
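
Continuing the one-class SVM sketch from above (again with invented training rows), the plurality of task-specific models might simply be a mapping from task identifier to a model trained on that task's own training set:

```python
import numpy as np
from sklearn.svm import OneClassSVM

def build_task_specific_models(training_sets):
    """Train one anomaly detection model per application task from that
    task's own (activity metric, application parameter) training rows."""
    return {
        task: OneClassSVM(kernel="rbf", gamma="scale", nu=0.05).fit(rows)
        for task, rows in training_sets.items()
    }

training_sets = {
    "task_A": np.array([[3_600, 1_000], [3_650, 1_000], [3_580, 1_000]]),
    "task_B": np.array([[6_500, 4_096], [6_450, 4_096], [6_600, 4_096]]),
}
models = build_task_specific_models(training_sets)
print(sorted(models))  # ['task_A', 'task_B']
```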

To further illustrate, the task-specific anomaly detection models 351 and 352 may provide task-specific characterizations of application behavior. For example, the task A anomaly detection model 351 can provide anomaly determinations specific to task A of the embedded software application 212, and do so based on task A-specific activity metric inputs and application parameter inputs. In a similar manner, the task B anomaly detection model 352 can provide anomaly determinations specific to task B of the embedded software application 212. Because a given task-specific anomaly detection model trained by the anomaly model training engine 110 may be trained specifically with the application parameters applicable to a given task, the trained task-specific anomaly detection model may be specifically tailored to the task-specific contexts that affect the execution activity of the embedded software application 212 on a task-specific basis.

In any of the above approaches, the anomaly model training engine 110 may support training of the anomaly detection model in a learning phase. In particular, the anomaly model training engine 110 can use machine learning to train anomaly detection models configured to characterize embedded application behavior, which takes into account specific application parameters applicable during embedded software execution. The trained anomaly detection model may be accessed and used during a runtime phase to detect anomalous behavior of the embedded software application, as described below with respect to FIG. 4.

FIG. 4 illustrates exemplary runtime characterization of embedded application behavior by the anomaly detection engine 112. In FIG. 4, an example of an embedded system 410 is implemented as part of a physical system, such as a braking component of a tanker truck. Although illustrated separately, the anomaly detection engine 112 and the anomaly detection model 250 may also be part of the embedded system 410, such as by sharing common computing resources (e.g., memory or one or more processors).

The embedded system 410 may include the embedded software application 212 embedded in a hardware component 412 (e.g., an embedded controller). In FIG. 4, the hardware component 412 communicates with an anti-lock braking sensor 414 of the tanker truck, though virtually any other application within a physical system is contemplated. In the example shown in FIG. 4, the hardware component 412 may execute the embedded software application 212 to monitor braking conditions (inputs to the embedded software application 212) sensed by the anti-lock braking sensor 414 and generate an output based on the sensed conditions. Such actual operation in the tanker truck may be characterized as runtime execution of the embedded software application 212 (e.g., within the physical system in which the embedded software application 212 is designed to operate).

As also shown in FIG. 4, the anomaly detection engine 112 can monitor the behavior of the embedded software application 212 during runtime execution. To this end, the anomaly detection engine 112 can access the anomaly detection model 250 trained for the embedded software application 212 during the learning phase (e.g., as described above). The anomaly detection engine 112 can sample the embedded software application 212 at selected sampling points to obtain activity metrics and application parameters, including activity metrics and application parameters on a task-specific basis. The anomaly detection engine 112 can sample the embedded software application 212 in a manner consistent with the anomaly model training engine 110, including selecting sampling points according to any of the features described herein (e.g., in FIGS. 2 and 3). The sampled activity metrics and sampled application parameters may be provided as inputs to the anomaly detection model 250, from which the anomaly detection model 250 may produce an anomalous behavior determination.

For purposes of illustration, FIG. 4 shows an exemplary execution timeline 420 during runtime execution of the embedded software application 212. In the execution timeline 420, task A of the embedded software application 212 is shown as active in the portions patterned with diagonal lines. Also shown in the execution timeline 420 are the active execution periods of task B, depicted as the portions patterned with vertical lines. The anomaly detection engine 112 can sample the embedded software application at multiple sampling points during runtime execution, including at any execution point where an application task starts, pauses (e.g., is preempted), or completes. As shown in FIG. 4, the anomaly detection engine 112 samples the embedded software application 212 at sampling points s1, s2, s3, and s4 of the execution timeline 420.

The anomaly detection engine 112 can obtain activity metrics 431 and application parameters 432 for the embedded software application 212 that are specific to the execution points at which the embedded software application 212 was sampled. The sampled activity metrics 431 and sampled application parameters 432 may be task-specific, including, for example, an instruction count or other activity metric for task B from sampling point s2 to s3 and the application parameters specific to task B from sampling point s2 to s3. In a consistent manner, the sampled activity metrics 431 may include a total activity metric for task A from sampling points s1 to s2 and s3 to s4, along with the application parameters applicable during the active execution period of task A.

In some implementations, the anomaly detection engine 112 samples the embedded software application 212 for application parameters consistent with the features used to train the anomaly detection model 250. In doing so, the anomaly detection engine 112 can sample a selected subset of application parameters used by the embedded software application 212 or corresponding application task. The selected subset of application parameters sampled by the anomaly detection engine 112 may be the same as the selected subset of application parameters determined by the anomaly model training engine 110 from the parameter selection process (which may be performed on a task-specific basis). In other words, anomaly detection engine 112 may sample a particular subset (e.g., task-specific) of the application parameters used to train anomaly detection models 250, 351, or 352, without having to sample other application parameters not used to train these models.

The anomaly detection engine 112 can provide the sampled activity metrics 431 and sampled application parameters 432 as inputs to the anomaly detection model 250 to characterize application behavior of the embedded software application 212. For task-specific characterizations, the anomaly detection engine 112 may select from among the multiple task-specific anomaly detection models (e.g., models 351 and 352) for the activity metrics 431 and application parameters 432 sampled from sampling points s1 to s2 and s3 to s4. The anomaly detection engine 112 can do so based on a given application task executed by the embedded software application 212 at the multiple sampling points, e.g., by providing the activity metrics and application parameters sampled for task A as inputs to the task A anomaly detection model 351 and providing the activity metrics and application parameters sampled for task B as inputs to the task B anomaly detection model 352.
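
A sketch of this dispatch (reusing the shape of the hypothetical per-task `models` mapping built in the earlier sketch) routes each runtime sample to the model for the task that produced it:

```python
import numpy as np
from sklearn.svm import OneClassSVM

# Stand-in per-task models, shaped like the earlier training sketch.
models = {
    "task_A": OneClassSVM(gamma="scale", nu=0.05).fit(
        np.array([[3_600, 1_000], [3_650, 1_000], [3_580, 1_000]])),
    "task_B": OneClassSVM(gamma="scale", nu=0.05).fit(
        np.array([[6_500, 4_096], [6_450, 4_096], [6_600, 4_096]])),
}

def characterize_runtime_samples(models, runtime_samples):
    """Route each runtime sample to the anomaly detection model for the
    task that produced it; OneClassSVM returns -1 for anomalies."""
    results = {}
    for task, features in runtime_samples.items():
        prediction = models[task].predict([features])[0]
        results[task] = "anomalous" if prediction == -1 else "normal"
    return results

print(characterize_runtime_samples(models, {
    "task_A": [3_620, 1_000],    # near task A's learned behavior
    "task_B": [19_000, 4_096],   # far from task B's learned behavior
}))
```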

The anomaly detection model 250 may provide an abnormal behavior determination 460 generated from the input activity metrics 431 and application parameters 432. The abnormal behavior determination 460 may take the form of any type of output supported by the anomaly detection model 250 (including the task-specific anomaly detection models 351 and 352). Exemplary outputs include binary values indicating normal or abnormal application behavior, anomaly probabilities, and the like, any of which may be task-specific. Thus, the abnormal behavior determination 460 may include task-specific outputs, each of which may characterize whether a task-specific behavior of the embedded software application 212 is anomalous, and do so based on the specifically sampled application parameters. In the example of FIG. 4, the abnormal behavior determination 460 may provide separate indications of whether the application behavior of task A and of task B is characterized as anomalous.

In some implementations, the anomaly detection engine 112 can access the anomaly detection model 250 during inactive execution periods of the embedded software application 212. The embedded system 410 may have limited computational/memory resources or be subject to precise timing constraints. To reduce the timing interference or resource overhead of anomaly detection, the anomaly detection engine 112 can determine that the embedded software application 212, during runtime execution, enters an inactive execution period (e.g., at sampling point s4). In response to this determination, the anomaly detection engine 112 can provide the sampled inputs (e.g., sampled activity metrics 431 and sampled application parameters 432) to the anomaly detection model 250 and determine whether the embedded software application 212 exhibits anomalous behavior while the embedded software application 212 is in the inactive execution period (e.g., after sampling point s4).
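
One way to realize this deferral (a sketch; the buffering scheme and hook names are assumptions, not the disclosed mechanism) is to keep the per-sample work trivially cheap and postpone model inference until an idle notification arrives:

```python
# Hypothetical sketch: buffer samples while the application is active and
# run model inference only once it goes idle, so anomaly detection does
# not perturb the application's timing behavior.

class DeferredAnomalyChecker:
    def __init__(self, model):
        self._model = model   # any object with predict(), e.g. OneClassSVM
        self._pending = []    # samples buffered during active execution

    def on_sample(self, features):
        """Called at each sampling point during active execution; cheap."""
        self._pending.append(features)

    def on_idle(self):
        """Called when the application enters an inactive execution
        period; runs the deferred inference and clears the buffer."""
        if not self._pending:
            return []
        predictions = self._model.predict(self._pending)
        self._pending = []
        return list(predictions)  # -1 entries flag anomalous samples
```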

The anomaly detection engine 112 can determine that the embedded software application 212 exhibits anomalous behavior based on the abnormal behavior determination 460, which can include identifying one or more particular application tasks having anomalous activity. Once abnormal behavior is detected, the anomaly detection engine 112 can provide an anomaly alert, for example, to a central monitoring system of the physical system, a system operating system, or another logical entity configured to monitor operation of the embedded software application 212 supported by the physical system. Thus, the anomaly detection engine 112 can support detection of anomalous application activity during runtime execution of embedded software.

As described herein, machine learning based anomaly detection features can be provided in a learning phase and a runtime phase. By sampling and training a model specifically with both activity metrics and application parameters, the machine learning-based anomaly detection features described herein may provide an efficient and accurate mechanism by which anomalous activity in embedded software may be identified. By learning application behaviors based on these sampled aspects, the anomaly model training engine 110 and the anomaly detection engine 112 can identify anomalous behaviors regardless of how malware permeates the system (e.g., without needing to identify the form of intrusion), and can further support detection of unidentified or previously unknown malware. Because the activity metrics are correlated with application parameters, the machine learning-based anomaly detection features described herein need not have a priori knowledge of the particular attack patterns or characteristics of the malware. Thus, the features described herein may provide application security with improved effectiveness and robustness. Moreover, task-specific anomaly detection is supported via the task-specific models, which may provide greater granularity and flexibility in identifying malware intrusions.

FIG. 5 illustrates an example of logic 500 that a system may implement to support learning phase training of an anomaly detection model. For example, the system 100 may implement the logic 500 as hardware, executable instructions stored on a machine-readable medium, or a combination of both. The system 100 may implement the logic 500 via the anomaly model training engine 110, and the system 100 may execute or perform the logic 500 through the anomaly model training engine 110 as a method of training an anomaly detection model for an embedded software application using machine learning. The following description of the logic 500 is provided using the anomaly model training engine 110 as an implementation example. However, various other implementation options for the system 100 are also possible.

In implementing the logic 500, the anomaly model training engine 110 may sample the embedded software application at selected sampling points to obtain activity metrics and application parameters for the embedded software application (502). The anomaly model training engine 110 may also generate training data based on the activity metrics and application parameters obtained for the selected sampling points (504), and build an anomaly detection model using the training data (506). The anomaly model training engine 110 can perform the described steps 502, 504, and 506 in any of the various ways described herein, including on a task-specific basis. In this manner, the anomaly model training engine 110 may train an anomaly detection model, wherein the anomaly detection model is configured to determine whether the embedded software application exhibits anomalous behavior based on the activity metric inputs and the application parameter inputs.
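As an illustration only, the learning-phase flow of the logic 500 might be sketched as follows. The disclosure does not prescribe a particular model type, so the isolation-forest choice here, along with the simulator.sample_at() hook, is an assumption:

    import numpy as np
    from sklearn.ensemble import IsolationForest

    def collect_training_data(simulator, sampling_points):
        # Step 502: sample the application at selected sampling points to
        # obtain activity metrics (accumulated since the prior point) and
        # application parameters (captured at the point).
        rows = []
        for point in sampling_points:
            metrics, params = simulator.sample_at(point)  # hypothetical hook
            rows.append(np.concatenate([metrics, params]))  # step 504
        return np.vstack(rows)

    def build_anomaly_detection_model(training_data):
        # Step 506: fit a one-class model on normal-behavior samples so
        # that out-of-distribution (metrics, params) inputs score as anomalous.
        model = IsolationForest(contamination="auto", random_state=0)
        model.fit(training_data)
        return model

A task-specific variant could fit one such model per application task identifier, matching the task-specific anomaly detection models described earlier.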

FIG. 6 illustrates an example of logic 600 that a system may implement to support anomaly detection during runtime execution of embedded software. For example, the system 100 may implement the logic 600 as hardware, as executable instructions stored on a machine-readable medium, or as a combination of both. The system 100 may implement the logic 600 via the anomaly detection engine 112, through which the system 100 may execute or perform the logic 600 as a method of detecting anomalous behavior during runtime execution of an embedded software application. The following description of the logic 600 is provided using the anomaly detection engine 112 as an implementation example. However, various other implementation options of the system 100 are also possible.

In implementing the logic 600, the anomaly detection engine 112 may sample the embedded software application at a sampling point during runtime execution of the embedded software application (602). In doing so, the anomaly detection engine 112 can obtain activity metrics and application parameters during runtime execution. The anomaly detection engine 112 can then determine that the embedded software application entered an inactive execution period during runtime execution (604), for example by determining that the embedded software application completed execution of a scheduled application task or by identifying that computing resources of the embedded system have been idle or inactive.

In response to determining that the embedded software application has entered an inactive execution period, the anomaly detection engine 112 may access an anomaly detection model trained for the embedded software (606) and provide the sampled activity metrics and sampled application parameters during runtime execution as inputs to the anomaly detection model (608). The anomaly detection engine 112 can also determine whether the embedded software application exhibits anomalous behavior based on output from the anomaly detection model for the provided input (610).
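Continuing the illustrative sketch from the learning phase, the runtime flow of the logic 600 might look like the following. The hooks app.sample(), app.is_inactive(), and app.current_task are assumptions, `models` maps task identifiers to the task-specific models described earlier, and the score threshold follows the isolation-forest convention in which negative decision-function values indicate outliers:

    import numpy as np

    def runtime_detection_step(app, models, threshold=0.0):
        metrics, params = app.sample()                   # step 602
        if not app.is_inactive():                        # step 604
            return None                                  # not idle; skip evaluation in this sketch
        model = models[app.current_task]                 # step 606: task-specific model
        features = np.concatenate([metrics, params]).reshape(1, -1)
        score = model.decision_function(features)[0]     # step 608
        return score < threshold                         # step 610: True if anomalous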

The anomaly detection engine 112 can perform the described steps 602, 604, 606, 608, and 610 in any of the various ways described herein, including on a task-specific basis. In this manner, the anomaly detection engine 112 can detect anomalous application behavior during runtime execution of the embedded software application.

The logic shown in fig. 5 and 6 provides an example by which the system may support machine learning-based anomaly detection for embedded software applications. Additional or alternative steps are contemplated herein in logic 500 and/or logic 600, including any features described herein for the anomaly model training engine 110, the anomaly detection engine 112, or a combination of both.

FIG. 7 illustrates an example of a system 700 that supports machine learning-based anomaly detection for embedded software applications. The system 700 may include a processor 710, which may take the form of a single processor or multiple processors. The one or more processors 710 may include a central processing unit, microprocessor, or any hardware device suitable for executing instructions stored on a machine-readable medium. The system 700 may include a machine-readable medium 720. The machine-readable medium 720 may take the form of any non-transitory electronic, magnetic, optical, or other physical storage device that stores executable instructions, such as the anomaly model training instructions 722 and the anomaly detection instructions 724 shown in FIG. 7. Thus, the machine-readable medium 720 may be, for example, Random Access Memory (RAM) such as Dynamic Random Access Memory (DRAM), flash memory, spin-transfer torque memory, Electrically Erasable Programmable Read-Only Memory (EEPROM), a storage drive, an optical disk, and so forth.

The system 700 may execute instructions stored on a machine-readable medium 720 via a processor 710. Execution of instructions (e.g., anomaly model training instructions 722 and anomaly detection instructions 724) may cause system 700 to perform any of the machine learning-based anomaly detection features described herein, including any feature according to the teachings of anomaly model training engine 110, anomaly detection engine 112, or a combination of both.

For example, execution of the anomaly model training instructions 722 by the processor 710 may cause the system 700 to sample the embedded software application at a given sampling point to obtain an activity metric for the embedded software application since a previous sampling point and application parameters for the embedded software application at the given sampling point. Execution of the anomaly model training instructions 722 by the processor 710 may also cause the system 700 to generate training data based on the activity metrics and application parameters obtained for a given sampling point, and to build an anomaly detection model using the training data. The constructed anomaly detection model may be configured to determine whether the embedded software application exhibits anomalous behavior based on the activity metric inputs and the application parameter inputs.

Execution of the anomaly detection instructions 724 by the processor 710 may cause the system 700 to sample the embedded software application at a given sampling point during runtime execution of the embedded software application to obtain activity metrics and application parameters during runtime execution, and to determine that the embedded software application has entered an inactive execution period during runtime execution. In response to that determination, execution of the anomaly detection instructions 724 by the processor 710 may also cause the system 700 to: access an anomaly detection model configured to determine whether the embedded software application exhibits anomalous behavior based on activity metric inputs and application parameter inputs; provide the activity metrics and application parameters sampled during runtime execution as inputs to the anomaly detection model; and determine whether the embedded software application exhibits anomalous behavior based on output from the anomaly detection model for the provided input.

Any additional or alternative features as described herein may be implemented via the anomaly model training instructions 722, the anomaly detection instructions 724, or a combination of both.

The systems, methods, apparatus, and logic described above, including the anomaly model training engine 110 and the anomaly detection engine 112, may be implemented in a variety of different ways and with a variety of different combinations of hardware, logic, circuitry, and executable instructions stored on a machine-readable medium. For example, the anomaly model training engine 110, the anomaly detection engine 112, or a combination thereof may comprise circuitry in a controller, microprocessor, or Application Specific Integrated Circuit (ASIC), or may be implemented with discrete logic or components, or may be implemented with a combination of other types of analog or digital circuitry combined on a single integrated circuit or distributed among multiple integrated circuits. An article, such as a computer program product, may include a storage medium and machine-readable instructions stored on the medium that, when executed in an endpoint, computer system, or other device, cause the device to perform operations in accordance with any of the above descriptions, including any features in accordance with the anomaly model training engine 110, the anomaly detection engine 112, or a combination thereof.

The processing capabilities of the systems, devices, and engines described herein (including the anomaly model training engine 110 and the anomaly detection engine 112) may be distributed among multiple system components, such as among multiple processors and memory, optionally including multiple distributed processing systems or cloud/network elements. Parameters, databases, and other data structures may be stored and managed separately, may be incorporated into a single memory or database, may be logically and physically organized in many different ways, and may be implemented in a variety of ways including data structures (e.g., linked lists), hash tables, or implicit storage mechanisms. The programs may be components of a single program (e.g., subroutines), separate programs, distributed across multiple memories and processors, or implemented in a number of different ways (e.g., as libraries (e.g., shared libraries)).

Although various examples have been described above, further embodiments are possible.
