Binary program dynamic taint analysis method and device

Document No.: 1952458 | Published: 2021-12-10

Summary: This technology, Binary program dynamic taint analysis method and device (二进制程序动态污点分析方法及装置), was created by 陈茂飞, 汪来富, 金华敏, 王渭清, 刘东鑫, and 于文良 on 2020-06-09. Abstract: The invention provides a binary program dynamic taint analysis method and device. The method performs dynamic taint analysis on a binary program running in a container and comprises: an instrumentation step of instrumenting a system call of the operating system; an acquisition step of acquiring, through the instrumented system call, information of a target program running in the container; a loading step of loading the dynamic binary program analysis framework Pin onto the target program based on the information of the target program; and an analysis step of performing dynamic taint analysis on the target program based on Pin.

1. A binary program dynamic taint analysis method for performing dynamic taint analysis on a binary program running in a container, the method comprising:

an instrumentation step of instrumenting a system call of an operating system;

an acquisition step of acquiring, through the instrumented system call, information of a target program running in the container;

a loading step of loading a dynamic binary program analysis framework, Pin, onto the target program based on the information of the target program; and

an analysis step of performing dynamic taint analysis on the target program based on Pin.

2. The binary program dynamic taint analysis method according to claim 1,

in the instrumentation step, custom code comprising the system call and code for acquiring a return value of the system call is instrumented.

3. The binary program dynamic taint analysis method according to claim 2,

in the instrumentation step, the original system call in the system call table is replaced by the custom code.

4. The binary program dynamic taint analysis method according to claim 2,

in the instrumentation step, the custom code is dynamically loaded as a kernel module to an operating system kernel.

5. The binary program dynamic taint analysis method according to any of claims 1 to 4,

in the instrumentation step, a parameter for acquiring information of the target program is specified in the system call.

6. The binary program dynamic taint analysis method according to any of claims 2 to 4,

in the acquiring step, when the target program runs in the container, the custom code is executed, and the information of the target program is acquired from the result of executing the code for acquiring the return value of the system call.

7. The binary program dynamic taint analysis method according to any of claims 1 to 4,

the system call comprises: system calls for namespaces.

8. The binary program dynamic taint analysis method according to any of claims 1 to 4,

in the acquiring step, acquiring a global process number of the target program in an operating system,

and in the loading step, loading Pin to the target program according to the global process number of the target program.

9. The binary program dynamic taint analysis method according to claim 8,

in the acquiring step, a virtual Ethernet device of the target program is also acquired, and

in the analyzing step, data input from the virtual Ethernet device is identified as taint data, and the dynamic taint analysis is performed on that basis.

10. The binary program dynamic taint analysis method according to claim 8,

in the analyzing step, the read system call is instrumented in the target program, and data read from a disk is identified as taint data to perform the dynamic taint analysis.

11. A binary program dynamic taint analysis device, comprising:

an instrumentation unit that instruments a system call of an operating system;

an acquisition unit that acquires, through the instrumented system call, information of a target program running in a container;

a loading unit that loads a dynamic binary program analysis framework, Pin, onto the target program based on the information of the target program; and

an analysis unit that performs dynamic taint analysis on the target program based on Pin.

12. A computer readable storage medium having stored thereon computer program instructions which, when executed by a processor, implement the binary program dynamic taint analysis method of any one of claims 1 to 10.

Technical Field

The invention relates to techniques for performing dynamic taint analysis on a binary program, and in particular to a technique for performing dynamic taint analysis on a binary program running in a container.

Background

Taint analysis is a key technique in dynamic program analysis and is widely used in fields such as malicious program analysis and software testing.

An existing binary program dynamic taint analysis method comprises the following steps: in step S1, based on the dynamic binary program analysis framework (Pin), input that enters main memory and registers from external channels is identified and marked as a suspicious taint source; in step S2, the propagation of the suspicious taint source is tracked at the intermediate instruction layer, and taint behavior is analyzed according to the taint state of each instruction operand; in step S3, a memory model and a register model are created to record the taint state of each main memory byte and register byte (see patent document: application publication No. CN 107526970A).

Pin is a dynamic binary program analysis framework developed by Intel and supports dynamic analysis of binary programs running on an x86 architecture.

A container is a lightweight virtualization technology that is being adopted ever more widely as cloud computing grows. However, because a container isolates the programs running inside it, Pin does not support dynamic taint analysis of binary programs running in a container, and Pin cannot be run inside the container to analyze a target program running there.

Disclosure of Invention

In view of the above, the present invention is directed to a method and apparatus for performing dynamic taint analysis on a binary program running on a container.

According to an aspect of the present invention, there is provided a binary program dynamic taint analysis method for performing dynamic taint analysis on a binary program running in a container, the binary program dynamic taint analysis method comprising:

an instrumentation step of instrumenting a system call of an operating system;

an acquisition step of acquiring, through the instrumented system call, information of a target program running in the container;

a loading step of loading a dynamic binary program analysis framework, Pin, onto the target program based on the information of the target program; and

an analysis step of performing dynamic taint analysis on the target program based on Pin.

Preferably, in the instrumentation step, custom code comprising the system call and code for acquiring a return value of the system call is instrumented.

Preferably, in the instrumentation step, the original system call in the system call table is replaced with the custom code.

Preferably, in the instrumentation step, the custom code is dynamically loaded as a kernel module to an operating system kernel.

Preferably, in the instrumentation step, a parameter for acquiring information of the target program is specified in the system call.

Preferably, in the acquiring step, when the target program runs in the container, the custom code is executed, and the information of the target program is acquired from the result of executing the code for acquiring the return value of the system call.

Preferably, the system call includes: system calls for namespaces.

Preferably, in the obtaining step, a global process number of the target program in an operating system is obtained, and in the loading step, Pin is loaded to the target program according to the global process number of the target program.

Preferably, in the acquiring step, a virtual ethernet device of the target program is further acquired, and in the analyzing step, the dynamic taint analysis is performed by identifying data input from the virtual ethernet device as taint data according to the virtual ethernet device.

Preferably, in the analyzing step, the read system call is instrumented in the target program, and data read from a disk is identified as taint data to perform the dynamic taint analysis.

According to another aspect of the present invention, there is provided a binary program dynamic taint analysis apparatus, comprising:

an instrumentation unit that instruments a system call of an operating system;

an acquisition unit that acquires, through the instrumented system call, information of a target program running in a container;

a loading unit that loads a dynamic binary program analysis framework, Pin, onto the target program based on the information of the target program; and

an analysis unit that performs dynamic taint analysis on the target program based on Pin.

According to yet another aspect of the present invention, there is provided a computer readable storage medium having stored thereon computer program instructions, which when executed by a processor, implement the above-described binary program dynamic taint analysis method.

According to the present invention, a system call of the operating system is instrumented to obtain information of the target program running in the container, Pin is loaded onto the target program based on the obtained information, and dynamic taint analysis is then performed on the target program based on Pin. The invention can therefore perform dynamic taint analysis on binary programs that run in containers, regardless of the programming language in which they were written. This greatly extends the reach of dynamic taint analysis tools and is a capability that existing dynamic program analysis frameworks lack.

Drawings

FIG. 1 is a schematic diagram of the container function implemented by Namespace.

FIG. 2 is a flow diagram illustrating dynamic taint analysis of a binary program running on a container basis according to an embodiment of the present invention.

FIG. 3 is a block diagram illustrating a binary program dynamic taint analysis apparatus of an embodiment of the present invention.

Detailed Description

The present invention now will be described more fully hereinafter with reference to the accompanying drawings, in which exemplary embodiments of the invention are shown.

LXC is short for Linux Container, a kernel virtualization technology that provides lightweight virtualization to isolate processes and resources. LXC builds on Namespace, Cgroup, and related mechanisms. The Linux Namespace mechanism provides a resource isolation scheme: a namespace wraps a global kernel resource so that each namespace has its own independent instance of that resource, and processes in different namespaces therefore do not interfere with one another when using the same resource. Cgroup is short for Control group, a Linux kernel feature for limiting and isolating the system resources used by a group of processes. FIG. 1 is a schematic diagram of the container function implemented by Namespace.

The Namespace mechanism of the current Linux kernel provides 6 isolation capabilities, as shown in Table 1 below:

Table 1:

Linux implements Namespace mainly through 3 system calls: clone(), setns(), and unshare(). To specify which namespace is to be isolated, call parameters such as CLONE_NEWIPC, CLONE_NEWNET, CLONE_NEWNS, CLONE_NEWPID, CLONE_NEWUSER, CLONE_NEWUTS, and CLONE_NEWCGROUP typically need to be passed to these system calls.
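As a rough aid for reading the flag names above, the following C sketch maps each CLONE_NEW* flag to the resource it isolates. It is an illustration only, not part of the described method: the struct and the helper names ns_flag_by_name() and ns_count() are hypothetical, the short names are the entry names found under /proc/<pid>/ns/, and the flag values are taken from the kernel header <linux/sched.h>.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <linux/sched.h>   /* CLONE_NEW* flag values from the kernel headers */

/* Older kernel headers may lack CLONE_NEWCGROUP (added in Linux 4.6). */
#ifndef CLONE_NEWCGROUP
#define CLONE_NEWCGROUP 0x02000000
#endif

struct ns_flag {
    unsigned long flag;     /* CLONE_NEW* value passed to clone()/unshare() */
    const char *name;       /* entry name under /proc/<pid>/ns/ */
    const char *isolates;   /* resource that the namespace isolates */
};

static const struct ns_flag ns_flags[] = {
    { CLONE_NEWNS,     "mnt",    "mount points" },
    { CLONE_NEWUTS,    "uts",    "host name and domain name" },
    { CLONE_NEWIPC,    "ipc",    "System V IPC and POSIX message queues" },
    { CLONE_NEWPID,    "pid",    "process numbers" },
    { CLONE_NEWNET,    "net",    "network devices and protocol stacks" },
    { CLONE_NEWUSER,   "user",   "user IDs and group IDs" },
    { CLONE_NEWCGROUP, "cgroup", "cgroup root directory" },
};

#define NS_FLAG_COUNT (sizeof ns_flags / sizeof ns_flags[0])

/* Returns the CLONE_NEW* flag for a namespace name, or 0 if unknown. */
unsigned long ns_flag_by_name(const char *name)
{
    assert(name != NULL);
    for (size_t i = 0; i < NS_FLAG_COUNT; i++)
        if (strcmp(ns_flags[i].name, name) == 0)
            return ns_flags[i].flag;
    return 0;
}

/* Counts how many kinds of namespace a combined flag word requests. */
int ns_count(unsigned long flags)
{
    int n = 0;
    for (size_t i = 0; i < NS_FLAG_COUNT; i++)
        if (flags & ns_flags[i].flag)
            n++;
    return n;
}
```

For instance, ns_count(CLONE_NEWPID | CLONE_NEWNET) reports that two namespaces are requested, which is the kind of check a hook could use to decide whether a call is creating a container-like environment.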

For example, when the system call clone() is run with the parameter CLONE_NEWPID specified, a child process is created that runs in a new Namespace, and clone() returns the global process number of the child process. The code that invokes the system call clone() is, for example, as follows:

child_pid=clone(childFunc,*,CLONE_NEWPID|SIGCHLD,NULL);
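The call above can be tried from user space with a sketch along the following lines. It is a minimal illustration under several assumptions: the function name clone_in_new_pid_ns(), the pipe used to report the child's view of its own PID, and the 1 MiB stack are all choices made here, not details from the original. Creating a PID namespace normally requires CAP_SYS_ADMIN, so the function returns -1 when the kernel refuses the request.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sched.h>
#include <signal.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define STACK_SIZE (1024 * 1024)

/* Runs in the child: report the PID as seen inside the new namespace. */
static int child_func(void *arg)
{
    int fd = *(int *)arg;
    pid_t local = getpid();
    return write(fd, &local, sizeof local) == (ssize_t)sizeof local ? 0 : 1;
}

/* Creates a child in a new PID namespace. On success returns 0 and stores
 * the PID returned to the parent (global) and the PID the child sees for
 * itself (local). Returns -1 on failure, e.g. when lacking privileges. */
int clone_in_new_pid_ns(pid_t *global_pid, pid_t *local_pid)
{
    assert(global_pid != NULL && local_pid != NULL);

    int fds[2];
    if (pipe(fds) == -1)
        return -1;

    char *stack = malloc(STACK_SIZE);
    if (stack == NULL) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }

    /* The stack grows downward, so pass the top of the allocation. */
    pid_t pid = clone(child_func, stack + STACK_SIZE,
                      CLONE_NEWPID | SIGCHLD, &fds[1]);
    close(fds[1]);
    if (pid == -1) {
        close(fds[0]);
        free(stack);
        return -1;
    }

    ssize_t n = read(fds[0], local_pid, sizeof *local_pid);
    close(fds[0]);
    waitpid(pid, NULL, 0);
    free(stack);
    if (n != (ssize_t)sizeof *local_pid)
        return -1;

    *global_pid = pid;
    return 0;
}
```

When the call succeeds, the child is the first process in its namespace and reports PID 1, while the parent receives an ordinary PID from its own namespace. That difference is the gap the instrumented system call is meant to bridge.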

The specific usage of the 3 system calls clone(), setns(), and unshare() belongs to the prior art, and a detailed description is omitted here.

The following describes an embodiment of the present invention that performs dynamic taint analysis on a binary program running in a Linux container.

FIG. 2 illustrates a flow diagram for dynamic taint analysis of a binary program run on a container basis according to an embodiment of the present invention.

In step S101, a system call of the operating system is instrumented.

In this embodiment, the Linux operating system is taken as an example, and the system calls of the operating system may include the system calls related to the Namespace implementation in the Linux kernel, namely the three system calls clone(), setns(), and unshare().

In order to acquire information of the target program through an instrumented system call, code for acquiring the return value of the system call may be inserted when the system call is instrumented, so that the return value can be obtained. In addition, corresponding parameters can be specified in the system call according to the information of the target program that needs to be acquired.

One way to instrument the system call is as follows. Custom code is written that contains the system call and code for acquiring the return value of the system call. The system call table is then modified so that the original system call in the table is replaced with the custom code, and the Linux kernel is recompiled and loaded. Thus, when the target program is started, the custom code is executed: the system call inside the custom code runs, and the code for acquiring the return value then obtains the return value of the system call.

As an alternative to modifying the system call table, the custom code may also be written as a Linux Kernel Module and loaded or unloaded as required. For example, the custom code is loaded into the Linux kernel when the information of the target program needs to be acquired, and unloaded from the Linux kernel when that information is no longer needed. In this way, the custom code can be loaded and unloaded dynamically.
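The hook-and-restore pattern described above can be illustrated with a user-space simulation. This is an analogy only, not kernel code: call_table, fake_clone(), hook_install(), hook_remove(), and the other names are hypothetical stand-ins, and a real kernel implementation would have to deal with locating the system call table and with memory write protection, which this sketch ignores.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for system call numbers. */
enum { NR_CLONE = 0, NR_SETNS = 1, NR_UNSHARE = 2, NR_MAX = 3 };

typedef long (*syscall_fn)(long arg);

/* Stand-in for the original clone(): hands out increasing process numbers. */
static long next_pid = 4242;
static long fake_clone(long flags)
{
    (void)flags;
    return next_pid++;
}

/* Stand-in for the system call table. */
static syscall_fn call_table[NR_MAX] = { fake_clone, NULL, NULL };

static syscall_fn original_clone;   /* saved entry, restored on removal */
static long last_clone_ret = -1;    /* return value captured by the hook */

/* Custom code: run the original call, then record its return value. */
static long custom_clone(long flags)
{
    long ret = original_clone(flags);
    last_clone_ret = ret;
    return ret;                     /* the caller sees unchanged behaviour */
}

/* Corresponds to loading the custom code: swap the table entry. */
void hook_install(void)
{
    assert(original_clone == NULL);
    original_clone = call_table[NR_CLONE];
    call_table[NR_CLONE] = custom_clone;
}

/* Corresponds to unloading the custom code: restore the table entry. */
void hook_remove(void)
{
    assert(original_clone != NULL);
    call_table[NR_CLONE] = original_clone;
    original_clone = NULL;
}

/* Dispatches a call through the table, as the kernel entry path would. */
long do_call(int nr, long arg)
{
    assert(nr >= 0 && nr < NR_MAX && call_table[nr] != NULL);
    return call_table[nr](arg);
}

long hook_last_return(void)
{
    return last_clone_ret;
}
```

While the hook is installed, every do_call(NR_CLONE, ...) still returns what the original would have returned, and hook_last_return() exposes the captured value. After hook_remove() the table behaves as before and nothing further is captured.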

In the present embodiment, the way the system call is instrumented is not limited to the methods described above, as long as information of the target program running in the container can be acquired through the instrumented system call.

In step S102, information of the target program running in the container is acquired through the instrumented system call. The target program is the binary program to be subjected to dynamic taint analysis. When the target program in the container is started, the system call instrumented in step S101 is executed, and the information of the target program can be acquired from the return value of the system call.

In this embodiment, the information of the target program that can be acquired may include:

a global process number (PID) of the target program;

a virtual Ethernet device (Network) created in the namespace, identified by its namespace number, of the container in which the target program is located;

a user ID and a user group ID (user) in a container in which the target program is located;

a host name and a domain name (UTS) within a container in which the target program is located.

The information of the target program that can be obtained is not limited to the items listed above, and items can be added or removed according to the requirements of the dynamic taint analysis. As an embodiment, corresponding parameters may be specified when the system call is instrumented in step S101, according to the information that needs to be acquired.

The process number of the target program is explained here. Because of the isolation provided by the container, a program running in a container has more than one process number: one is its process number inside the container, which can be called the local process number; the other is its process number in the operating system, which can be called the global process number. Because of this isolation, the process number of the target program that can be read inside the container is only the local process number. The invention acquires the global process number of the target program in the operating system by instrumenting the Namespace-related system calls.
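The relationship between the two numbers can also be observed from the host without any instrumentation, which may help when checking results. On Linux 4.1 and later, /proc/<pid>/status contains an NSpid line that lists a process's number in each nested PID namespace, outermost first. The sketch below parses that line from a text buffer; it is an independent illustration rather than the method of this document, and the function name parse_nspid() is hypothetical.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Extracts the values of the "NSpid:" line from the text of a
 * /proc/<pid>/status file. Stores at most max values in pids and returns
 * how many were stored (0 if the line is absent). The first value is the
 * outermost (global) process number, the last is the innermost (local). */
int parse_nspid(const char *status, long *pids, int max)
{
    assert(status != NULL && pids != NULL && max > 0);

    /* Walk line by line until one starts with "NSpid:". */
    const char *p = status;
    while (p != NULL && strncmp(p, "NSpid:", 6) != 0) {
        p = strchr(p, '\n');
        if (p != NULL)
            p++;
    }
    if (p == NULL)
        return 0;
    p += 6;

    int n = 0;
    while (n < max) {
        while (*p == ' ' || *p == '\t')
            p++;
        if (*p < '0' || *p > '9')
            break;                  /* end of line or unexpected text */
        char *end;
        pids[n++] = strtol(p, &end, 10);
        p = end;
    }
    return n;
}
```

Given the line "NSpid: 12345 1", the function yields 12345 as the global process number and 1 as the local one, matching the situation in which the target program is the first process in its container.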

In step S103, a dynamic binary program analysis framework (Pin) is loaded to the target program based on the information of the target program acquired in the step S102.

As an implementation manner of loading Pin to the target program, for example, Pin may be loaded to the target program according to the acquired global process number of the target program.

For example, the command to load Pin onto the target program may be pin -pid <global process number>, where pin is the binary executable of the dynamic program analysis framework provided by Intel, and the value passed to -pid is the global process number of the target program.

In this embodiment, the global process number of the target program in the operating system is acquired through steps S101 and S102, and therefore although the target program is isolated by the container in the operating system, Pin may be loaded to the target program running in the container based on the global process number of the target program in the operating system.

In addition, as another embodiment, in step S102, the namespace number of the container where the target program is located and the local process number of the target program in the container may also be obtained without obtaining the global process number of the target program, and Pin may be loaded to the target process according to the namespace number of the container where the target program is located and the local process number of the target program in the container.

In step S104, a dynamic taint analysis is performed on the target program based on the Pin.

After Pin is loaded into the target program, the method for performing dynamic taint analysis of the target program based on Pin may be performed in any manner known in the art.

The following describes a method for acquiring information of a target program in a container, taking as an example the instrumentation of the clone() system call to acquire the global process number of the target program in the container. To obtain the global process number of the target process, the system call clone() may, for example, be instrumented to obtain its return value, as follows:

First, custom clone() system call code is written. The custom code may include the original clone() code and code for obtaining the return value of clone(). Since the return value is usually held in the eax register, the code for obtaining it may, for example, print the return value of clone() with the printk() function. As an example, the custom code may be given the new name custom_clone(), and custom_clone() may be as follows:

After the custom code is written, the system call table is modified so that the original clone() function is replaced with the custom code written above, in this example with the custom_clone() function. The instructions to modify the system call table are as follows:

original_call=sys_call_table[__NR_clone];

sys_call_table[__NR_clone]=custom_clone;

Through the above instructions, the original clone() function in the system call table is replaced with the custom_clone() function.

The Linux kernel is then recompiled and loaded, so that when the target program running in the container is started, the custom_clone() function is executed according to the modified system call table and the global process number of the target program is obtained.

In addition, as described above, the customized code may also be written in a Linux Kernel Module manner, so as to implement the function of dynamic loading/unloading.

In the same way, the global process number of the target program can also be acquired by instrumenting the Namespace-related system calls setns() or unshare(); a detailed description is omitted. A suitable system call is selected according to the application scenario to acquire the required information.

The same method may be used to acquire other information of the target program through an instrumented system call, and is not described in detail here.

A specific example of performing Pin-based dynamic taint analysis on a target program running in a container is described below. For example, the virtual Ethernet device of the target program may be obtained through an instrumented system call, and dynamic taint analysis may then be performed by identifying data input from that virtual Ethernet device as taint data. As another example, the read system call may be instrumented in the target program, and data read from disk may be identified as taint data to perform dynamic taint analysis.

For example, after the global process number of the target program in the container is acquired, a corresponding pintool module needs to be written and compiled to implement a specific taint analysis function for the target program, wherein the pintool is an analysis module written based on an API provided by Pin.

An example of writing pintool to track a taint data stream of a target program is as follows:

1) Pin provides APIs for instrumenting the system calls made by the running target program:

·LEVEL_PINCLIENT::SYSCALL_ENTRY_CALLBACK

·LEVEL_PINCLIENT::SYSCALL_EXIT_CALLBACK

The former instruments the point at which the target program enters a system call, and the latter instruments the point at which the system call completes.

The API is used to obtain the system call number when the target program makes a system call. When the number is that of the read system call:

the 2nd and 3rd parameters of the read system call are obtained, which give the address range in memory into which the input data is read, and that address range is marked as taint data.

The target program may process the input data while it continues to run, such as by reading the input data into a register using a load/store instruction, or by storing the input data to another memory space address.

2) Pin also provides an instrumentation API at the instruction level:

LEVEL_PINCLIENT::INS_INSTRUMENT_CALLBACK

The pintool code is written with the provided API to track the execution of instructions such as load/store: if the input data of an instruction is marked as taint data, its output data is marked as taint data as well.
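The two rules in 1) and 2) can be illustrated without Pin by a small shadow-memory simulation. The sketch below is an assumption-laden model, not a pintool: the 256-byte address space, the four registers, and every function name are hypothetical. It also assumes that the return value of read is used to limit the tainted range to the bytes actually read, a refinement the text above does not spell out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MEM_SIZE 256   /* size of the simulated address space */
#define NUM_REGS 4     /* number of simulated registers */

static bool mem_taint[MEM_SIZE];   /* one taint flag per memory byte */
static bool reg_taint[NUM_REGS];   /* one taint flag per register */

void taint_reset(void)
{
    memset(mem_taint, 0, sizeof mem_taint);
    memset(reg_taint, 0, sizeof reg_taint);
}

bool mem_is_tainted(size_t addr)
{
    assert(addr < MEM_SIZE);
    return mem_taint[addr];
}

bool reg_is_tainted(int reg)
{
    assert(reg >= 0 && reg < NUM_REGS);
    return reg_taint[reg];
}

/* Rule 1: on exit from read(fd, buf, count), taint the bytes written. */
void on_read_exit(size_t buf, size_t count, long ret)
{
    assert(buf <= MEM_SIZE && count <= MEM_SIZE - buf);
    if (ret <= 0)
        return;                          /* error or end of file */
    size_t n = (size_t)ret < count ? (size_t)ret : count;
    for (size_t i = 0; i < n; i++)
        mem_taint[buf + i] = true;
}

/* Rule 2a: a load taints the register if any source byte is tainted. */
void on_load(int reg, size_t addr, size_t len)
{
    assert(reg >= 0 && reg < NUM_REGS);
    assert(addr <= MEM_SIZE && len <= MEM_SIZE - addr);
    bool tainted = false;
    for (size_t i = 0; i < len; i++)
        if (mem_taint[addr + i])
            tainted = true;
    reg_taint[reg] = tainted;
}

/* Rule 2b: a store copies the register's taint state to the destination,
 * which also clears taint when clean data overwrites tainted bytes. */
void on_store(size_t addr, int reg, size_t len)
{
    assert(reg >= 0 && reg < NUM_REGS);
    assert(addr <= MEM_SIZE && len <= MEM_SIZE - addr);
    for (size_t i = 0; i < len; i++)
        mem_taint[addr + i] = reg_taint[reg];
}
```

In an actual pintool, callbacks registered for system call exit and for individual instructions would drive functions like these, and the shadow state would have to cover the whole address space of the target program rather than a fixed array.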

A pintool is written and compiled based on 1) and 2) and any other required steps, and is loaded at analysis time with the command pin -t pintool -pid <global process number>, which makes it possible to trace the taint data flow of the target program in the container. It should be noted that the loading of Pin onto the target program in step S103 and the Pin-based analysis of the target program in step S104 may be carried out together by the single command pin -t pintool -pid <global process number>.
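A host-side helper that assembles the attach command from the acquired global process number might look like the following sketch. The function name build_pin_attach_cmd() and the tool path are hypothetical. The sketch places -pid ahead of -t, the order used in the attach examples of Intel's Pin user guide, whereas the text above lists the two options the other way round; which order a given Pin release accepts should be checked against its documentation.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Writes "pin -pid <global_pid> -t <tool_path>" into out.
 * Returns 0 on success, or -1 if the inputs are invalid or the
 * command does not fit in out_size bytes. */
int build_pin_attach_cmd(char *out, size_t out_size,
                         const char *tool_path, long global_pid)
{
    assert(out != NULL && tool_path != NULL);

    if (global_pid <= 0 || strlen(tool_path) == 0)
        return -1;

    int n = snprintf(out, out_size, "pin -pid %ld -t %s",
                     global_pid, tool_path);
    if (n < 0 || (size_t)n >= out_size)
        return -1;                  /* output error or truncated */
    return 0;
}
```

The resulting string could be passed to a process-launching routine on the host. Because the process number is the global one, the command is run outside the container.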

The present invention is not limited to the above-mentioned examples of dynamic taint analysis, and the method of performing dynamic taint analysis based on Pin can adopt the existing techniques, and will not be described in detail herein.

FIG. 3 is a block diagram illustrating a binary program dynamic taint analysis apparatus of an embodiment of the present invention. As shown in FIG. 3, the binary program dynamic taint analysis apparatus of the embodiment includes: an instrumentation unit 201 that instruments a system call of the operating system; an acquisition unit 202 that acquires, through the instrumented system call, information of a target program running in a container; a loading unit 203 that loads the dynamic binary program analysis framework Pin onto the target program based on the information of the target program; and an analysis unit 204 that performs dynamic taint analysis on the target program based on Pin. The apparatus may be implemented by a processing circuit. The instrumentation unit 201, the acquisition unit 202, the loading unit 203, and the analysis unit 204 are only logical modules divided according to the functions they implement, and are not intended to limit the specific implementation. In an actual implementation, the above units may be implemented as separate physical entities, or by a single entity (for example, a processor such as a CPU or DSP, or an integrated circuit).

In other embodiments, a computer-readable storage medium has stored thereon computer program instructions, which when executed by a processor, implement the steps of the method of the corresponding embodiment of fig. 2. As will be appreciated by one skilled in the art, embodiments of the present disclosure may be provided as a method, apparatus, or computer program product. Accordingly, the present disclosure may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present disclosure may take the form of a computer program product embodied on one or more computer-usable non-transitory storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.

In the above embodiment, the method for performing dynamic taint analysis on a binary program running in a container is described using Linux container technology on the Linux operating system as an example. The present invention is not limited to this, however, and the binary program dynamic taint analysis method of the embodiment can also be applied to binary programs running in containers on other operating systems such as Windows and MacOS. Although Windows and MacOS are commercial, non-open-source operating systems, they use similar system call mechanisms and provide corresponding system calls. The binary program dynamic taint analysis method of the embodiment of the invention can therefore be applied to other operating systems as well.

Having described embodiments of the present invention, the foregoing description is intended to be exemplary, not exhaustive, and not limited to the embodiments disclosed. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the described embodiments. The terminology used herein is chosen in order to best explain the principles of the embodiments, the practical application, or technical improvements to the market technology, or to enable others of ordinary skill in the art to understand the embodiments disclosed herein.
