Buffer overflow attack defense method and device based on RISC-V and Canary mechanism

Document No.: 136097 Published: 2021-10-22

Summary: Buffer overflow attack defense method and device based on RISC-V and Canary mechanism (基于RISC-V与Canary机制的缓冲区溢出攻击防御方法及装置), designed by Liu Chang (刘畅), Zhao Chen (赵琛), Wu Yanjun (武延军), Rui Zhiqing (芮志清) and Wu Jingzheng (吴敬征) on 2021-07-16. The invention discloses a buffer overflow attack defense method and device based on RISC-V and the Canary mechanism, comprising: assigning a content attribute to the abstract syntax tree of the program source code and then generating a node call relation graph G; generating special data Canary; inserting a RISC-V extended instruction for setting the special data Canary before the code statement corresponding to the content attribute of each call node, and a RISC-V extended instruction for checking the special data Canary before the code statement corresponding to the content attribute of each return node; and executing the program source code, during which the set instruction writes the special data Canary into the current stack frame and the check instruction compares the value p of the special data Canary with the value p' read from the current stack frame, so as to carry out the defense. The invention covers multiple forms of buffer overflow, including heap overflow, stack overflow and BSS overflow, enables software-hardware cooperation in security defense, has little impact on system performance, and achieves a better defense effect.

1. A buffer overflow attack defense method based on RISC-V and Canary mechanisms comprises the following steps:

1) generating an abstract syntax tree from the program source code, assigning a content attribute to each node of the abstract syntax tree, and generating a node call relation graph G = (V_C, V_R, R, n_entry), wherein the attribute value of the content attribute corresponds to the code statement represented by the node, V_C is a function call node set consisting of a number of function call nodes, V_R is a function return node set consisting of a number of function return nodes, R is the set of correspondences between function call nodes and function return nodes, and n_entry is the program entry node;

2) generating special data Canary, to be placed at the boundary of a buffer, by using a physical unclonable function (PUF), the code statement corresponding to the content attribute of each node, and the program entry node n_entry;

3) inserting a RISC-V extended instruction for setting the special data Canary before the code statement corresponding to the content attribute of each call node, and a RISC-V extended instruction for checking the special data Canary before the code statement corresponding to the content attribute of each return node;

4) executing the program source code, wherein the RISC-V extended instruction for setting the special data Canary writes the value of the special data Canary to the bottom of the current stack frame, and the RISC-V extended instruction for checking the special data Canary compares the value p of the special data Canary with the value p' of the special data Canary read from the current stack frame;

5) if the value p is different from the value p', exception handling is entered for defense.

2. The method of claim 1, wherein the node call relation graph G is generated by:

1) assigning a type attribute to each node of the abstract syntax tree, wherein the attribute value of the type attribute indicates the classification of the code represented by the node;

2) finding all statement nodes that represent function calls according to the type attribute, to obtain the function call node set V_C;

3) finding all function return statement nodes according to the type attribute, to obtain the function return node set V_R;

4) determining, for every call-return node pair drawn from the function call node set V_C and the function return node set V_R, whether a control flow path exists between the two nodes, to obtain the correspondence set R;

5) generating the node call relation graph G from the function call node set V_C, the function return node set V_R, the correspondence set R and the program entry node n_entry of the abstract syntax tree.

3. The method of claim 1, wherein the special data Canary is generated by:

1) obtaining the program start address progaddr by using the program entry node n_entry and the content attribute of each node;

2) applying a specific stimulus signal to the hardware security primitive PUF to obtain a unique output value puf;

3) combining the program start address progaddr with the output value puf to generate the special data Canary.

4. The method of claim 1, wherein a RISC-V extended instruction to set a special data Canary is inserted by:

1) taking, from the function call node set V_C, a function call node that has not yet been taken under the current control flow;

2) determining, according to the execution order of the current control flow, whether a RISC-V extended instruction for setting the special data Canary has already been inserted before that function call node; if so, going to step 4); if not, going to step 3);

3) inserting a RISC-V extended instruction for setting the special data Canary before the code statement corresponding to the content attribute of that function call node;

4) determining whether the function call node set V_C still contains function call nodes that have not been taken; if so, returning to step 1), until a RISC-V extended instruction for setting the special data Canary has been inserted before every function call node.

5. The method of claim 1, wherein the instruction format of the RISC-V extended instruction for setting the special data Canary comprises:

1) an Opcode field, which indicates the instruction opcode in the custom encoding space;

2) a Src field, which indicates the stack frame address to which the Canary value is to be written;

3) a Canary field, which indicates the address of the dedicated security register from which the Canary is read;

4) a Res field, which indicates the address of the register that stores the instruction execution result.

6. The method of claim 1, wherein a RISC-V extended instruction to check for a special data Canary is inserted by:

1) taking, from the function return node set V_R, a function return node that has not yet been taken under the current control flow;

2) determining, according to the execution order of the current control flow, whether a RISC-V extended instruction for checking the special data Canary has already been inserted before that function return node; if so, going to step 4); if not, going to step 3);

3) inserting a RISC-V extended instruction for checking the special data Canary before the code statement corresponding to the content attribute of that function return node;

4) determining whether the function return node set V_R still contains function return nodes that have not been taken; if so, returning to step 1), until a RISC-V extended instruction for checking the special data Canary has been inserted before every function return node.

7. The method of claim 1, wherein the instruction format of the RISC-V extended instruction for checking the special data Canary comprises:

1) an Opcode field to represent an instruction Opcode encoding in the custom encoding space;

2) a Src field indicating a start address of a currently protected code segment;

3) a Canary field indicating an expected value of Canary;

4) the Res field indicates the address of the register storing the instruction execution result.

8. The method of claim 1, wherein the defense against a buffer overflow attack is performed by:

1) acquiring a value p' of the special data Canary from a current stack frame by using a RISC-V extended instruction for inspecting the special data Canary;

2) acquiring a value p of special data Canary;

3) comparing the value p' of the special data Canary with the value p of the special data Canary: if they are the same, the function returns normally; if not, an exception is thrown and the program is terminated.

9. A storage medium having a computer program stored thereon, wherein the computer program is arranged to, when executed, perform the method according to any of claims 1-8.

10. An electronic device comprising a memory having a computer program stored therein and a processor arranged to run the computer program to perform the method according to any of claims 1-8.

Technical Field

The invention belongs to the technical field of computers, and relates to a buffer overflow attack defense method and device based on RISC-V and Canary mechanisms.

Background

With the development of the computer industry, computer software has become an indispensable part of production and daily life, and computer systems are widely used across industries, including medical care, education, the military, politics and new retail. As computer systems spread rapidly, how to guarantee that their behavior is trustworthy and how to protect them from malicious attacks has become an important issue for both academia and industry. The buffer overflow attack is a common malicious attack. It exploits the absence of mechanisms such as bounds checking in source programs written in memory-unsafe languages to exceed the capacity of a buffer and overwrite data in other regions, thereby destroying the correctness and stability of the program. Because it is easy to carry out, this kind of attack is widely used in attacks such as program control flow hijacking.

To cope effectively with buffer overflow attacks on program control flow, academia has kept exploring new defense methods. Because a buffer overflow attack necessarily crosses the buffer boundary and writes data into memory outside the buffer, a number of defense methods based on the Canary mechanism have been proposed. A Canary, also called a Canary word, is a piece of special data placed at the boundary of the buffer. When an overflow occurs, the data written by the attacker overwrites the Canary first, so the content of the Canary changes. Checking whether the Canary has changed before executing instructions that lie outside the buffer, such as a program return, is therefore enough to determine whether an overflow has occurred, and prevents the program from being steered into an incorrect code fragment. However, existing Canary-based defense methods depend heavily on the hardware environment and on system characteristics, and cannot be migrated directly to a RISC-V system. For example, the StackGuard defense method designed by Crispin Cowan et al. is based on a modification of the gcc compiler, in which the function_prologue and function_epilogue functions are extended with x86-specific changes to arrange the placement and verification of the Canary. In this example, the dependence on hardware characteristics limits the applicability of the solution and makes it difficult to generalize and migrate to other existing systems, including RISC-V systems.

In addition, Chinese patent application CN112948818A discloses a protection method and system for preventing stack overflow attacks, and proposes a method for constructing a Canary word and a method for using it. However, that method applies only to stack overflows caused by string processing that lacks the necessary bounds checking. It does not provide equally effective protection against other forms of buffer overflow, such as BSS (Block Started by Symbol) overflow, or against overflows that arise from other causes. The method also needs an additional computation to construct the Canary word, which affects system performance to some extent.

To address the limited applicability and the difficulty of deployment of Canary-based buffer overflow defense methods, the invention provides an implementation based on a RISC-V extended instruction set. The invention detects buffer overflows promptly, prevents the control flow from entering a tampered memory region, prevents the program from executing on the basis of wrong instructions and data, improves the system's ability to defend against buffer overflow attacks, and thereby improves the security of RISC-V operating systems.

Disclosure of Invention

The invention aims to provide a buffer overflow attack defense method and device based on the Canary mechanism that can be applied to a RISC-V system. The method places a special Canary at buffer boundaries to monitor write activity that crosses those boundaries, preventing programs from executing on the basis of erroneous instructions and data. The method effectively prevents attacks that hijack the control flow by using a buffer overflow to tamper with the program return address, and improves the security of the RISC-V system.

In order to achieve the purpose, the invention adopts the following technical scheme:

a buffer overflow attack defense method based on RISC-V and Canary mechanisms comprises the following steps:

1) generating an abstract syntax tree from the program source code, assigning a content attribute to each node of the abstract syntax tree, and generating a node call relation graph G = (V_C, V_R, R, n_entry), wherein the attribute value of the content attribute corresponds to the code statement represented by the node, V_C is a function call node set consisting of a number of function call nodes, V_R is a function return node set consisting of a number of function return nodes, R is the set of correspondences between function call nodes and function return nodes, and n_entry is the program entry node;

2) generating special data Canary, to be placed at the boundary of a buffer, by using a physical unclonable function (PUF), the code statement corresponding to the content attribute of each node, and the program entry node n_entry;

3) inserting a RISC-V extended instruction for setting the special data Canary before the code statement corresponding to the content attribute of each call node, and a RISC-V extended instruction for checking the special data Canary before the code statement corresponding to the content attribute of each return node;

4) executing the program source code, wherein the RISC-V extended instruction for setting the special data Canary writes the value of the special data Canary to the bottom of the current stack frame, and the RISC-V extended instruction for checking the special data Canary compares the value p of the special data Canary with the value p' of the special data Canary read from the current stack frame;

5) if the value p is different from the value p', exception handling is entered for defense.

Further, a node call relationship graph G is generated by:

1) assigning a type attribute to each node of the abstract syntax tree, wherein the attribute value of the type attribute indicates the classification of the code represented by the node;

2) finding all statement nodes that represent function calls according to the type attribute, to obtain the function call node set V_C;

3) finding all function return statement nodes according to the type attribute, to obtain the function return node set V_R;

4) determining, for every call-return node pair drawn from the function call node set V_C and the function return node set V_R, whether a control flow path exists between the two nodes, to obtain the correspondence set R;

5) generating the node call relation graph G from the function call node set V_C, the function return node set V_R, the correspondence set R and the program entry node n_entry of the abstract syntax tree.

Further, special data, Canary, is generated by:

1) obtaining the program start address progaddr by using the program entry node n_entry and the content attribute of each node;

2) applying a specific stimulus signal to the hardware security primitive PUF to obtain a unique output value puf;

3) combining the program start address progaddr with the output value puf to generate the special data Canary.

Further, the specific stimulus signal includes an electromagnetic oscillation signal.

Further, a RISC-V extended instruction to set a special data Canary is inserted by the following steps:

1) taking, from the function call node set V_C, a function call node that has not yet been taken under the current control flow;

2) determining, according to the execution order of the current control flow, whether a RISC-V extended instruction for setting the special data Canary has already been inserted before that function call node; if so, going to step 4); if not, going to step 3);

3) inserting a RISC-V extended instruction for setting the special data Canary before the code statement corresponding to the content attribute of that function call node;

4) determining whether the function call node set V_C still contains function call nodes that have not been taken; if so, returning to step 1), until a RISC-V extended instruction for setting the special data Canary has been inserted before every function call node.

Further, the instruction format of the RISC-V extended instruction for setting the special data Canary includes:

1) an Opcode field, which indicates the instruction opcode in the custom encoding space;

2) a Src field, which indicates the stack frame address to which the Canary value is to be written;

3) a Canary field, which indicates the address of the dedicated security register from which the Canary is read;

4) a Res field, which indicates the address of the register that stores the instruction execution result.

Further, a RISC-V extended instruction to check a special data Canary is inserted by:

1) taking, from the function return node set V_R, a function return node that has not yet been taken under the current control flow;

2) determining, according to the execution order of the current control flow, whether a RISC-V extended instruction for checking the special data Canary has already been inserted before that function return node; if so, going to step 4); if not, going to step 3);

3) inserting a RISC-V extended instruction for checking the special data Canary before the code statement corresponding to the content attribute of that function return node;

4) determining whether the function return node set V_R still contains function return nodes that have not been taken; if so, returning to step 1), until a RISC-V extended instruction for checking the special data Canary has been inserted before every function return node.

Further, the instruction format of the RISC-V extended instruction for checking the special data Canary includes:

1) an Opcode field to represent an instruction Opcode encoding in the custom encoding space;

2) a Src field indicating a start address of a currently protected code segment;

3) a Canary field indicating an expected value of Canary;

4) the Res field indicates the address of the register storing the instruction execution result.

Further, the defense against buffer overflow attacks is performed by:

1) acquiring a value p' of the special data Canary from a current stack frame by using a RISC-V extended instruction for inspecting the special data Canary;

2) acquiring a value p of special data Canary;

3) comparing the value p' of the special data Canary with the value p of the special data Canary: if they are the same, the function returns normally; if not, an exception is thrown and the program is terminated.

A storage medium having a computer program stored therein, wherein the computer program is arranged to perform the above method when executed.

An electronic device comprising a memory and a processor, wherein the memory stores a program that performs the above described method.

Compared with the prior art, the invention has the following advantages:

1. by completely analyzing the program source code, various buffer overflow forms such as heap overflow, stack overflow, BSS overflow and the like are completely covered, and the method is a protection scheme with more universal applicability.

2. The method for generating the Canary word based on hardware methods such as PUF security primitives is more efficient and has less influence on system performance compared with the traditional software method.

3. Based on RISC-V extended instruction design, the hardware can be brought into a defense system, and the soft-hard cooperation of safety defense is realized. RISC-V secure hardware customized around RISC-V extended instructions may provide better defense.

Drawings

FIG. 1 is a flow chart of a buffer overflow defense method based on a Canary mechanism in a RISC-V system.

FIG. 2 is a flowchart of generating the node call relation graph from the program source code.

FIG. 3 is a flowchart of generating the Canary.

FIG. 4 is a flowchart of inserting the set-Canary instruction into a code fragment.

FIG. 5 is a flowchart of inserting the check-Canary instruction into a code fragment.

FIG. 6 is a flowchart of setting the Canary before entering a buffer.

FIG. 7 is a flowchart of checking the Canary when leaving a buffer.

FIG. 8 is a schematic diagram of the instruction format of the RISC-V extended instruction for setting the Canary.

FIG. 9 is a schematic diagram of the instruction format of the RISC-V extended instruction for checking the Canary.

Detailed Description

The invention will be further described with reference to the accompanying drawings.

In the present embodiment, the overall flow of the Canary-based buffer overflow attack defense method is shown in FIG. 1. The method mainly includes the following steps:

1) Analyze the program source code, generate the node call relation graph, and determine the specific instruction position of each function call and each function return, wherein a function call is identified as a call instruction and a function return is identified as a ret instruction. The node call relation graph is a composite data structure that combines several node sets and a node relation set. Its structure is G = (V_C, V_R, R, n_entry), where V_C is the set of function call nodes, V_R is the set of function return nodes, R is the set of correspondences between function call nodes and function return nodes, and n_entry is the entry node of the program. The flow is shown in FIG. 2 and is described as follows:

1a) Generate an abstract syntax tree for the program source code and assign each node of the abstract syntax tree a content attribute whose value corresponds to the code statement represented by the node. Also assign each node a type attribute whose value indicates the classification of the code represented by the node. Its value range is {callstate, retstate, others}, denoting call statements, return statements and other code of no concern, respectively. Go to 1b).

1b) Find all function call statement nodes in the abstract syntax tree according to the type attribute of each node, record them as the call node set, and go to 1c).

1c) Find all function return statement nodes in the abstract syntax tree according to the type attribute of each node, record them as the return node set, and go to 1d).

1d) For the nodes found in 1b) and 1c), determine whether a control flow path exists between each "call-return" node pair, record the node pairs that have a control flow path in the correspondence set R, and go to 1e).

1e) Combine the call node set recorded in 1b), the return node set recorded in 1c) and the correspondence set R recorded in 1d) with the program entry node of the abstract syntax tree to generate the node call relation graph corresponding to the program source code.
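Steps 1a) to 1e) can be sketched in Python as follows. This is only an illustrative sketch, not the patented implementation: the `Node` class, the `succ` successor map used as the control flow relation, the depth-first `reachable` helper and the sample statements are all assumptions introduced here, since the text does not specify how control flow paths are represented or searched.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    """One AST node carrying the two attributes assigned in step 1a)."""
    ident: int
    type: str      # "callstate", "retstate" or "others"
    content: str   # the code statement the node represents


def reachable(succ, start, goal):
    """Depth-first search: is there a control flow path from start to goal?"""
    stack, seen = [start], set()
    while stack:
        cur = stack.pop()
        if cur == goal:
            return True
        if cur in seen:
            continue
        seen.add(cur)
        stack.extend(succ.get(cur, ()))
    return False


def build_call_graph(nodes, succ, entry):
    """Return G = (V_C, V_R, R, n_entry) following steps 1b) to 1e)."""
    v_c = [n for n in nodes if n.type == "callstate"]   # step 1b)
    v_r = [n for n in nodes if n.type == "retstate"]    # step 1c)
    r = {(c.ident, t.ident) for c in v_c for t in v_r   # step 1d)
         if reachable(succ, c.ident, t.ident)}
    return v_c, v_r, r, entry                           # step 1e)


# Hypothetical five-statement program; node 4 is not reachable from the call.
nodes = [
    Node(0, "others", "int main(void) {"),
    Node(1, "callstate", "copy_input(buf);"),
    Node(2, "others", "n = strlen(buf);"),
    Node(3, "retstate", "return n;"),
    Node(4, "retstate", "return -1;"),
]
succ = {0: [1], 1: [2], 2: [3]}
G = build_call_graph(nodes, succ, entry=nodes[0])
```

In this sketch only the pair (1, 3) ends up in R, because the second return statement has no control flow path from the call node.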

2) Generate the Canary. The Canary is generated from a physical unclonable function (PUF) together with the start address of the program. A PUF is a hardware security primitive that relies on chip characteristics; it is unique, random and unpredictable, is also called the "digital fingerprint" of the hardware, and has been widely used in security scenarios such as AI asset protection. The flow is shown in FIG. 3 and is described as follows:

2a) Using the program entry node recorded in the node call relation graph generated in 1), obtain the start address of the program from the code statement recorded in the content attribute of that node, and go to 2b).

2b) Using the PUF hardware security primitive, obtain the output value corresponding to a specific stimulus signal (e.g., electromagnetic oscillation), and go to 2c).

2c) Combine the program start address obtained in 2a) with the PUF output value obtained in 2b) to generate the Canary, and go to 2d).

2d) Record the Canary generated in 2c).
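The Canary derivation in steps 2a) to 2d) might look like the following Python sketch. Several things here are assumptions, because the text does not specify them: the 64-bit Canary width, the use of XOR as the way of "combining" the two values, and the function names. The `simulated_puf` function is a software stand-in for demonstration only; a real PUF derives its response from chip-specific physical variation rather than from a stored secret.

```python
import hashlib

CANARY_BITS = 64                     # assumed width, not given in the text
MASK = (1 << CANARY_BITS) - 1


def simulated_puf(challenge: bytes, device_secret: bytes) -> int:
    """Software stand-in for step 2b): map a stimulus to a device-unique value."""
    digest = hashlib.sha256(device_secret + challenge).digest()
    return int.from_bytes(digest[:CANARY_BITS // 8], "little")


def make_canary(prog_addr: int, puf_value: int) -> int:
    """Step 2c): combine the program start address with the PUF output.
    XOR is an assumption; the text only says the two values are combined."""
    return (prog_addr ^ puf_value) & MASK


# The same program start address on two different (simulated) devices.
puf_a = simulated_puf(b"challenge-0", b"device-A")
puf_b = simulated_puf(b"challenge-0", b"device-B")
canary_a = make_canary(0x80000000, puf_a)
canary_b = make_canary(0x80000000, puf_b)
```

Under these assumptions the same program yields a different Canary on each device, while repeating the derivation on one device reproduces the same value, which is what the check in step 6) relies on.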

3) Before each function call, insert a RISC-V extended instruction for setting the Canary. When the function call is made, the control flow moves into a stack buffer; the Canary is placed at the boundary of that buffer and separates the data used during function execution (located in the stack buffer) from the function return address (located outside the buffer). The flow is shown in FIG. 4, and the instruction format of the RISC-V extended instruction for setting the Canary is shown in FIG. 8. The details are as follows:

3a) Take a call node that has not yet been taken under the current control flow from the function call node set, and go to 3b).

3b) Following the execution order of the current control flow, determine whether a set-Canary instruction has already been inserted before the call node taken in 3a); if so, go to 3d); if not, go to 3c).

3c) Insert a set-Canary instruction before the statement recorded in the content attribute of the call node taken in 3a), and go to 3d).

3d) Determine whether the function call node set still contains nodes that have not been taken; if so, go to 3a) to continue with the remaining nodes.

In this step, the Canary set operation is implemented by a custom RISC-V extended instruction, whose instruction format includes:

an Opcode field, which indicates the instruction opcode in the custom encoding space;

a Src field, which indicates the stack frame address to which the Canary value is to be written;

a Canary field, which indicates the address of the dedicated security register from which the Canary is read;

a Res field, which indicates the address of the register that stores the instruction execution result.
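The field list above does not give bit widths or positions (those are in FIG. 8, which is not reproduced here). As one possible illustration, the Python sketch below packs the four fields into a standard 32-bit RISC-V R-type word using the custom-0 major opcode (0b0001011), which the RISC-V specification reserves for custom extensions. The mapping Src -> rs1, Canary -> rs2, Res -> rd, the zero funct3/funct7 values and the register numbers are all assumptions, and the 5-bit Canary field here simply stands for a register index rather than a real dedicated security register.

```python
CUSTOM_0 = 0b0001011  # RISC-V major opcode reserved for custom extensions


def encode_r_type(opcode, rd, rs1, rs2, funct3=0, funct7=0):
    """Pack the fields into a 32-bit R-type instruction word."""
    for name, value, width in (("opcode", opcode, 7), ("rd", rd, 5),
                               ("rs1", rs1, 5), ("rs2", rs2, 5),
                               ("funct3", funct3, 3), ("funct7", funct7, 7)):
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name} does not fit in {width} bits")
    return ((funct7 << 25) | (rs2 << 20) | (rs1 << 15)
            | (funct3 << 12) | (rd << 7) | opcode)


def encode_set_canary(src_reg, canary_reg, res_reg):
    """Hypothetical field mapping: Src -> rs1, Canary -> rs2, Res -> rd."""
    return encode_r_type(CUSTOM_0, rd=res_reg, rs1=src_reg, rs2=canary_reg)


# Illustrative register choice: Src = x2 (sp), Canary = x28, Res = x10 (a0).
word = encode_set_canary(src_reg=2, canary_reg=28, res_reg=10)
print(f"{word:#010x}")  # prints 0x01c1050b
```

Using an R-type layout keeps the instruction decodable by existing RISC-V tooling that understands the custom opcode space; a real design could equally choose a different layout.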

4) Before each function return, insert a RISC-V extended instruction for checking the Canary. The flow is shown in FIG. 5, and the instruction format of the RISC-V extended instruction for checking the Canary is shown in FIG. 9. The details are as follows:

4a) Take a return node that has not yet been taken under the current control flow from the function return node set, and go to 4b).

4b) Following the execution order of the current control flow, determine whether a check-Canary instruction has already been inserted before the return node taken in 4a); if so, go to 4d); if not, go to 4c).

4c) Insert a check-Canary instruction before the statement recorded in the content attribute of the return node taken in 4a), and go to 4d).

4d) Determine whether the function return node set still contains nodes that have not been taken; if so, go to 4a) to continue with the remaining nodes.

In this step, the Canary check operation is implemented by a custom RISC-V extended instruction, whose instruction format includes:

an Opcode field, which indicates the instruction opcode in the custom encoding space;

a Src field, which indicates the start address of the currently protected code segment;

a Canary field, which indicates the expected value of the Canary;

a Res field, which indicates the address of the register that stores the instruction execution result.
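Steps 3) and 4) run the same loop over two different node sets, so both can be illustrated with one short Python sketch. The statement list, the `canary.set` and `canary.check` mnemonics and the `instrument` function are hypothetical names introduced here; the text does not name the instructions. The "already inserted" test of steps 3b) and 4b) is modelled as a check of the statement immediately in front.

```python
SET_CANARY = "canary.set"      # hypothetical mnemonic
CHECK_CANARY = "canary.check"  # hypothetical mnemonic


def instrument(statements):
    """statements: list of (type, content) pairs in control flow order.
    Returns a new list with one set/check instruction in front of every
    call/return statement."""
    guard_for = {"callstate": SET_CANARY, "retstate": CHECK_CANARY}
    out = []
    for kind, content in statements:
        guard = guard_for.get(kind)
        # Steps 3b)/4b): skip if the guard is already directly in front.
        if guard and (not out or out[-1] != ("others", guard)):
            out.append(("others", guard))   # steps 3c)/4c)
        out.append((kind, content))
    return out


program = [
    ("others", "char buf[16];"),
    ("callstate", "read_line(buf);"),
    ("retstate", "return 0;"),
]
once = instrument(program)
twice = instrument(once)   # running the pass again inserts nothing new
```

Because of the "already inserted" test, the pass is idempotent: instrumenting an already instrumented program leaves it unchanged.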

5) Execute the code and set the Canary before the function jump. The flow is shown in FIG. 6 and is described as follows:

5a) Before the current control flow moves into the stack buffer because of a function call, it first encounters the set-Canary instruction inserted in 3). Execute it and go to 5b).

5b) Obtain the value of the Canary recorded in 2), and go to 5c).

5c) Write the value of the Canary obtained in 5b) to the bottom of the current stack frame, and go to 5d).

5d) Execute the subsequent code of the function normally.

6) Before the execution flow returns from the function, check whether the content of the Canary has changed. The flow is shown in FIG. 7 and is described as follows:

6a) Before the current control flow leaves the stack buffer because of a function return, it first encounters the check-Canary instruction inserted in 4). Execute it and go to 6b).

6b) Read the value of the Canary written in 5c) from the current stack frame, and go to 6c).

6c) Obtain the correct value of the Canary recorded in 2), and go to 6d).

6d) Determine whether the value of the Canary read in 6b) is the same as the correct value obtained in 6c); if they are the same, go to 6e); if they differ, go to 6f).

6e) The program is considered not to have overflowed, and the function returns normally.

6f) The program is considered to have overflowed; an exception is thrown as a defense, and the program terminates.
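The run-time behaviour of steps 5) and 6) can be simulated in a few lines of Python. This is a toy model under stated assumptions, not the hardware mechanism: the stack frame is a Python list whose last slot holds the Canary, the unchecked copy stands in for a memory-unsafe routine, and the names `call_with_canary` and `CanaryViolation` are introduced here for illustration.

```python
class CanaryViolation(Exception):
    """Raised when the stored Canary p' differs from the expected value p."""


def call_with_canary(canary, payload, buf_size=8):
    """Simulate one protected call. The frame is buf_size data slots followed
    by one Canary slot; payload is copied in without a bounds check."""
    frame = [0] * buf_size + [canary]       # step 5c): set the Canary
    for i, value in enumerate(payload):     # unchecked copy, may overflow
        if i < len(frame):                  # stay inside the toy frame
            frame[i] = value
    p_prime = frame[buf_size]               # step 6b): read p' from the frame
    if p_prime != canary:                   # step 6d): compare p' with p
        raise CanaryViolation("buffer overflow detected")   # step 6f)
    return frame[:buf_size]                 # step 6e): normal return


ok = call_with_canary(0xC0FFEE, [1, 2, 3])
try:
    call_with_canary(0xC0FFEE, [0x41] * 9)  # nine values into eight slots
    overflow_detected = False
except CanaryViolation:
    overflow_detected = True
```

The first call fits in the buffer and returns normally; the second writes one slot too far, overwrites the Canary, and is stopped by the check before the simulated return.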

The above embodiments are only intended to illustrate the technical solution of the present invention and not to limit the same, and a person skilled in the art can modify the technical solution of the present invention or substitute the same without departing from the spirit and scope of the present invention, and the scope of the present invention should be determined by the claims.
