Deep structure pointer analysis optimization method for analyzing source library mode defect detector

Document No. 1921191. Published 2021-12-03.

Abstract: This technique, "Deep structure pointer analysis optimization method for analyzing source library mode defect detector", was designed by 孙永杰, 于微, 吴倩, 王博, 任望 and 王强 on 2021-11-01. The invention discloses a deep structure pointer analysis optimization method comprising the following steps: first, collect the functions associated with the Source-Sink pair; then analyze the collected associated functions on the function call graph to obtain a subgraph of the call graph; next, analyze each pair of adjacent nodes in the subgraph to obtain segmented source-sink reachability results; finally, combine the segmented results with intersection/union operations to obtain the overall source-sink condition. The invention combines several static code analysis techniques and uses a heuristic algorithm to automatically split an overly complex Source-Sink pattern problem into several simple sub-problems, each of which is judged separately. For sub-problems that cannot be decided, it narrows the affected scope and reports closely associated functions to ease manual verification, which can effectively reduce the false-positive and false-negative rates of the code analysis results.

1. A deep structure pointer analysis optimization method for analyzing a source library mode defect detector, characterized by comprising the following steps:

Step one

First, build a Source summary and a Sink summary according to the Source-Sink determination. Based on the Source summary and the Sink summary respectively, search for associated functions through direct function call analysis, function pointer analysis and similar deep structure analysis, then pool the associated functions collected in each mode for subsequent analysis;

Step two

Analyze the collected associated functions on the function call graph to obtain a subgraph of the call graph that contains the source node, the sink node and all associated functions;

Step three

Apply Source-Sink analysis to each pair of adjacent nodes in the subgraph to obtain segmented Source-Sink reachability results;

Step four

Combine the segmented results with intersection and union operations to obtain the overall source-sink condition, thereby achieving deep structure pointer analysis optimization.

2. The method of claim 1, wherein in step one the Source summary and the Sink summary have the same content; for the Source summary, the function name of the source and the names of the related core variables are obtained directly, and binding analysis and alias analysis are then combined to collect the aliases of the function name and of the core variable names together with their locations.

3. The method of claim 1, wherein in step one the direct function call analysis comprises: collecting the caller and callee relations of the source node and the sink node directly from the function call graph, the confidence of the results collected by direct function call analysis decreasing gradually as the call depth increases.

4. The method of claim 1, wherein in step one the function pointer analysis comprises: collecting the related function pointers and their call sites according to the binding analysis and alias analysis results for the names of the source function and the sink function, and adding them to the direct function call analysis; if a function pointer involves a deep structure, performing a bounded expansion using similar deep structure analysis; the confidence of the function pointer analysis depends not only on the call depth but also on the precision of the alias analysis.

5. The method of claim 1, wherein in step one the similar deep structure analysis comprises: performing alias analysis on the core variables related to the source function and the sink function; if a deep structure is involved, building a summary for the deep structure and collecting, from deep to shallow through the abstract syntax tree, the code segments that may involve the same-named deep structure.

6. The method of claim 5, wherein for the results collected from deep to shallow through the abstract syntax tree, the number of occurrences of the same-named deep structure and the contextual relevance between the target code segment and the source-sink nodes must be considered.

7. The method of claim 1, wherein in step three, if the Source-Sink determination for two adjacent nodes is an undecidable problem of excessive complexity, the two nodes are marked separately and ranked by their confidence scores for subsequent manual audit.

Technical Field

The invention relates to the technical field of software testing, in particular to a deep structure pointer analysis optimization method for an analysis source library mode defect detector.

Background

Static code analysis is a technique that scans program code using lexical analysis, syntax analysis, control-flow analysis, data-flow analysis and similar techniques, and verifies, without running the code, whether it meets criteria such as conformance, security, reliability and maintainability. Static analysis has been evolving toward simulated execution, using techniques such as symbolic execution, abstract interpretation and value dependence analysis to find more of the defects that traditionally could only be found through dynamic testing. It also uses mathematical constraint solvers for path pruning or reachability analysis to reduce false positives and improve efficiency.

A Source-Sink detector generally defines a Source function, a Sink function and filter-function rules, tracks the data flow of variables on models such as symbolic execution and value dependence analysis, analyzes guard values and determines reachability. Such detectors mostly concern security and are a key topic in the field of static code analysis.

Traditional pointer analysis is mainly built on optimizations of Steensgaard's algorithm and Andersen's algorithm, and it faces precision and efficiency bottlenecks in path sensitivity, flow sensitivity, context sensitivity, field sensitivity and so on. Because of the efficiency limits of static analysis, existing commercial static analysis tools must trade precision against efficiency and analyze deep pointers only to a limited level and depth. Experiments show that most commercial static analysis tools cannot analyze the propagation of structure pointers nested more than two levels deep.

Investigation shows that, for Source-Sink detectors, existing commercial static analysis tools analyze deep pointers to at most two levels and directly discard the branches they cannot analyze, and their results involving global variables and function pointers are unsatisfactory. If the influence of these discarded branches is not taken into account, a large number of false reports are produced and a large amount of manpower is spent on secondary verification.

Disclosure of Invention

In view of the above problems, the invention aims to provide a deep structure pointer analysis optimization method for analyzing a source library mode defect detector. The method combines several static code analysis techniques and uses a heuristic algorithm to automatically split an overly complex Source-Sink pattern problem into several simple sub-problems, each of which is judged separately. For sub-problems that cannot be decided, it narrows the affected scope and reports closely associated functions to ease manual verification. This can effectively reduce the false-positive and false-negative rates of the code analysis results and thus the cost of manual audit.

To achieve this purpose, the invention adopts the following technical scheme: a deep structure pointer analysis optimization method for analyzing a source library mode defect detector, comprising the following steps:

Step one

First, build a Source summary and a Sink summary according to the Source-Sink determination. Based on the Source summary and the Sink summary respectively, search for associated functions through direct function call analysis, function pointer analysis and similar deep structure analysis, then pool the associated functions collected in each mode for subsequent analysis;

Step two

Analyze the collected associated functions on the function call graph to obtain a subgraph of the call graph that contains the source node, the sink node and all associated functions;

Step three

Apply Source-Sink analysis to each pair of adjacent nodes in the subgraph to obtain segmented Source-Sink reachability results;

Step four

Combine the segmented results with intersection and union operations to obtain the overall source-sink condition, thereby achieving deep structure pointer analysis optimization.

A further improvement is that: in step one, the Source summary and the Sink summary have the same content; for the Source summary, the function name of the source and the names of the related core variables are obtained directly, and binding analysis and alias analysis are then combined to collect the aliases of the function name and of the core variable names together with their locations.

A further improvement is that: in step one, the direct function call analysis collects the caller and callee relations of the source node and the sink node directly from the function call graph, and the confidence of the results collected by direct function call analysis decreases gradually as the call depth increases.

A further improvement is that: in step one, the function pointer analysis collects the related function pointers and their call sites according to the binding analysis and alias analysis results for the names of the source function and the sink function, and adds them to the direct function call analysis; if a function pointer involves a deep structure, a bounded expansion is performed using similar deep structure analysis; the confidence of the function pointer analysis depends not only on the call depth but also on the precision of the alias analysis.

A further improvement is that: in step one, the similar deep structure analysis performs alias analysis on the core variables related to the source function and the sink function; if a deep structure is involved, a summary is built for the deep structure, and the code segments that may involve the same-named deep structure are collected from deep to shallow through the abstract syntax tree.

A further improvement is that: for the results collected from deep to shallow through the abstract syntax tree, the number of occurrences of the same-named deep structure and the contextual relevance between the target code segment and the source-sink nodes must be considered.

A further improvement is that: in step three, if the Source-Sink determination for two adjacent nodes is an undecidable problem of excessive complexity, the two nodes are marked separately and ranked by their confidence scores for subsequent manual audit.

The beneficial effects of the invention are as follows: the invention combines several static code analysis techniques and uses a heuristic algorithm to split an overly complex Source-Sink pattern problem into several simple sub-problems, each of which is judged separately. For sub-problems that cannot be decided, it narrows the affected scope and reports closely associated functions to ease manual verification. This can effectively reduce the false-positive and false-negative rates of the code analysis results and thus the cost of manual audit. The method can effectively supplement the source-sink analysis logic of traditional static analysis tools.

Drawings

In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to these drawings without creative efforts.

FIG. 1 is a flow chart of a method according to a first embodiment of the present invention;

FIG. 2 is a flowchart of obtaining a correlation function according to a first embodiment of the present invention;

FIG. 3 is a diagram of an associated-function subgraph according to a first embodiment of the present invention;

FIG. 4 is a diagram illustrating an example determination on the associated-function subgraph according to a first embodiment of the present invention.

Detailed Description

The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

In the description of the present invention, it should be noted that the terms "center", "upper", "lower", "left", "right", "vertical", "horizontal", "inner", "outer", etc., indicate orientations or positional relationships based on the orientations or positional relationships shown in the drawings, and are only for convenience of description and simplicity of description, but do not indicate or imply that the device or element being referred to must have a particular orientation, be constructed and operated in a particular orientation, and thus, should not be construed as limiting the present invention. Furthermore, the terms "first," "second," "third," "fourth," and the like are used for descriptive purposes only and are not to be construed as indicating or implying relative importance.

In the description of the present invention, it should be noted that, unless otherwise explicitly specified or limited, the terms "mounted," "connected," and "connected" are to be construed broadly, e.g., as meaning either a fixed connection, a removable connection, or an integral connection; can be mechanically or electrically connected; they may be connected directly or indirectly through intervening media, or they may be interconnected between two elements. The specific meanings of the above terms in the present invention can be understood in specific cases to those skilled in the art.

Example one

Referring to FIGS. 1 to 4, the present embodiment provides a deep structure pointer analysis optimization method for analyzing a source library mode defect detector, including the following steps:

Step one

First, build a Source summary and a Sink summary according to the Source-Sink determination. The two summaries have the same content. Taking the Source summary as an example, the function name of the source and the names of the related core variables are obtained directly, and binding analysis and alias analysis are then combined to collect the aliases of the function name and of the core variable names together with their locations. Next, based on the Source summary and the Sink summary respectively, associated functions are searched for through direct function call analysis, function pointer analysis and similar deep structure analysis. The associated functions collected in each mode carry different confidence. The associated functions collected in all modes are then pooled for subsequent analysis;

Direct function call analysis: the caller and callee relations of the source node and the sink node are collected directly from the function call graph. The confidence of the results collected this way decreases gradually as the call depth increases;

Function pointer analysis: the related function pointers and their call sites are collected according to the binding analysis and alias analysis results for the names of the source function and the sink function, and are added to the direct function call analysis. If a function pointer involves a deep structure, a bounded expansion is performed using similar deep structure analysis. The confidence of this part depends not only on the call depth but also on the precision of the alias analysis;

Similar deep structure analysis: alias analysis is performed on the core variables related to the source function and the sink function. If a deep structure is involved, a summary is built for it, and the code segments that may involve the same-named deep structure are collected from deep to shallow through the abstract syntax tree. For the results collected this way, the number of occurrences of the same-named deep structure and the contextual relevance between the target code segment and the source-sink nodes must be considered;

The confidence is related to the nesting depth of the structure and to how often it occurs in the code. In actual tests, if the code under test has already been thoroughly checked by a static code analysis tool and its coding style is good, the confidence here is extremely high, and a large number of complex code logic relations that traditional static code analysis finds hard to analyze directly can be brought in;
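The collection in step one can be pictured with a small sketch. This is only an illustration under stated assumptions: the call graph, the function names, the `decay ** depth` scoring rule and the `max_depth` cutoff are all hypothetical, since the text says only that confidence decreases with call depth and differs between collection modes.

```python
from collections import deque

# Hypothetical toy call graph: caller -> list of callees.
CALL_GRAPH = {
    "main": ["init_driver", "shutdown"],
    "init_driver": ["source_fn"],
    "shutdown": ["release"],
    "release": ["sink_fn"],
    "source_fn": [],
    "sink_fn": [],
}


def reverse_graph(graph):
    """Build callee -> callers edges so that callers can be walked too."""
    rev = {node: [] for node in graph}
    for caller, callees in graph.items():
        for callee in callees:
            rev.setdefault(callee, []).append(caller)
    return rev


def collect_associated(graph, start, decay=0.5, max_depth=3):
    """Breadth-first walk over the callers and callees of `start`.

    Returns {function: confidence}. The score decay**depth is an assumed
    rule that simply shrinks as the call depth grows.
    """
    rev = reverse_graph(graph)
    scores = {start: 1.0}
    queue = deque([(start, 0)])
    while queue:
        node, depth = queue.popleft()
        if depth == max_depth:
            continue
        for nxt in graph.get(node, []) + rev.get(node, []):
            if nxt not in scores:  # the first visit is the shallowest one
                scores[nxt] = decay ** (depth + 1)
                queue.append((nxt, depth + 1))
    return scores


def merge_modes(*results):
    """Pool the results of several collection modes, keeping the best score."""
    merged = {}
    for result in results:
        for fn, conf in result.items():
            merged[fn] = max(conf, merged.get(fn, 0.0))
    return merged


from_source = collect_associated(CALL_GRAPH, "source_fn")
from_sink = collect_associated(CALL_GRAPH, "sink_fn")
associated = merge_modes(from_source, from_sink)
```

Keeping the maximum score when pooling is one possible choice; a real tool might instead weight each mode by the precision of its alias analysis.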

step two

Analyze the collected associated functions on the function call graph to obtain a subgraph of the call graph that contains the source node, the sink node and all associated functions;
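A minimal sketch of step two, assuming the call graph is stored as an adjacency dictionary: the subgraph is taken here as the induced subgraph on the source node, the sink node and the associated functions. All names in the example data are hypothetical.

```python
def induced_subgraph(call_graph, keep):
    """Return the call-graph edges whose two endpoints are both in `keep`."""
    keep = set(keep)
    return {
        caller: [callee for callee in callees if callee in keep]
        for caller, callees in call_graph.items()
        if caller in keep
    }


# Hypothetical example data; the function names are illustrative only.
call_graph = {
    "main": ["setup", "log"],
    "setup": ["source_fn", "worker"],
    "worker": ["sink_fn"],
    "log": [],
    "source_fn": [],
    "sink_fn": [],
}
associated = {"setup", "worker"}
subgraph = induced_subgraph(call_graph, associated | {"source_fn", "sink_fn"})
```

Functions such as `main` and `log` drop out, so later steps only examine edges between nodes that matter for the Source-Sink pair.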

step three

Apply Source-Sink analysis to any two adjacent nodes m and n in the subgraph to obtain a number of segmented Source-Sink reachability results. The reachability determination can reuse the original Source-Sink analysis and filter functions, or it can be written separately. If the Source-Sink determination for two adjacent nodes is an undecidable problem of excessive complexity, the two nodes are marked separately and ranked by their confidence scores for subsequent manual audit;
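Step three can be sketched as follows. The sketch assumes a caller-supplied `judge(m, n)` that returns `True`, `False`, or `None` for an undecidable pair; the rule of scoring a pair by its weaker endpoint and listing high-confidence pairs first is an assumption, not something the text specifies.

```python
from typing import Callable, Dict, List, Optional, Tuple

Edge = Tuple[str, str]


def judge_segments(
    edges: List[Edge],
    judge: Callable[[str, str], Optional[bool]],
    confidence: Dict[str, float],
):
    """Judge each adjacent pair (m, n); None marks an undecidable pair."""
    decided: Dict[Edge, bool] = {}
    undecided: List[Tuple[float, Edge]] = []
    for m, n in edges:
        verdict = judge(m, n)
        if verdict is None:
            # Assumed rule: a pair is only as trustworthy as its weaker end.
            score = min(confidence.get(m, 0.0), confidence.get(n, 0.0))
            undecided.append((score, (m, n)))
        else:
            decided[(m, n)] = verdict
    # Highest-confidence pairs first, so auditors see them earliest.
    undecided.sort(key=lambda item: item[0], reverse=True)
    return decided, [edge for _, edge in undecided]


# Hypothetical verdicts standing in for a real reachability analysis.
VERDICTS = {("a", "b"): True, ("b", "c"): None, ("c", "d"): False, ("a", "d"): None}


def toy_judge(m, n):
    return VERDICTS[(m, n)]


decided, audit_queue = judge_segments(
    list(VERDICTS), toy_judge, {"a": 1.0, "b": 0.5, "c": 0.25, "d": 0.9}
)
```

The `audit_queue` is the part handed to human reviewers; everything in `decided` flows on to step four.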

step four

Combine the segmented results with intersection and union operations to obtain the overall source-sink condition, thereby achieving deep structure pointer analysis optimization.

The analysis results for FIG. 3 are shown in FIG. 4, where the final source-to-sink condition is (f(source, n1) && f(m1, m2)) || (f(source, n1) && f(m1, m2)).
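One way to read the intersection/union summary in step four is: a call chain is feasible only if every adjacent pair on it is reachable (intersection), and the source reaches the sink if any chain is feasible (union). The sketch below follows that reading. The graph and node names are hypothetical and are not a reconstruction of FIG. 3.

```python
def all_paths(graph, src, dst, path=None):
    """Enumerate simple paths from src to dst in a small subgraph."""
    path = (path or []) + [src]
    if src == dst:
        return [path]
    found = []
    for nxt in graph.get(src, []):
        if nxt not in path:  # skip cycles
            found.extend(all_paths(graph, nxt, dst, path))
    return found


def overall_condition(graph, segment_result, src="source", dst="sink"):
    """OR over paths of the AND over each path's adjacent-pair results."""
    return any(
        all(segment_result[(a, b)] for a, b in zip(p, p[1:]))
        for p in all_paths(graph, src, dst)
    )


# Hypothetical subgraph with two chains from source to sink.
graph = {"source": ["a", "b"], "a": ["sink"], "b": ["c"], "c": ["sink"]}
results = {
    ("source", "a"): True,
    ("a", "sink"): False,  # this chain is broken
    ("source", "b"): True,
    ("b", "c"): True,
    ("c", "sink"): True,   # this chain is feasible
}
reachable = overall_condition(graph, results)
```

Path enumeration is exponential in the worst case, which is tolerable only because the subgraph from step two is meant to be small.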

Example two

The kernel code of linux-5.11.1 is analyzed according to the description of CVE-2021-:

1. Find the paired source-sink definitions in linux-5.11.1\kernel\usermode_driver.c: [umd_info->tgid = get_pid(task_tgid(current))] and [put_pid(umd_info->tgid)];

2. Try to associate the Source and the Sink of umd_info. The association turns out to involve a global variable, a deep pointer and multi-level function pointers at the same time, so a direct data-flow association is difficult;

3. Analyze the Source function umd_setup:

3.1. Analyze the call sites of the Source directly through the function call graph; none are found;

3.2. Analyze, through the abstract syntax tree, the statements that pass the Source as a function pointer; there is one and only one associated function, fork_usermode_driver;

3.2.1. Analyze the call site call_usermodehelper_setup: the Source function pointer is passed in through the parameter init and is assigned exactly once, by sub_info->init = init; build the structure summary [subprocess_info]->init for the function pointer;

3.2.2. Search level by level through the abstract syntax tree for calls of the [subprocess_info]->init field; the field has one and only one associated function, call_usermodehelper_exec_async;

4. Analyze the Sink function umd_cleanup:

4.1. Analyze the call sites of the Sink directly through the function call graph; none are found;

4.2. Analyze, through the abstract syntax tree, the statements that pass the Sink as a function pointer; there is one and only one associated function, fork_usermode_driver;

4.2.1. Analyze the call site call_usermodehelper_setup: the Sink function pointer is passed in through the parameter cleanup and is assigned exactly once, by sub_info->cleanup = cleanup; build the structure summary [subprocess_info]->cleanup for the function pointer;

4.2.2. Search level by level through the abstract syntax tree for calls of the [subprocess_info]->cleanup field; the field has one and only one associated function, call_usermodehelper_exec_async. Analyzing the associated functions gives the chain call_usermodehelper_setup() --> call_usermodehelper_exec_work() --> call_usermodehelper_exec_async() --> [subprocess_info]->cleanup();
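The lookups in 3.2.2 and 4.2.2 amount to asking which functions assign to, or call through, a given struct-field summary. A minimal sketch, assuming an AST walk has already produced flat fact records: the `FACTS` list below is hand-written for illustration and is not the output of any real tool.

```python
from collections import defaultdict

# Hand-written stand-ins for facts an AST walk might record:
# (enclosing function, kind of use, struct-field summary).
FACTS = [
    ("call_usermodehelper_setup", "assign", "[subprocess_info]->init"),
    ("call_usermodehelper_setup", "assign", "[subprocess_info]->data"),
    ("call_usermodehelper_exec_async", "call", "[subprocess_info]->init"),
]


def index_facts(facts):
    """Group enclosing functions by field summary, then by kind of use."""
    index = defaultdict(lambda: defaultdict(set))
    for func, kind, summary in facts:
        index[summary][kind].add(func)
    return index


def functions_using(index, summary, kind):
    """Functions that use the summarized field in the given way."""
    return sorted(index[summary][kind])


index = index_facts(FACTS)
init_callers = functions_using(index, "[subprocess_info]->init", "call")
init_writers = functions_using(index, "[subprocess_info]->init", "assign")
```

Keying on the struct type rather than on a variable name is what lets the search cross function boundaries where the local variable is called `sub_info` in one place and `info` in another.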

5. Analyze through the Source-Sink structure summaries:

5.1. Intra-function analysis of the Source shows that umd_info, the parent structure of tgid, comes only from the data field of info, which has type subprocess_info; build the structure summary [subprocess_info]->data->tgid;

5.2. Intra-function analysis of the Sink shows that umd_info, the parent structure of tgid, comes only from the data field of info, which has type subprocess_info; build the structure summary [subprocess_info]->data->tgid. This is identical to the summary in 5.1, so the association is judged to have strong confidence;

5.3. Search level by level through the abstract syntax tree for [subprocess_info]->data->tgid. The [subprocess_info]->data field is only ever used through assignment, so [subprocess_info]->data->tgid has no direct match, and the [subprocess_info]->data field has only one associated function, call_usermodehelper_setup;
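The summary construction in 5.1 and 5.2 can be sketched as expanding local aliases until the access is rooted at a typed parameter. The `aliases` and `types` dictionaries are hand-written stand-ins for what intra-function analysis would extract, and `build_summary` is a hypothetical helper, not part of the described tool.

```python
def build_summary(access, aliases, types):
    """Expand a field access such as 'umd_info->tgid' into a type-rooted summary.

    aliases maps a local variable to the expression it was assigned from;
    types maps a parameter name to its struct type.
    """
    base, _, rest = access.partition("->")
    seen = set()
    while base in aliases and base not in seen:  # guard against alias cycles
        seen.add(base)
        access = aliases[base] + ("->" + rest if rest else "")
        base, _, rest = access.partition("->")
    root = "[" + types.get(base, base) + "]"
    return root + ("->" + rest if rest else "")


# Assumed facts: in both functions a local umd_info is taken from info->data,
# and the parameter info has type subprocess_info.
SOURCE_FACTS = {"aliases": {"umd_info": "info->data"}, "types": {"info": "subprocess_info"}}
SINK_FACTS = {"aliases": {"umd_info": "info->data"}, "types": {"info": "subprocess_info"}}

source_summary = build_summary("umd_info->tgid", **SOURCE_FACTS)
sink_summary = build_summary("umd_info->tgid", **SINK_FACTS)
strong_match = source_summary == sink_summary
```

An exact string match between the two summaries corresponds to the "strong confidence" case in 5.2; partial matches would need a weaker score.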

6. Combine the associated functions to obtain the possible Source-Sink call chains, and perform segmented Source-Sink analysis on the basis of these chains. What finally needs to be analyzed is:

6.1. the reachability f1 from info->work to call_usermodehelper_freeinfo within the function call_usermodehelper_exec;

6.2. the call-chain reachability f2 from info->work to sourceNode;

6.3. the reachability f3 from call_usermodehelper_freeinfo to sinkNode.

Finally, f1 && f2 && f3 is evaluated to determine whether a memory leak exists.

The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents, improvements and the like that fall within the spirit and principle of the present invention are intended to be included therein.
