Detecting secondary security vulnerabilities by modeling information flow in persistent storage

Document No.: 1966945. Published: 2021-12-14.

Abstract: Detecting secondary security vulnerabilities by modeling information flow in persistent storage, by P. Krishnan, Yi Lu, and R. R. Kagalavadi (2020-01-29). A method may include determining that a source variable in code receives a source value from a source function specified by a target analysis, determining that a source statement in the code uses the source variable to write the source value to a column in a table, obtaining, for a receiver (sink) statement in the code, a set of affected variables affected by the source variable, determining that the receiver statement reads the source value into a receiver variable that includes an identifier of the column, generating a modified set of affected variables by adding the receiver variable to the set of affected variables, and reporting a defect at the receiver statement.

1. A method, comprising:

determining that a first source variable in code receives a first source value from a first source function specified by a target analysis;

determining that a first source statement in the code uses the first source variable to write the first source value to a column in a table;

for a first receiver statement in the code, obtaining a first set of affected variables affected by the first source variable;

determining that the first receiver statement reads the first source value into a first receiver variable comprising an identifier of the column;

generating a modified first set of affected variables by adding the first receiver variable to the first set of affected variables; and

reporting a first defect at the first receiver statement.

2. The method of claim 1, further comprising:

for a second receiver statement in the code, obtaining a second receiver variable affected by the first receiver variable, wherein the second receiver variable reads the first source value;

adding the second receiver variable to the modified first set of affected variables; and

reporting a second defect at the second receiver statement.

3. The method of claim 1, wherein the first set of affected variables is further affected by a set of source variables, wherein the set of source variables includes the first source variable, the method further comprising:

for a second receiver statement in the code, obtaining a second set of affected variables affected by the first set of affected variables;

adding a plurality of nodes to a trace graph, including:

a plurality of source nodes corresponding to the set of source variables,

a first plurality of affected variable nodes corresponding to the first set of affected variables, and

a second plurality of affected variable nodes corresponding to the second set of affected variables,

wherein each node of the plurality of nodes comprises a location in the code;

adding a first plurality of edges to the trace graph, each edge connecting one of the plurality of source nodes and one of the first plurality of affected variable nodes;

adding a second plurality of edges to the trace graph, each edge connecting one of the first plurality of affected variable nodes and one of the second plurality of affected variable nodes; and

reporting a defect trace that includes one of the first plurality of edges and one of the second plurality of edges.

4. The method of claim 1, wherein the first receiver variable further comprises an identifier of a row in the table, the method further comprising:

for the first receiver statement, obtaining an abstract state that assigns an abstract value to each affected variable in the first set of affected variables; and

modifying the abstract state using the first receiver statement.

5. The method of claim 1,

wherein the code further comprises:

(i) a first component comprising the first receiver statement, and

(ii) a second component comprising a second receiver statement,

wherein the second component has a size below a predetermined threshold, and

wherein the method further comprises:

determining that a second source variable in the code receives a second source value from a second source function specified by the target analysis;

determining that a second source statement in the code uses the second source variable to write the second source value to a cell in the column;

for the second receiver statement, obtaining:

a second set of affected variables affected by the second source variable, and

an abstract state that assigns an abstract value to each affected variable in the second set of affected variables,

wherein the second receiver statement reads the second source value into a second receiver variable comprising an identifier of the cell;

adding the second receiver variable to the second set of affected variables;

modifying the abstract state using the second receiver statement; and

reporting a second defect at the second receiver statement.

6. The method of claim 1, further comprising:

for a first statement in the code, obtaining an abstract state and a set of variable dependencies, each variable dependency comprising a pair of variables, wherein at least one of the set of variable dependencies comprises a variable corresponding to a cell in the column, and wherein the abstract state assigns an abstract value to each variable in each variable dependency in the set of variable dependencies; and

modifying the set of variable dependencies and the abstract state using the first statement.

7. The method of claim 1, further comprising:

for a second receiver statement in the code, obtaining a second receiver variable affected by the first receiver variable, wherein the second receiver variable reads the first source value;

adding the second receiver variable to the modified first set of affected variables;

determining that the first source value was modified by a modifier function before the second receiver variable accessed the first source value; and

in response to determining that the first source value was modified by the modifier function before the second receiver variable accessed the first source value, preventing the reporting of a defect at the second receiver statement.

8. A system, comprising:

a memory coupled to a computer processor;

a repository configured to store a table and code comprising a first source statement and a first receiver statement; and

a code analyzer, executing on the computer processor and using the memory, configured to:

determine that a first source variable in the code receives a first source value from a first source function specified by a target analysis,

determine that the first source statement writes the first source value to a column in the table using the first source variable,

for the first receiver statement, obtain a first set of affected variables affected by the first source variable,

determine that the first receiver statement reads the first source value into a first receiver variable comprising an identifier of the column,

generate a modified first set of affected variables by adding the first receiver variable to the first set of affected variables, and

report a first defect at the first receiver statement.

9. The system of claim 8, wherein the code analyzer is further configured to:

for a second receiver statement in the code, obtain a second receiver variable affected by the first receiver variable, wherein the second receiver variable reads the first source value;

add the second receiver variable to the modified first set of affected variables; and

report a second defect at the second receiver statement.

10. The system of claim 8, wherein the first set of affected variables is further affected by a set of source variables, wherein the set of source variables includes the first source variable, and wherein the code analyzer is further configured to:

for a second receiver statement in the code, obtain a second set of affected variables affected by the first set of affected variables,

add a plurality of nodes to a trace graph, including:

a plurality of source nodes corresponding to the set of source variables,

a first plurality of affected variable nodes corresponding to the first set of affected variables, and

a second plurality of affected variable nodes corresponding to the second set of affected variables,

wherein each node of the plurality of nodes comprises a location in the code,

add a first plurality of edges to the trace graph, each edge connecting one of the plurality of source nodes and one of the first plurality of affected variable nodes,

add a second plurality of edges to the trace graph, each edge connecting one of the first plurality of affected variable nodes and one of the second plurality of affected variable nodes, and

report a defect trace that includes one of the first plurality of edges and one of the second plurality of edges.

11. The system of claim 8, wherein the first receiver variable further comprises an identifier of a row in the table, and wherein the code analyzer is further configured to:

for the first receiver statement, obtain an abstract state that assigns an abstract value to each affected variable in the first set of affected variables; and

modify the abstract state using the first receiver statement.

12. The system of claim 8,

wherein the code further comprises:

(i) a first component comprising the first receiver statement, and

(ii) a second component comprising a second receiver statement,

wherein the second component has a size below a predetermined threshold, and

wherein the code analyzer is further configured to:

determine that a second source variable in the code receives a second source value from a second source function specified by the target analysis,

determine that a second source statement in the code uses the second source variable to write the second source value to a cell in the column,

for the second receiver statement, obtain:

a second set of affected variables affected by the second source variable, and

an abstract state that assigns an abstract value to each affected variable in the second set of affected variables,

wherein the second receiver statement reads the second source value into a second receiver variable comprising an identifier of the cell,

add the second receiver variable to the second set of affected variables,

modify the abstract state using the second receiver statement, and

report a second defect at the second receiver statement.

13. The system of claim 8, wherein the code analyzer is further configured to:

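for a first statement in the code, obtain an abstract state and a set of variable dependencies, each variable dependency comprising a pair of variables, wherein at least one of the set of variable dependencies comprises a variable corresponding to a cell in the column, and wherein the abstract state assigns an abstract value to each variable in each variable dependency in the set of variable dependencies; and

modify the set of variable dependencies and the abstract state using the first statement.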

14. The system of claim 8, wherein the code analyzer is further configured to:

for a second receiver statement in the code, obtain a second receiver variable affected by the first receiver variable, wherein the second receiver variable reads the first source value,

add the second receiver variable to the modified first set of affected variables,

determine that the first source value was modified by a modifier function before the second receiver variable accessed the first source value, and

in response to determining that the first source value was modified by the modifier function before the second receiver variable accessed the first source value, prevent the reporting of a defect at the second receiver statement.

15. A non-transitory computer-readable medium comprising instructions that, when executed by a computer processor, perform:

determining that a first source variable in code receives a first source value from a first source function specified by a target analysis;

determining that a first source statement in the code uses the first source variable to write the first source value to a column in a table;

for a first receiver statement in the code, obtaining a first set of affected variables affected by the first source variable;

determining that the first receiver statement reads the first source value into a first receiver variable comprising an identifier of the column;

generating a modified first set of affected variables by adding the first receiver variable to the first set of affected variables; and

reporting a first defect at the first receiver statement.

16. The non-transitory computer-readable medium of claim 15, wherein the instructions further perform:

for a second receiver statement in the code, obtaining a second receiver variable affected by the first receiver variable, wherein the second receiver variable reads the first source value;

adding the second receiver variable to the modified first set of affected variables; and

reporting a second defect at the second receiver statement.

17. The non-transitory computer-readable medium of claim 15, wherein the first set of affected variables is further affected by a set of source variables, wherein the set of source variables includes the first source variable, and wherein the instructions further perform:

for a second receiver statement in the code, obtaining a second set of affected variables affected by the first set of affected variables;

adding a plurality of nodes to a trace graph, including:

a plurality of source nodes corresponding to the set of source variables,

a first plurality of affected variable nodes corresponding to the first set of affected variables, and

a second plurality of affected variable nodes corresponding to the second set of affected variables,

wherein each node of the plurality of nodes comprises a location in the code;

adding a first plurality of edges to the trace graph, each edge connecting one of the plurality of source nodes and one of the first plurality of affected variable nodes;

adding a second plurality of edges to the trace graph, each edge connecting one of the first plurality of affected variable nodes and one of the second plurality of affected variable nodes; and

reporting a defect trace that includes one of the first plurality of edges and one of the second plurality of edges.

18. The non-transitory computer-readable medium of claim 15, wherein the first receiver variable further comprises an identifier of a row in the table, and wherein the instructions further perform:

for the first receiver statement, obtaining an abstract state that assigns an abstract value to each affected variable in the first set of affected variables; and

modifying the abstract state using the first receiver statement.

19. The non-transitory computer-readable medium of claim 15,

wherein the code further comprises:

(i) a first component comprising the first receiver statement, and

(ii) a second component comprising a second receiver statement,

wherein the second component has a size below a predetermined threshold, and

wherein the instructions further perform:

determining that a second source variable in the code receives a second source value from a second source function specified by the target analysis;

determining that a second source statement in the code uses the second source variable to write the second source value to a cell in the column;

for the second receiver statement, obtaining:

a second set of affected variables affected by the second source variable, and

an abstract state that assigns an abstract value to each affected variable in the second set of affected variables,

wherein the second receiver statement reads the second source value into a second receiver variable comprising an identifier of the cell;

adding the second receiver variable to the second set of affected variables;

modifying the abstract state using the second receiver statement; and

reporting a second defect at the second receiver statement.

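20. The non-transitory computer-readable medium of claim 15, wherein the instructions further perform:

for a first statement in the code, obtaining an abstract state and a set of variable dependencies, each variable dependency comprising a pair of variables, wherein at least one of the set of variable dependencies comprises a variable corresponding to a cell in the column, and wherein the abstract state assigns an abstract value to each variable in each variable dependency in the set of variable dependencies; and

modifying the set of variable dependencies and the abstract state using the first statement.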

Background

Applications that use database query language (e.g., structured query language or SQL) statements may become vulnerable when unsanitized user input is included in the query language statements. First-order query language injection may occur when a malicious user injects query language statements to extract sensitive data, tamper with existing data, or cause denial of service. Second-order query language injection may occur when a malicious user stores a payload in a database and then manipulates an application into reading the payload from the database via a query language statement. Since the data in the database is generally considered trusted, these second-order vulnerabilities may not be detectable using first-order query language injection detection mechanisms.
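As an illustration of the second-order pattern described above, the following minimal Python sketch uses the standard `sqlite3` module with an in-memory database. The table layout, the payload, and the function name `demo_second_order_injection` are hypothetical and chosen only for this example; they do not come from the patent.

```python
import sqlite3


def demo_second_order_injection() -> list:
    """Store a payload safely, then misuse it when it is read back."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE people (username TEXT, password TEXT)")
    conn.execute("INSERT INTO people VALUES ('alice', 'a-secret')")

    # Step 1: the attacker-controlled value is written with a parameterized
    # statement, so a first-order injection check sees nothing wrong.
    payload = "x' OR '1'='1"
    conn.execute("INSERT INTO people VALUES (?, ?)", (payload, "pw"))

    # Step 2: the application later reads the stored value and trusts it.
    stored = conn.execute(
        "SELECT username FROM people WHERE password = ?", ("pw",)
    ).fetchone()[0]

    # Step 3: the stored value is concatenated into a new query, so the
    # payload changes the meaning of the WHERE clause.
    query = (
        "SELECT username, password FROM people WHERE username = '"
        + stored
        + "'"
    )
    return conn.execute(query).fetchall()


rows = demo_second_order_injection()
```

In this sketch the final query matches every row of the table, including the row for `alice`, even though the write in step 1 was itself safe. This is the kind of flow through persistent storage that the described analysis aims to track.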

Summary

This summary is provided to introduce a selection of concepts that are further described below in the detailed description. This summary is not intended to identify key or essential features of the claimed subject matter, nor is it intended to be used as an aid in limiting the scope of the claimed subject matter.

In general, in one aspect, one or more embodiments relate to a method including determining that a source variable in code receives a source value from a source function specified by a target analysis, determining that a source statement in the code uses the source variable to write the source value to a column in a table, obtaining, for a receiver (sink) statement in the code, a set of affected variables affected by the source variable, determining that the receiver statement reads the source value into a receiver variable including an identifier of the column, generating a modified set of affected variables by adding the receiver variable to the set of affected variables, and reporting a defect at the receiver statement.

In general, in one aspect, one or more embodiments relate to a system including a memory coupled to a computer processor, a repository configured to store a table and code including a source statement and a receiver statement, and a code analyzer executing on the computer processor and using the memory. The code analyzer is configured to determine that a source variable in the code receives a source value from a source function specified by a target analysis, determine that the source statement writes the source value to a column in the table using the source variable, obtain, for the receiver statement, a set of affected variables affected by the source variable, determine that the receiver statement reads the source value into a receiver variable including an identifier of the column, generate a modified set of affected variables by adding the receiver variable to the set of affected variables, and report a defect at the receiver statement.

In general, in one aspect, one or more embodiments relate to a non-transitory computer-readable medium including instructions that, when executed by a computer processor, perform: determining that a source variable in code receives a source value from a source function specified by a target analysis, determining that a source statement in the code uses the source variable to write the source value to a column in a table, obtaining, for a receiver statement in the code, a set of affected variables affected by the source variable, determining that the receiver statement reads the source value into a receiver variable that includes an identifier of the column, generating a modified set of affected variables by adding the receiver variable to the set of affected variables, and reporting a defect at the receiver statement.

Other aspects of the invention will be apparent from the following description and appended claims.

Brief Description of the Drawings

FIGS. 1A and 1B illustrate a system according to one or more embodiments of the invention.

FIGS. 2, 3A, 3B, 3C, and 3D illustrate flowcharts in accordance with one or more embodiments of the invention.

FIGS. 4A, 4B, and 4C illustrate examples according to one or more embodiments of the invention.

FIGS. 5A and 5B illustrate a computing system in accordance with one or more embodiments of the invention.

Detailed Description

Specific embodiments of the present invention will now be described in detail with reference to the accompanying drawings. Like elements in the various figures are denoted by like reference numerals for consistency.

In the following detailed description of embodiments of the invention, numerous specific details are set forth in order to provide a more thorough understanding of the invention. It will be apparent, however, to one skilled in the art that the present invention may be practiced without these specific details. In other instances, well-known features have not been described in detail to avoid unnecessarily complicating the description.

Throughout this application, ordinal numbers (e.g., first, second, third, etc.) may be used as adjectives for elements (i.e., any noun in the application). The use of ordinal numbers does not imply or create any particular ordering of elements nor limit any elements to only a single element unless explicitly disclosed, such as by the use of the terms "before", "after", "single", and other such terms. Rather, ordinals are used to distinguish between elements. As an example, a first element is different from a second element, and the first element may encompass more than one element and be subsequent (or prior) to the second element in the ordering of the elements.

Embodiments of the present invention generally relate to detecting second-order (secondary) security vulnerabilities in code. In one or more embodiments, a data flow from a source variable to a sink variable is tracked, where the flow includes writes to and reads from persistent storage (e.g., tables in a database). For example, the code may embed SQL statements. The flows of interest may be determined relative to a target analysis (e.g., a taint analysis or an escape analysis). The efficiency and precision of the analysis can be adjusted based on two factors: 1) flow granularity: whether each flow represents a dependency between two variables or between two sets of variables, and 2) cell granularity: whether the value of each cell is represented individually, or whether cell values are abstracted to a single value for the column.
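The overall idea can be sketched in a few lines of Python. This is only an illustrative sketch under strong assumptions, not the claimed implementation: the `Stmt` record, its `kind` values, and the `analyze` function are hypothetical, statements are visited once in program order, columns are named as `table.column` strings, and flows are tracked at the coarser "set of affected variables" granularity with cells abstracted to their column.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stmt:
    kind: str     # "source", "write", "read", or "assign" (hypothetical IR)
    target: str   # variable or column that receives the value
    operand: str  # variable, column, or source function supplying the value
    line: int     # location of the statement in the code


def analyze(stmts, source_functions):
    """Return (affected names, lines of receiver statements with defects)."""
    affected = set()  # variables and columns influenced by a source value
    defects = []
    for s in stmts:
        if s.kind == "source":
            # A variable receives a value from a specified source function.
            if s.operand in source_functions:
                affected.add(s.target)
        elif s.kind in ("write", "assign"):
            # Writing an affected variable to a column affects the column;
            # assigning it to another variable affects that variable.
            if s.operand in affected:
                affected.add(s.target)
        elif s.kind == "read":
            # A receiver statement reads an affected column back into a
            # variable: extend the affected set and report a defect.
            if s.operand in affected:
                affected.add(s.target)
                defects.append(s.line)
    return affected, defects


stmts = [
    Stmt("source", "name", "read_request_param", 1),
    Stmt("write", "people.username", "name", 2),
    Stmt("read", "shown", "people.username", 3),
]
affected, defects = analyze(stmts, {"read_request_param"})
```

Here the flow `name` to `people.username` to `shown` passes through the table, and the defect is reported at line 3, where the value is read back. A finer-grained variant could track individual cells or pairs of variables at the cost of a larger state.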

FIG. 1A shows a computer system (100) in accordance with one or more embodiments of the invention. As shown in FIG. 1A, the computer system (100) includes a repository (102), a code analyzer (104), and one or more computer processors (106). In one or more embodiments, the computer system (100) takes the form of the computing system (500) described with respect to FIG. 5A and the accompanying description below, or the form of the client device (526) described with respect to FIG. 5B. In one or more embodiments, the computer processor (106) takes the form of the computer processor (502) described with respect to FIG. 5A and the accompanying description below.

In one or more embodiments, the repository (102) may be any type of storage unit and/or device for storing data (e.g., a file system, a database, a collection of tables, or any other storage mechanism). Additionally, repository (102) may include a plurality of different storage units and/or devices. The plurality of different storage units and/or devices may or may not be of the same type or located at the same physical site.

In one or more embodiments, the repository (102) includes code (110), one or more tables (120), an abstract state repository (130), a target analysis (134), and a trace graph (136). In one or more embodiments, the code (110) includes components (112A, 112N). The component (112A) may be a unit of source code. Programming entities defined within the component (112A) may be imported by other components. For example, a programming entity may be a file, package, class, function, etc. The component (112A) may include statements (114) written in a programming language or an intermediate representation (e.g., bytecode). For example, the statements (114) may be written in a programming language that embeds query language (e.g., structured query language or SQL) statements. Each statement (114) may correspond to a location (e.g., a program point) in the code (110). For example, the location may specify a line number in the component (112A).

In one or more embodiments, the table (120) includes columns (122A, 122N). The table (120) may be stored in a database. Each column (122A) may include one or more cells (124), each cell including a value. Each column (122A) may have a name, a type, a privilege, and/or various other attributes. For example, the people table may include a username column, where each cell in the username column is assigned a particular value (e.g., "Bob"). Each cell (124) may correspond to a row of the table (120). For example, the cell in the username column to which the value "Bob" is assigned may correspond to a row that assigns a value (e.g., Bob's password, Bob's permissions, etc.) to a column (122A, 122N) of the table (120).

Turning to FIG. 1B, in one or more embodiments, the statement (150) includes variables (152A, 152N). Each variable (152A) may be a table variable (154) or an application variable (156). In one or more embodiments, a table variable (154) accesses the contents of one or more tables. The table variable (154) may include a column identifier (158) for a column (122A) in the table (120). The column identifier (158) may be considered a variable because it serves as a placeholder for the values of the cells of the column (122A), much as an ordinary variable serves as a placeholder for the values it may hold.

Alternatively, the table variable (154) may be a cell identifier (160) corresponding to one of the cells (124) in the table (120). In one or more embodiments, the cell identifier (160) includes a column identifier (158) and a row identifier. The row identifier may correspond to a row in the table (120).
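One way to picture the two kinds of table variable is a single record whose row identifier is optional. The following Python sketch is an assumption made for illustration: the class name `TableVariable`, its fields, and the `to_column` helper are hypothetical and are not defined in the source.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class TableVariable:
    table: str
    column: str
    # None means the variable is a column identifier; an integer means it is
    # a cell identifier (column identifier plus row identifier).
    row: Optional[int] = None

    def is_cell(self) -> bool:
        return self.row is not None

    def to_column(self) -> "TableVariable":
        """Abstract a cell identifier to the identifier of its column."""
        return TableVariable(self.table, self.column)


cell = TableVariable("people", "username", row=3)
column = cell.to_column()
```

Dropping the row identifier trades precision for a smaller state: every cell of the column is then treated as a single variable.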

In one or more embodiments, the application variable (156) may reference a location of a value in the code (110), such as an allocation site. The allocation site may be a statement in the code (110) that declares, instantiates, and/or initializes an object. An application variable (156) may refer to a simple allocation site (e.g., a number or string value), to a complex allocation site (e.g., a base object or structure containing one or more fields), or to a field in a complex allocation site. The allocation site may contain different values at different points in time. In one or more embodiments, an allocation site may refer to a location in memory (e.g., heap memory) of the computer system (100) that is allocated when a function in the code (110) is executed.

Returning to FIG. 1A, in one or more embodiments, the abstract state repository (130) assigns abstract states (132A, 132N) to statements (114A, 114N). Returning to FIG. 1B, in one or more embodiments, the abstract state (170) assigns abstract values (172A, 172N) to the variables (152A, 152N). In one or more embodiments, each abstract value (172A, 172N) corresponds to a set of concrete values. The abstract value (172A) may correspond to the set of concrete values that may be assigned to the variable (152A) during execution of the code (110). Thus, the abstract value (172A) may define constraints on the possible concrete values that may be assigned to the variable (152A) during execution of the code (110). In one or more embodiments, the abstract value (172A) may be represented by a regular expression. For example, a regular expression may represent the possible string values of a variable (152A) as determined by a string constraint solver. Examples of abstract values (172A, 172N) for integers include: any integer, any positive integer, any even number, any odd number, any non-zero integer, a particular set of integers, etc. Examples of abstract values (172A, 172N) for a string include: any string, any non-empty string, a particular string, a set of particular strings, a numeric string, a non-numeric string, etc.
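As a small illustration of abstract values represented by regular expressions, the Python sketch below maps variable names to patterns and checks whether a concrete value is admitted by a variable's abstract value. The variable names, the two patterns, and the helper `may_hold` are assumptions made for this example only.

```python
import re

# Two illustrative abstract values for strings, written as regular expressions.
ANY_STRING = r".*"
NUMERIC_STRING = r"[0-9]+"

# An abstract state assigns an abstract value to each variable.
abstract_state = {
    "user_id": NUMERIC_STRING,
    "comment": ANY_STRING,
}


def may_hold(variable: str, concrete_value: str, state: dict) -> bool:
    """Return True if the variable's abstract value admits concrete_value."""
    pattern = state[variable]
    return re.fullmatch(pattern, concrete_value, flags=re.DOTALL) is not None


numeric_ok = may_hold("user_id", "42", abstract_state)
numeric_injection = may_hold("user_id", "1 OR 1=1", abstract_state)
comment_injection = may_hold("comment", "x' OR '1'='1", abstract_state)
```

In this sketch, a variable constrained to numeric strings cannot hold an injection payload, whereas a variable whose abstract value is "any string" can. An analysis could use such constraints to decide whether a flow is worth reporting.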

Returning to FIG. 1A, in one or more embodiments, the target analysis (134) is an analysis to be performed on the code (110). In one or more embodiments, the target analysis (134) is performed using abstract interpretation. Abstract interpretation is a static analysis technique that over-approximates the behavior of the code (110), thereby enabling the code analyzer (104) to check whether the code (110) would exhibit defective (e.g., malicious) behavior under any possible execution without directly executing the code (110). In one or more embodiments, static analysis analyzes an abstract state (132A, 132N) associated with a statement (114A, 114N).

Returning to FIG. 1B, in one or more embodiments, the target analysis (134) includes a source function (162), a receiver statement (164), and a modifier function (166). In one or more embodiments, the source function (162) may receive a value of interest related to the type of analysis to be performed on the code (110). For example, when the target analysis (134) is a taint analysis, the source function (162) may receive tainted values from an external source. Continuing with this example, the tainted value may correspond to a user-provided or externally generated value (e.g., an unknown value that may be controlled by an attacker). The source function (162) may receive the tainted value directly from an external source (e.g., via an application programming interface (API)). Alternatively, the source function (162) may receive the tainted value via a tainted flow (e.g., via a series of function calls that transfer the tainted value from an external source). As another example, when the target analysis (134) is an escape analysis, the source function (162) may receive sensitive data (e.g., where the source function (162) has confidential access privileges).

In one or more embodiments, the receiver statement (164) may use source values in a manner that represents a security flaw with respect to the target analysis (134). For example, when the analysis of the code (110) is a taint analysis, the receiver statement (164) may access security-sensitive resources of the computer system (100). Alternatively, a receiver statement may provide a tainted value to another receiver statement that accesses a security-sensitive resource. As another example, when the analysis of the code (110) is an escape analysis, the receiver statement (164) may allow non-privileged (e.g., public) access to sensitive data and may therefore represent a point at which confidential information leaks.

In one or more embodiments, the modifier function (166) may modify the source values to prevent potential security flaws. For example, in a taint analysis, the modifier function (166) may sanitize the tainted data to render the tainted data harmless. Similarly, in an escape analysis, the modifier function (166) may declassify (e.g., redact) sensitive data.
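A target analysis of this kind can be thought of as a small specification naming sources, receivers, and modifiers. The Python sketch below is a hedged illustration: the class `TargetAnalysis`, the function `flow_is_defect`, and the function names `get_parameter`, `execute_query`, and `escape_sql` are all hypothetical, and a flow is simplified to the ordered list of functions a value passes through.

```python
from dataclasses import dataclass, field


@dataclass
class TargetAnalysis:
    name: str
    source_functions: set = field(default_factory=set)
    receiver_functions: set = field(default_factory=set)
    modifier_functions: set = field(default_factory=set)


def flow_is_defect(flow: list, analysis: TargetAnalysis) -> bool:
    """A flow is a defect if it starts at a source, ends at a receiver,
    and no modifier function is applied in between."""
    if len(flow) < 2:
        return False
    starts_at_source = flow[0] in analysis.source_functions
    ends_at_receiver = flow[-1] in analysis.receiver_functions
    modified = any(f in analysis.modifier_functions for f in flow[1:-1])
    return starts_at_source and ends_at_receiver and not modified


taint = TargetAnalysis(
    name="taint",
    source_functions={"get_parameter"},
    receiver_functions={"execute_query"},
    modifier_functions={"escape_sql"},
)

unsanitized = flow_is_defect(["get_parameter", "execute_query"], taint)
sanitized = flow_is_defect(["get_parameter", "escape_sql", "execute_query"], taint)
```

The same structure could describe an escape analysis by listing functions that return sensitive data as sources, publicly visible outputs as receivers, and redaction routines as modifiers.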

Returning to FIG. 1A, in one or more embodiments, the trace graph (136) represents the potential flow of a value (e.g., a source value provided by the source function (162) of FIG. 1B) through a series of variables (152A, 152N) used in a series of statements (114). For example, a path in the trace graph (136) may correspond to a defect (e.g., a tainted flow or an escaping flow) in the code (110). Continuing with this example, in the context of security analysis, a path in the trace graph (136) may indicate how the variables (152A, 152N) become tainted or release sensitive data.
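A trace graph of this kind can be sketched as a directed graph whose nodes pair a variable with a location in the code. The Python class below is a minimal illustration under assumptions: the name `TraceGraph`, the `(variable, line)` node encoding, and the depth-first search used to recover one path are choices made for this example, not details taken from the source.

```python
from collections import defaultdict


class TraceGraph:
    """Directed graph over (variable, line) nodes."""

    def __init__(self):
        self.edges = defaultdict(list)

    def add_edge(self, src, dst):
        self.edges[src].append(dst)

    def find_trace(self, start, end):
        """Return one path from start to end as a list of nodes, or None."""
        stack = [(start, [start])]
        seen = {start}
        while stack:
            node, path = stack.pop()
            if node == end:
                return path
            for nxt in self.edges.get(node, []):
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append((nxt, path + [nxt]))
        return None


graph = TraceGraph()
graph.add_edge(("name", 1), ("people.username", 2))   # source to column write
graph.add_edge(("people.username", 2), ("shown", 3))  # column read to receiver
trace = graph.find_trace(("name", 1), ("shown", 3))
```

The recovered path lists each variable together with the line at which it was affected, which is the information a defect report would need in order to explain the flow to a developer.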

In one or more embodiments, the code analyzer (104) is implemented in hardware (e.g., circuitry), software, firmware, and/or any combination thereof. In one or more embodiments, the code analyzer (104) includes functionality to perform static analysis of the code (110) (e.g., using target analysis (134)). The code analyzer (104) may include functionality to report defects in the code (110) using static analysis. The code analyzer (104) may include functionality to perform different types of static analysis on different components (112A, 112N) of the code (110).

In one or more embodiments, the computer processor (106) includes functionality to execute the code (110). In one or more embodiments, the computer processor (106) includes functionality to execute the code analyzer (104).

While fig. 1A shows a configuration of components, other configurations may be used without departing from the scope of the present invention. For example, the various components may be combined to create a single component. As another example, functionality performed by a single component may be performed by two or more components.

FIG. 2 shows a flow diagram in accordance with one or more embodiments of the invention. The flow diagram depicts a process for row-folding information flow analysis. One or more of the steps in FIG. 2 may be performed by a component discussed above with reference to FIG. 1A, e.g., the code analyzer (104) of the computer system (100). In one or more embodiments of the invention, one or more of the steps shown in FIG. 2 may be omitted, repeated, and/or performed in parallel, or in a different order than that shown in FIG. 2. Accordingly, the scope of the present invention should not be considered limited to the particular arrangement of steps shown in FIG. 2.

Initially, in step 202, it is determined that a source variable receives a source value from a source function specified by a target analysis. For example, when the target analysis is a contamination analysis, the source variable may receive a contaminated value. Alternatively, the source variable may receive the secret value when the target analysis is an escape analysis. In one or more embodiments, a source value is the result of an expression that includes one or more source variables. The expression may be a conditional expression for selecting a row from a table.

In step 204, a source statement in the code is determined to write a source value to a column in the table using the source variable. For example, the source statement may be an SQL insert or update statement. The source value may be written into a cell of a column, where the cell corresponds to a row in the table.

In step 206, one or more affected variable sets that are affected by the source variable are obtained for the receiver statement in the code. In one or more embodiments, the receiver statements are receiver statements specified by the target analysis. For example, when the target analysis is a contamination analysis, the receiver statements may access security-sensitive resources of the computer system. Alternatively, when the target analysis is escape analysis, the receiver statements may allow non-privileged access to the confidential data. In one or more embodiments, the target analysis specifies that the receiver statement is a data manipulation statement that modifies (e.g., inserts, updates, or deletes) data in a column in a table.

In one or more embodiments, the code analyzer tracks the aggregate (e.g., over-approximation) dependency of the set of affected variables on the set of source variables (e.g., rather than accurately tracking the particular source variables that affect the particular receiver variables), which sacrifices some precision in exchange for greater computational efficiency.

In one or more embodiments, the code analyzer obtains the set of affected variables by performing a static analysis (e.g., a target analysis) on the code. In one or more embodiments, the static analysis uses abstract interpretation techniques to assign abstract values to variables used in receiver statements. For example, the code analyzer may use a constraint propagation and/or constraint satisfaction algorithm to calculate the abstract values assigned to different variables, where each abstract value constrains the possible concrete values that may be assigned to the corresponding variable.
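By way of a non-limiting illustration, the abstract values may be pictured as elements of a small lattice that a fixpoint loop propagates along assignments. The sketch below assumes a two-point domain (`UNTAINTED` below `TAINTED`) and a hypothetical encoding of assignments as `(target, operands)` pairs; the description above does not specify the abstract domain or the solver, so both are assumptions made only for this example.

```python
UNTAINTED, TAINTED = 0, 1  # assumed two-point abstract domain


def propagate(assignments, initial):
    """Propagate abstract values along assignments until a fixpoint.

    assignments: list of (target, [operand, ...]) pairs (hypothetical encoding).
    initial: dict mapping variable names to their starting abstract value.
    """
    state = dict(initial)
    changed = True
    while changed:
        changed = False
        for target, operands in assignments:
            # The join (max) of the operand values over-approximates the target.
            joined = max((state.get(op, UNTAINTED) for op in operands),
                         default=UNTAINTED)
            new_value = max(state.get(target, UNTAINTED), joined)
            if new_value != state.get(target, UNTAINTED):
                state[target] = new_value
                changed = True
    return state
```

Because values only move upward in a finite lattice, the loop is guaranteed to terminate regardless of the order in which the assignments are listed.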

In step 208, it is determined that the receiver statement reads the source value into a receiver variable that includes the identifier of the column. For example, a receiver variable may be a column identifier whose corresponding column has been affected (e.g., contaminated) by the source value written to that column. The source value may be written to a cell of the column identified by the column identifier, where the cell corresponds to a row in the table.

In step 210, a modified set of affected variables affected by the source variable is generated by adding the receiver variable to the set of affected variables. Continuing with the example above, after processing the following receiver statement, if variable v is already in the set of affected variables, the code analyzer may add the column identifier for the username column of the credentials table to the set of affected variables: INSERT INTO credentials (username) VALUES (v). Continuing with this example, if variable v is contaminated, then that column identifier is also treated as contaminated.

In one or more embodiments, the code analyzer may modify the set of affected variables affected by the source variable by removing the receiver variable from the set of affected variables. Continuing with the example above, the delete statement may delete the source value from the column, so that the effect of the column on the set of affected variables may be eliminated.
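By way of a non-limiting illustration, the set updates of step 210 and the removal variant just described may be sketched as follows. The statement encoding (a dictionary with `kind`, `table`, `column`, and `value_var` keys), the `table.column` identifier format, and the function name `update_affected` are assumptions introduced for this example and are not taken from the embodiments.

```python
def update_affected(affected, stmt):
    """Return a new affected-variable set after processing one statement.

    stmt is a hypothetical dict with keys "kind", "table", "column" and,
    for insert/update statements, "value_var".
    """
    column_id = f'{stmt["table"]}.{stmt["column"]}'  # assumed identifier format
    updated = set(affected)
    if stmt["kind"] in ("insert", "update"):
        # The column becomes affected only if the written variable is affected.
        if stmt["value_var"] in affected:
            updated.add(column_id)
    elif stmt["kind"] == "delete":
        # Optional variant: a delete may eliminate the column's effect.
        updated.discard(column_id)
    return updated
```

Returning a new set rather than mutating the input keeps the per-statement states separate, which is convenient when the states before and after a statement are compared.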

In step 212, a defect at the receiver statement is reported. In one or more embodiments, the flaw at the receiver statement is due to the effect of the source variable on the receiver variable. For example, the receiver variable may provide the contaminated value received from the source variable to the security sensitive function. Alternatively, the receiver variable may provide the secret value from the source variable to a function that allows non-privileged access.

The code analyzer may report the flaw based on the effect of the source variable on the receiver variable, regardless of the particular value of the source variable. For example, if the source variable writes a contaminated value to any cell in a column, then the entire column may be considered contaminated.

In one or more embodiments, defects are prevented when a source value received from a source variable is modified before being received by a receiver variable. In one or more embodiments, the code analyzer reports that the defect has been prevented due to the action of the modifier function. For example, when the target analysis is a contamination analysis, the source value may be modified by a sanitizer before the receiver variable receives it. Alternatively, when the target analysis is an escape analysis, the source value may be modified by a declassifier before the receiver variable receives it.

The row-folding information flow analysis described in FIG. 2 is efficient, and therefore scalable to large code bases, for the following reasons: 1) the analysis focuses on particular information flows based on source variables that receive source values from source functions specified by the target analysis; and 2) the analysis over-approximates the impact of the source variables on the set of affected variables (e.g., rather than accurately identifying each particular variable that is directly affected by a source variable), which sacrifices some precision in exchange for greater computational efficiency. In contrast, the dependency analysis described below in FIG. 3C tracks precise dependency information between variables, thereby achieving higher precision, but at the cost of greater computational overhead and reduced scalability.

FIG. 3A shows a flow diagram in accordance with one or more embodiments of the invention. The flow diagram depicts a process for row-folding information flow analysis. One or more of the steps in FIG. 3A may be performed by a component discussed above with reference to FIG. 1A, such as the code analyzer (104) of the computer system (100). In one or more embodiments of the invention, one or more of the steps shown in FIG. 3A may be omitted, repeated, and/or performed in parallel, or in a different order than that shown in FIG. 3A. Accordingly, the scope of the present invention should not be considered limited to the particular arrangement of steps shown in FIG. 3A.

Initially, in step 300, a statement in the code is selected. In the first iteration of step 300, the code analyzer may select the first statement that is executed when the code is invoked. In one or more embodiments, in the first iteration of step 300, for the first statement, a set of one or more affected variables that are affected by the set of one or more source variables is obtained (see description of step 206 above). In subsequent iterations of step 300, the code analyzer may select statements according to the order in which they appear in the code (e.g., based on the memory locations corresponding to the statements).

If it is determined in step 302 that the statement is a receiver statement (e.g., as specified in the target analysis), then the following step 304 is performed. Otherwise, if it is determined in step 302 that the statement is not a receiver statement, then the following step 312 is performed.

In step 304, the set of affected variables is modified using the statement (see description of step 210 above). In one or more embodiments, the code analyzer adds each unmodified receiver variable of the statement that is not already in the set of affected variables to the set of affected variables. The receiver variable may be a variable of a statement that receives the source value. In one or more embodiments, when the source value is modified (e.g., sanitized or declassified) before the receiver variable reads the source value, the receiver variable is not added to the set of affected variables. For example, when the target analysis is a contamination analysis, the contaminated source value may be sanitized. Alternatively, when the target analysis is an escape analysis, the confidential source value may be declassified.

In step 306, defects corresponding to each unmodified receiver variable are reported (see description of step 212 above).

In step 308, each unmodified receiver variable is added to the set of source variables. That is, each unmodified receiver variable may, in turn, be used as a source variable that may affect (e.g., transmit a source value to) a variable in the statement selected in a subsequent iteration of step 300 above. In one or more embodiments, the code analyzer reconfigures the target analysis to specify that the receiver statements may include query language data extraction statements (e.g., SQL select statements) in addition to query language data manipulation statements (e.g., insert or update statements). For example, a receiver variable in a query language data extraction statement may read a source value (e.g., from a column) using one of the variables in the set of source variables.

In step 310, one or more edges corresponding to each unmodified receiver variable are added to the trace graph. In one or more embodiments, each edge connects one of the variables in the set of source variables and an unmodified receiver variable. In one or more embodiments, the code analyzer adds an edge between each variable in the set of source variables and each unmodified receiver variable because the code analyzer tracks the aggregate dependencies of the set of affected variables on the set of source variables. In one or more embodiments, the defect reported in step 306 above corresponds to a path through the trace graph. For example, a path may include a series of edges connecting a series of nodes representing a series of affected variables that are affected (e.g., contaminated) by a source value. The report may include the path corresponding to the defect (e.g., to enable a developer to understand the flow of source values through variables and statements of the code).

If it is determined in step 312 that there are additional statements in the code, then step 300 above is performed again to select another (e.g., next) statement in the code.
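By way of a non-limiting illustration, the loop of steps 300 through 312 may be sketched as follows. The `Statement` class, its fields, and the function name `row_folding_analysis` are hypothetical constructs introduced for this example. The sketch also takes a snapshot of the set of source variables before step 308 so that no variable receives an edge to itself, which is a simplification chosen here rather than a requirement of the embodiments.

```python
from dataclasses import dataclass, field


@dataclass
class Statement:
    """Hypothetical statement model used only for this sketch."""
    text: str
    is_receiver: bool
    reads: list = field(default_factory=list)    # variables or columns read
    writes: list = field(default_factory=list)   # receiver variables written
    sanitized: set = field(default_factory=set)  # receiver variables modified first


def row_folding_analysis(statements, source_vars):
    sources = set(source_vars)    # set of source variables
    affected = set(source_vars)   # set of affected variables
    edges, defects = [], []       # trace graph edges and reported defects
    for stmt in statements:                                # steps 300 and 312
        if not stmt.is_receiver:                           # step 302
            continue
        if not any(var in affected for var in stmt.reads):
            continue  # no source value reaches this statement
        unmodified = [v for v in stmt.writes if v not in stmt.sanitized]
        snapshot = sorted(sources)
        for var in unmodified:
            affected.add(var)                              # step 304
            defects.append((stmt.text, var))               # step 306
            # Aggregate tracking: one edge from every current source variable.
            edges.extend((src, var) for src in snapshot)   # step 310
        sources.update(unmodified)                         # step 308
    return affected, defects, edges
```

Because every source variable is connected to every unmodified receiver variable, the trace graph may contain edges that do not correspond to an actual flow; this is the precision given up in exchange for a single set-based state.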

FIG. 3B shows a flow diagram in accordance with one or more embodiments of the invention. The flow diagram depicts a process for row-preserving information flow analysis. One or more of the steps in FIG. 3B may be performed by a component discussed above with reference to FIG. 1A, e.g., the code analyzer (104) of the computer system (100). In one or more embodiments of the invention, one or more of the steps shown in FIG. 3B may be omitted, repeated, and/or performed in parallel, or in a different order than that shown in FIG. 3B. Accordingly, the scope of the present invention should not be considered limited to the particular arrangement of steps shown in FIG. 3B.

Initially, in step 352, it is determined that the source variable receives a source value from a source function specified by the target analysis (see description of step 202 above).

In step 354, it is determined that a source statement writes the source value to a cell in a column in the table using the source variable (see description of step 204 above).

In step 356, for the receiver statement, the set of affected variables affected by the source variable is obtained (see description of steps 206 and 208 above). The receiver statement may read the source value into a receiver variable that includes an identifier of the cell. For example, the identifier of the cell may include a column identifier and a row identifier.

In step 358, for the receiver statement, an abstract state is obtained that assigns an abstract value to each affected variable (see description of step 206 above).

In step 360, a modified set of affected variables affected by the source variable is generated by adding the receiver variable to the set of affected variables (see the description of steps 210 and 304 above). In one or more embodiments, the cell identifier represents a receiver variable that has been affected by the source value.

In step 362, the abstract state is modified using the receiver statement. In one or more embodiments, the abstract values assigned to the affected variables are based on the abstract values assigned to the set of source variables. The set of source variables may include the source variable of step 352 above. For example, the code analyzer may generate an abstract value for each affected variable (e.g., using a constraint solver) using an aggregate constraint represented by the abstract values assigned to the set of source variables.

In step 364, a defect at the receiver statement is reported (see description of step 212 above).

The row-preserving information flow analysis described in FIG. 3B is efficient for the following reasons. Although the source variables and the affected variables correspond to cells of the table, the cells may be abstract cells that are assigned abstract values, thereby limiting the number of cells and the overall size of the table. For example, each variable of each statement may correspond to an abstract cell. The abstract values assigned to the cells are as precise as the constraint solving and abstract interpretation algorithms used by the code analyzer allow.
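By way of a non-limiting illustration, the cell-level tracking of FIG. 3B may be sketched by modelling a cell identifier as a `(table, column, row_id)` tuple. The function names, the dictionary encodings of the statements, and the use of concrete row identifiers instead of abstract cells are assumptions made to keep the example short.

```python
def insert_row(tainted_cells, table, row_id, values, tainted_vars):
    """Record which cells of a newly inserted row receive a source value.

    values: dict mapping column name -> variable name (hypothetical encoding).
    """
    updated = set(tainted_cells)
    for column, var in values.items():
        if var in tainted_vars:
            updated.add((table, column, row_id))
    return updated


def select_row(tainted_cells, table, row_id, targets):
    """Return the target variables that read a tainted cell of the given row.

    targets: dict mapping column name -> receiving variable name.
    """
    return {var for column, var in targets.items()
            if (table, column, row_id) in tainted_cells}
```

Unlike a column-level state, a read of a different row of the same column yields no affected variables in this sketch, which is the additional precision that cell identifiers provide.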

FIG. 3C shows a flow diagram in accordance with one or more embodiments of the invention. The flow diagram depicts a process for row-preserving dependency analysis. One or more of the steps in FIG. 3C may be performed by a component discussed above with reference to FIG. 1A, e.g., the code analyzer (104) of the computer system (100). In one or more embodiments of the invention, one or more of the steps shown in FIG. 3C may be omitted, repeated, and/or performed in parallel, or in a different order than that shown in FIG. 3C. Accordingly, the scope of the present invention should not be considered limited to the particular arrangement of steps shown in FIG. 3C.

Initially, in step 370, for a statement in code, a set of variable dependencies is obtained, each variable dependency comprising a pair of variables. Each variable dependency may include an independent variable and a dependent variable. In one or more embodiments, the code analyzer tracks, for each variable dependency, the exact, individual dependency of the dependent variable on the corresponding independent variable. In contrast, the information flow analysis described in fig. 2, 3A, and 3B tracks the aggregate dependencies of the affected variable sets on the source variable sets. In one or more embodiments, one of the variables in the variable dependency corresponds to a cell in a column of the table. For example, a statement may write the value of a variable to a cell (e.g., insert, update, or delete data in a cell when the statement is a data manipulation statement). Alternatively, the statement may read the cell's value into a variable (e.g., when the statement is a data extraction statement, data is selected from the cell).

In step 372, an abstract state is obtained for the statement that assigns an abstract value to each variable in each variable dependency (see description of step 206 above).

In step 374, the variable dependency set is modified using the statement (see the description of step 210 and step 304 above). In one or more embodiments, new variable dependencies are added to the variable dependency set. For example, the dependent variable of the variable dependency may be a cell identifier of a cell whose value is written using the value of the independent variable of the variable dependency. Alternatively, variable dependencies may be removed from the variable dependency set (e.g., when a value is deleted from a cell or the cell itself is deleted).

In step 376, the abstract state is modified using the statement. In one or more embodiments, the code analyzer assigns abstract values to the dependent variables in each variable dependency based on the abstract values assigned to the independent variables in the variable dependency. For example, the abstract value assigned to the independent variable may be used as a constraint on the abstract value assigned to the dependent variable.
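By way of a non-limiting illustration, the variable dependency set of steps 370 and 374 may be sketched as a set of `(independent, dependent)` pairs. The statement encoding `(kind, variable, cell)` and the function name `apply_statement` are assumptions introduced for this example, and the abstract state of steps 372 and 376 is omitted for brevity.

```python
def apply_statement(dependencies, stmt):
    """Update a set of (independent, dependent) pairs for one statement.

    stmt is a hypothetical tuple (kind, variable, cell), where cell is a
    (column, row) identifier and variable is None for a delete.
    """
    kind, variable, cell = stmt
    updated = set(dependencies)
    if kind == "write":
        # Insert or update: the cell depends on the written variable.
        updated.add((variable, cell))
    elif kind == "read":
        # Select: the variable depends on the cell ...
        updated.add((cell, variable))
        # ... and on whatever the cell depends on at this point.
        updated.update((ind, variable)
                       for ind, dep in dependencies if dep == cell)
    elif kind == "delete":
        # The stored value is gone: the cell no longer depends on anything.
        updated = {(ind, dep) for ind, dep in updated if dep != cell}
    return updated
```

Recording the pair from the original variable to the reading variable at the time of the read means that a later delete of the cell does not erase a flow that has already happened, while reads after the delete no longer inherit the dependency.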

FIG. 3D shows a flow diagram in accordance with one or more embodiments of the invention. The flow diagram depicts a process for detecting a security vulnerability via persistent storage. One or more of the steps in FIG. 3D may be performed by a component discussed above with reference to FIG. 1A, e.g., the code analyzer (104) of the computer system (100). In one or more embodiments of the invention, one or more of the steps shown in FIG. 3D may be omitted, repeated, and/or performed in parallel, or in a different order than that shown in FIG. 3D. Accordingly, the scope of the present invention should not be considered limited to the particular arrangement of steps shown in FIG. 3D.

Initially, in step 380, a component of code is obtained. For example, a component may be a method, class, or file of code.

If in step 382 it is determined that the size of the component is below the predetermined threshold, then the following step 384 is performed. Otherwise, if in step 382 it is determined that the size of the component is not below the predetermined threshold, then the following step 386 is performed.

In step 384, a row-preserving analysis is performed on the component. For example, the row-preserving analysis may be the row-preserving information flow analysis described above in FIG. 3B. Alternatively, the row-preserving analysis may be the row-preserving dependency analysis described above in FIG. 3C.

In one or more embodiments, the code analyzer aborts the row-preserving analysis of the component if a predetermined amount of time has elapsed during the performance of the row-preserving analysis. For example, the code analyzer may switch to a row-folding analysis of the component after aborting the row-preserving analysis.

In step 386, a row-folding analysis is performed on the component. For example, the row-folding analysis may be the row-folding information flow analysis described above in FIG. 2.

If it is determined in step 388 that additional components are present in the code, then step 380 above is performed again to obtain another component in the code.
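By way of a non-limiting illustration, the selection logic of steps 380 through 388, including the time-based fallback, may be sketched as follows. The callables `row_preserving` and `row_folding`, the `size_of` measure, and the convention that the row-preserving analysis raises `TimeoutError` once its deadline has passed are all assumptions made for this example.

```python
import time


def analyze_components(components, size_threshold, time_budget_s,
                       row_preserving, row_folding, size_of=len):
    """Choose an analysis for each component (steps 380 through 388).

    row_preserving(component, deadline) is assumed to raise TimeoutError when
    the deadline passes; row_folding(component) is assumed to always finish.
    """
    results = {}
    for name, component in components.items():            # steps 380 and 388
        if size_of(component) < size_threshold:            # step 382
            deadline = time.monotonic() + time_budget_s
            try:
                outcome = row_preserving(component, deadline)   # step 384
                results[name] = ("row-preserving", outcome)
                continue
            except TimeoutError:
                pass  # abort and fall back to the cheaper analysis
        results[name] = ("row-folding", row_folding(component))  # step 386
    return results
```

Passing the analyses in as callables keeps the selection policy separate from the analyses themselves, so the threshold and time budget can be tuned without touching either analysis.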

The following examples are for illustrative purposes only and are not intended to limit the scope of the present invention. FIGS. 4A, 4B, and 4C illustrate an implementation example in accordance with one or more embodiments of the invention. FIG. 4A compares the row-folded view and the row-preserving view of the credential table. The row-preserving view (400) (120 in FIG. 1A) of the credential table after insertion shows that some values in the username (402), password (404), and default application (408) columns (122A, 122N in FIG. 1A) are contaminated, while none of the values in the role (406) column are contaminated. If the code analyzer ((104) in FIG. 1A) abstracts the credential table into a single row during the row-folding analysis, the result is (contaminated, contaminated, uncontaminated, contaminated) because each column other than the role (406) column contains at least one contaminated value. That is, the row-folding analysis employs an abstract view of the columns, such that when any cell in a column contains a contaminated value, the entire column is considered contaminated.

The row-preserving view (410) of the credential table after deletion shows the result of deleting the second row in the credential table. The second row contained the only contaminated value in the default application (408) column. However, deleting the second row does not change the row-folded view of the credential table because the row-folding analysis does not track particular cell values. Alternatively, if the code analyzer first applies a row-preserving analysis and then switches to a row-folding analysis, the result is (contaminated, contaminated, uncontaminated, uncontaminated) because the second row, with the contaminated default application (408) value, is deleted before the row-folding analysis is applied. Thus, FIG. 4A illustrates how the row-preserving analysis may produce more precise results than the row-folding analysis. Furthermore, switching from a row-preserving analysis to a row-folding analysis may produce more precise results than using a pure row-folding analysis.
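By way of a non-limiting illustration, the relationship between the two views may be sketched by folding a row-preserving view into a single abstract row. The cell contents of FIG. 4A are not reproduced in this description, so the two rows below are hypothetical data chosen only to be consistent with the text above; the column names and the function name `fold_rows` are likewise assumptions.

```python
COLUMNS = ("username", "password", "role", "default_application")


def fold_rows(rows, columns=COLUMNS):
    """Collapse a row-preserving view into one abstract row.

    rows: list of dicts mapping column name -> True if the cell is contaminated.
    A column is contaminated when any of its cells is contaminated.
    """
    return tuple(any(row[col] for row in rows) for col in columns)


# Hypothetical row-preserving view: True marks a contaminated cell.
after_insertion = [
    {"username": True, "password": True, "role": False,
     "default_application": False},
    {"username": False, "password": False, "role": False,
     "default_application": True},
]
after_deletion = after_insertion[:1]  # the second row has been deleted

folded_before = fold_rows(after_insertion)
folded_after = fold_rows(after_deletion)
```

Folding the view only after the deletion has been applied clears the default application column, whereas a state that was folded before the deletion cannot recover that information.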

FIGS. 4B and 4C illustrate the processing of statements during the row-preserving and row-folding information flow analyses. Turning to FIG. 4B, a row-preserving view (420) of the credential table with sanitization illustrates both contaminated and sanitized values. The first row of the row-preserving view (420) is inserted as a result of the first INSERT statement in the code fragment (450) (110 in FIG. 1A) of FIG. 4C. The first INSERT statement inserts the values of variables v1, v2, v3, and v4, which are assigned the values Bob, default, Manager, and hr-applications, respectively. The variables v1, v3, and v4 are contained in a set of contaminated (e.g., source) variables (460) (152A, 152N in FIG. 1B).

When processing the first INSERT statement, the code analyzer determines that the first set of affected variables (470) is affected by the set of contaminated variables (460). That is, the code analyzer tracks the dependency of the set of affected variables on the set of contaminated variables. In contrast, when the analysis is a dependency analysis, the code analyzer tracks the dependency of a particular dependent variable (e.g., the column identifier "credentials.username") on an independent variable (e.g., the variable v1). In this example, the contamination analysis ((134) in FIGS. 1A and 1B) specifies the relevant source functions and receiver (e.g., security-sensitive) statements. The receiver statements include all SQL data manipulation statements and data extraction statements.

In a row-folding analysis, the first set of receiver variables (470) resulting from processing the first INSERT statement includes column identifiers of the credential table. In contrast, in a row-preserving analysis, the first set of receiver variables (470) includes cell identifiers (e.g., column identifiers plus row identifiers) corresponding to the cells inserted in the credential table. The code analyzer modifies the set of contaminated variables (460) by adding the first set of receiver variables (470) to the set of contaminated variables (460) in order to track a secondary SQL injection resulting from extracting contaminated values from the credential table.

In the SELECT statement of the code fragment (450), the values in the first row of the credential table are read into the variables x1, x2, x3, and x4. In processing the SELECT statement, the code analyzer determines that the second set of affected variables (480) (i.e., the variables x1, x3, and x4 of the SELECT statement) is affected by the values of the modified set of contaminated variables. For example, the modified set of contaminated variables includes the first set of receiver variables (470) (e.g., column identifiers of the credential table). The code analyzer reports a defect (i.e., a contaminated flow) at the SELECT statement that results from the flow from the contaminated variables (460) (i.e., variables v1, v3, and v4) of the first INSERT statement to the second set of receiver variables (480). The defect is a secondary defect that results from inserting a contaminated value into a table and then extracting the contaminated value from the table.

As a result of the second INSERT statement in the code fragment (450), a second row is inserted into the row-preserving view (420) of the credential table with sanitization. The variable x4 is sanitized before the second INSERT statement inserts it into the credential table. Thus, the sanitization of the variable x4 is reflected in the second row. In contrast, the row-folded view (430) of the credential table lacks any information about the sanitized value.

The embodiments disclosed herein may be implemented on a computing system. Any combination of mobile devices, desktop computers, servers, routers, switches, embedded devices, or other types of hardware may be used. For example, as shown in FIG. 5A, a computing system (500) may include one or more computer processors (502), non-persistent storage (504) (e.g., volatile memory, such as random access memory (RAM) or cache memory), persistent storage (506) (e.g., a hard disk, an optical drive such as a compact disc (CD) drive or digital versatile disc (DVD) drive, a flash memory, etc.), a communication interface (512) (e.g., a Bluetooth interface, an infrared interface, a network interface, an optical interface, etc.), and many other elements and functions.

The computer processor(s) (502) may be an integrated circuit for processing instructions. For example, the computer processor(s) may be one or more cores or micro-cores of a processor. The computing system (500) may also include one or more input devices (510), such as a touch screen, keyboard, mouse, microphone, touch pad, electronic pen, or any other type of input device.

The communication interface (512) may include a connection for connecting the computing system (500) to a network (not shown) (e.g., a Local Area Network (LAN), a Wide Area Network (WAN) such as the internet, a mobile network, or any other type of network) and/or another device, such as another computing device.

Additionally, the computing system (500) may include one or more output devices (508), such as a screen (e.g., a liquid crystal display (LCD), a plasma display, a touch screen, a cathode ray tube (CRT) monitor, a projector, or other display device), a printer, external storage, or any other output device. One or more of the output devices may be the same as or different from the input device(s). The input and output device(s) may be connected locally or remotely to the computer processor(s) (502), non-persistent storage (504), and persistent storage (506). Many different types of computing systems exist, and the aforementioned input and output device(s) may take other forms.

Software instructions in the form of computer-readable program code to perform embodiments disclosed herein may be stored in whole or in part, temporarily or permanently, on a non-transitory computer-readable medium, such as a CD, DVD, storage device, floppy disk, tape, flash memory, physical memory, or any other computer-readable storage medium. In particular, the software instructions may correspond to computer-readable program code which, when executed by the processor(s), is configured to perform one or more embodiments disclosed herein.

The computing system (500) in FIG. 5A may be connected to or be a part of a network. For example, as shown in FIG. 5B, the network (520) may include a plurality of nodes (e.g., node X (522), node Y (524)). Each node may correspond to a computing system, such as the computing system shown in FIG. 5A, or a group of nodes combined may correspond to the computing system shown in FIG. 5A. For example, embodiments disclosed herein may be implemented on a node of a distributed system that is connected to other nodes. By way of another example, the embodiments disclosed herein may be implemented on a distributed computing system having multiple nodes, where each portion of the embodiments disclosed herein may be located on a different node within the distributed computing system. Additionally, one or more elements of the aforementioned computing system (500) may be located at a remote location and connected to the other elements over a network.

Although not shown in fig. 5B, the nodes may correspond to blades in a server chassis that is connected to other nodes via a backplane. By way of another example, a node may correspond to a server in a data center. By way of another example, a node may correspond to a computer processor or a micro-core of a computer processor having shared memory and/or resources.

Nodes (e.g., node X (522), node Y (524)) in the network (520) may be configured to provide services for the client device (526). For example, the nodes may be part of a cloud computing system. The nodes may include functionality to receive requests from the client device (526) and transmit responses to the client device (526). The client device (526) may be a computing system, such as the computing system shown in FIG. 5A. Additionally, the client device (526) may include and/or perform all or a portion of one or more embodiments disclosed herein.

The computing system or group of computing systems described in fig. 5A and 5B may include functionality to perform various operations disclosed herein. For example, computing system(s) may perform communication between processes on the same or different systems. Various mechanisms employing some form of active or passive communication may facilitate data exchange between processes on the same device. Examples of communication between processes include, but are not limited to, the implementation of files, signals, sockets, message queues, pipes, semaphores, shared memory, message passing, and memory mapped files. Further details regarding several of these non-limiting examples are provided below.

Based on the client-server networking model, sockets can serve as interfaces or communication channel endpoints, enabling bidirectional data transfer between processes on the same device. First, following the client-server networking model, a server process (e.g., a process that provides data) may create a first socket object. Next, the server process binds the first socket object, thereby associating the first socket object with a unique name and/or address. After creating and binding the first socket object, the server process then waits and listens for incoming connection requests from one or more client processes (e.g., processes that seek data). At this point, when a client process wishes to obtain data from the server process, the client process starts by creating a second socket object. The client process then proceeds to generate a connection request that includes at least the second socket object and the unique name and/or address associated with the first socket object. The client process then transmits the connection request to the server process. Depending on availability, the server process may accept the connection request, establishing a communication channel with the client process, or the server process, busy handling other operations, may queue the connection request in a buffer until the server process is ready. An established connection informs the client process that communication may begin. In response, the client process may generate a data request specifying the data that the client process wishes to obtain. The data request is then transmitted to the server process. Upon receiving the data request, the server process analyzes the request and gathers the requested data. Finally, the server process generates a reply that includes at least the requested data and transmits the reply to the client process. Most commonly, the data is transferred as datagrams or as a stream of characters (e.g., bytes).
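By way of a non-limiting illustration, the socket exchange described above may be sketched with the Python standard library. The function names, the loopback address, and the one-request-per-connection protocol are assumptions made for this example; for brevity the server runs in a thread of the same process rather than in a separate process.

```python
import socket
import threading


def serve_once(server_sock, data_store):
    """Accept one connection, read a key, and reply with the stored value."""
    conn, _addr = server_sock.accept()
    with conn:
        key = conn.recv(1024).decode("utf-8")
        conn.sendall(data_store.get(key, "").encode("utf-8"))


def request(address, key):
    """Client side: create a socket, connect, send a data request, read the reply."""
    with socket.create_connection(address, timeout=5) as client:
        client.sendall(key.encode("utf-8"))
        return client.recv(1024).decode("utf-8")


def demo():
    # Server side: create the first socket, bind it to an address, then listen.
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as server:
        server.bind(("127.0.0.1", 0))  # port 0 lets the OS pick a free port
        server.listen()
        worker = threading.Thread(target=serve_once,
                                  args=(server, {"greeting": "hello"}))
        worker.start()
        reply = request(server.getsockname(), "greeting")
        worker.join()
        return reply
```

A single `recv` call is sufficient here only because the messages are a few bytes long; a production protocol would frame its messages and read in a loop.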

Shared memory refers to the allocation of virtual memory space in order to provide a mechanism by which data may be communicated and/or accessed by multiple processes. In implementing shared memory, an initializing process first creates a shareable segment in persistent or non-persistent storage. After creation, the initializing process mounts the shareable segment and then maps the shareable segment into the address space associated with the initializing process. After mounting, the initializing process proceeds to identify and grant access permission to one or more authorized processes, which may also write data to and read data from the shareable segment. Changes made by one process to data in the shareable segment may immediately affect other processes that are also linked to the shareable segment. In addition, when one of the authorized processes accesses the shareable segment, the shareable segment maps to the address space of that authorized process. Often, only one authorized process, other than the initializing process, may mount the shareable segment at any given time.
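As a non-limiting illustration, the shared memory mechanism described above may be sketched with Python's standard `multiprocessing.shared_memory` module. The sketch assumes that a second handle opened by name within the same process stands in for an authorized process, and it does not model access permissions. The function name `share_message` is hypothetical.

```python
from multiprocessing import shared_memory


def share_message(message: bytes) -> bytes:
    """Write through one mapping of a segment and read through another."""
    # Initializing side: create a named segment and map it into this address space.
    segment = shared_memory.SharedMemory(create=True, size=len(message))
    try:
        # A second handle attaches to the same segment by name, standing in
        # for an authorized process that has been given the segment's name.
        peer = shared_memory.SharedMemory(name=segment.name)
        try:
            peer.buf[:len(message)] = message           # write via the peer mapping
            seen = bytes(segment.buf[:len(message)])    # visible via the creator's mapping
        finally:
            peer.close()                                # unmap the peer's view
    finally:
        segment.close()                                 # unmap the creator's view
        segment.unlink()                                # release the segment itself
    return seen
```

Because both handles map the same underlying segment, data written through one handle is visible through the other without any copy being sent between them.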

Other techniques may be used to share data (such as the various data described in this application) between processes without departing from the scope of the invention. These processes may be part of the same or different applications and may execute on the same or different computing systems.

The computing system in fig. 5A may implement and/or be connected to a data repository. For example, one type of data repository is a database. A database is a collection of information configured for ease of data retrieval, modification, reorganization, and deletion. A database management system (DBMS) is a software application that provides an interface for users to define, create, query, update, or administer databases.

A user or software application may submit a statement or query to the DBMS. The DBMS then interprets the statement. The statement may be a select statement to request information, an update statement, a create statement, a delete statement, etc. Moreover, the statement may include parameters that specify data or a data container (database, table, record, column, view, etc.), identifier(s), conditions (comparison operators), functions (e.g., join, full join, count, average, etc.), sort order (e.g., ascending, descending), or others. The DBMS may execute the statement. For example, the DBMS may access a memory buffer, or may reference or index a file, for reading, writing, deletion, or any combination thereof, in order to respond to the statement. The DBMS may load data from persistent or non-persistent storage and perform computations in response to a query. The DBMS may return the result(s) to the user or software application.
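As a non-limiting illustration, the statement types listed above may be exercised against an in-memory SQLite database using Python's standard `sqlite3` module. The table name `comments`, its columns, and the function name `run_statements` are hypothetical and serve only as an example.

```python
import sqlite3


def run_statements():
    """Submit create, insert, update, delete, and select statements to a DBMS."""
    conn = sqlite3.connect(":memory:")   # non-persistent storage
    try:
        # Create statement: defines a table (a data container) and its columns.
        conn.execute(
            "CREATE TABLE comments (id INTEGER PRIMARY KEY, author TEXT, body TEXT)"
        )
        # Insert statements with parameters that specify the data to write.
        conn.executemany(
            "INSERT INTO comments (author, body) VALUES (?, ?)",
            [("alice", "first"), ("bob", "second"), ("alice", "third")],
        )
        # Update statement with a condition (comparison operator).
        conn.execute("UPDATE comments SET body = ? WHERE author = ?", ("edited", "bob"))
        # Delete statement.
        conn.execute("DELETE FROM comments WHERE body = ?", ("third",))
        # Select statement with a sort (descending by author).
        rows = conn.execute(
            "SELECT author, body FROM comments ORDER BY author DESC"
        ).fetchall()
        # Select statement using a function (count).
        count = conn.execute("SELECT COUNT(*) FROM comments").fetchone()[0]
    finally:
        conn.close()
    return rows, count
```

The insert statements write values into the `body` column and the select statement later reads them back. A write followed by a read of the same column is the kind of flow through persistent storage that the analysis described in this disclosure models.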

The above description of functions presents only a few examples of functions performed by the computing system of fig. 5A and the nodes and/or client devices in fig. 5B. Other functions may be performed using one or more of the embodiments disclosed herein.

While the invention has been described with respect to a limited number of embodiments, those skilled in the art, having benefit of this disclosure, will appreciate that other embodiments can be devised which do not depart from the scope of the invention as disclosed herein. Accordingly, the scope of the invention should be limited only by the attached claims.
