Bandwidth distribution determination and program optimization methods, devices and equipment

Document No.: 1921444 Publication date: 2021-12-03

Reading note: This technology, "Bandwidth distribution determination and program optimization methods, devices and equipment", was designed by 崔莫磊 on 2020-05-29. Main content: The invention provides a bandwidth distribution determination method, a program optimization method, and corresponding apparatuses and devices. The bandwidth distribution determination method comprises: for each target function to be called in a designated program, determining from a function tree a target node corresponding to the target function, determining a first memory access amount and a second memory access amount, determining a third memory access amount from the first and second memory access amounts, and recording the third memory access amount to the target node; the first memory access amount is the amount of memory already accessed when the call to the target function starts, the second memory access amount is the amount of memory already accessed when the call to the target function completes, and the third memory access amount is the memory access amount required for calling the target function; and, when all the target functions have been called, determining bandwidth distribution information according to the third memory access amounts recorded by the nodes in the function tree. The invention can more accurately determine the bandwidth distribution information required for calling each target function in the designated program.

1. A method for determining bandwidth distribution, the method comprising:

for each target function to be called in a designated program, determining, from a generated function tree, a target node corresponding to the target function, determining a first memory access amount and a second memory access amount, determining a third memory access amount corresponding to the target function according to the first memory access amount and the second memory access amount, and recording the third memory access amount to the target node; wherein the first memory access amount is the amount of memory already accessed when the call to the target function starts, the second memory access amount is the amount of memory already accessed when the call to the target function completes, and the third memory access amount is the memory access amount required for calling the target function;

and, when all the target functions of the designated program have been called, determining the bandwidth distribution information required for calling each target function in the designated program according to the third memory access amounts recorded by the nodes in the function tree.

2. The method of claim 1, wherein determining the third memory access amount corresponding to the target function according to the first memory access amount and the second memory access amount comprises:

calculating the difference value of the second memory access amount and the first memory access amount;

and determining the third memory access amount according to the difference value.

3. The method according to claim 1, wherein determining the bandwidth distribution information required for calling each target function in the designated program according to the third memory access amounts recorded by the nodes in the function tree comprises:

for each node in the function tree, calculating the ratio of the third memory access amount recorded by the node to the third memory access amount recorded by the root node, and determining, according to the ratio, the bandwidth ratio information required for calling the target function corresponding to the node;

and determining the bandwidth distribution information based on the determined bandwidth ratio information required for calling each target function.

4. The bandwidth distribution determination method of claim 1, wherein the function tree is generated by:

determining a root node of the function tree, wherein the root node corresponds to a designated target function in the designated program;

and generating child nodes of the function tree according to the sub-functions of the target function corresponding to the root node in the designated program.

5. The method of determining bandwidth distribution according to claim 4, wherein generating child nodes of the function tree according to the sub-functions of the target function corresponding to the root node in the designated program comprises:

taking the root node as a current search node, checking whether a sub-function of the target function corresponding to the current search node exists in the designated program, and if so, executing the following steps for each sub-function:

if the current search node has no parent node, generating in the function tree a child node of the current search node, the child node corresponding to the sub-function; and, when the number of layers of the current access node of the function tree has not reached a preset number of layers, taking the child node as the current search node and returning to the step of checking whether a sub-function of the target function corresponding to the current search node exists in the designated program;

if the current search node has a parent node, and the sub-function differs from the target function corresponding to every node on a designated path, the designated path being the path from the current search node to the root node in the function tree, generating in the function tree a child node of the current search node, the child node corresponding to the sub-function; and, when the number of layers of the current access node of the function tree has not reached the preset number of layers, taking the child node as the current search node and returning to the step of checking whether a sub-function of the target function corresponding to the current search node exists in the designated program.

6. The bandwidth distribution determination method according to claim 1 or 5, wherein after generating the function tree, the method further comprises:

allocating node numbers to the nodes in the function tree according to a set numbering principle; the set numbering principle is as follows: the node numbers of the nodes on the same layer are consecutive, and the node number of the designated node on each layer is consecutive with the node number of its parent node; the designated node on each layer is the node on that layer with the smallest node number.

7. The bandwidth distribution determination method of claim 6,

the function tree is stored as an array, each element of the array corresponds to one node in the function tree, and any two nodes with consecutive node numbers occupy adjacent elements of the array.

8. A method for program optimization, the method comprising:

determining a function to be optimized from a specified program according to bandwidth distribution information required by calling each target function in the specified program;

determining, according to the syntactic structure of the function to be optimized, a target optimization mode for optimizing the function to be optimized, wherein the target optimization mode is used for optimizing the memory access behavior when the function to be optimized is called;

and optimizing the function to be optimized by adopting the target optimization mode.

9. The program optimization method of claim 8, further comprising, beforehand: establishing a correspondence between syntactic structures and optimization modes;

wherein determining the target optimization mode for optimizing the function to be optimized according to the syntactic structure of the function to be optimized comprises:

finding, in the established correspondence between syntactic structures and optimization modes, the optimization mode corresponding to the syntactic structure of the function to be optimized, and determining the found optimization mode as the target optimization mode.

10. The program optimization method of claim 9, wherein establishing the correspondence between syntactic structures and optimization modes comprises:

determining a target sample function from each obtained sample function group; the sample function group comprises at least a first sample function and a plurality of second sample functions, the second sample functions being obtained by optimizing the first sample function with different optimization modes; the target sample function is the second sample function, among the second sample functions in the sample function group, that meets a set requirement, the set requirement being that the bandwidth information required for calling is the smallest; the bandwidth information refers to the memory access amount per unit time;

and, for each sample function group, establishing the correspondence according to the syntactic structure of each sample function in the sample function group and the optimization mode adopted by the target sample function in the sample function group.

11. The program optimization method of claim 8, wherein the bandwidth distribution information comprises: the bandwidth information required for calling each target function in the designated program, wherein the bandwidth information refers to the memory access amount per unit time;

determining the function to be optimized from the designated program according to the bandwidth distribution information required for calling each target function in the designated program comprises:

acquiring the currently available bandwidth of a target platform for running the designated program;

and, for each target function, comparing the bandwidth information corresponding to the target function in the bandwidth distribution information with the currently available bandwidth, and determining the target function as a function to be optimized when that bandwidth information is larger than the currently available bandwidth.

12. A bandwidth distribution determining apparatus, characterized in that the apparatus comprises:

a memory access amount determining module, configured to, for each target function to be called in a designated program, determine, from a generated function tree, a target node corresponding to the target function, determine a first memory access amount and a second memory access amount, determine a third memory access amount corresponding to the target function according to the first memory access amount and the second memory access amount, and record the third memory access amount to the target node; wherein the first memory access amount is the amount of memory already accessed when the call to the target function starts, the second memory access amount is the amount of memory already accessed when the call to the target function completes, and the third memory access amount is the memory access amount required for calling the target function;

and a bandwidth distribution information determining module, configured to, when all the target functions of the designated program have been called, determine the bandwidth distribution information required for calling each target function in the designated program according to the third memory access amounts recorded by the nodes in the function tree.

13. A program optimization device, comprising:

a to-be-optimized function determining module, configured to determine a function to be optimized from a designated program according to the bandwidth distribution information required for calling each target function in the designated program;

an optimization mode determining module, configured to determine, according to the syntactic structure of the function to be optimized, a target optimization mode for optimizing the function to be optimized, wherein the target optimization mode is used for optimizing the memory access behavior when the function to be optimized is called;

and the function optimization module is used for optimizing the function to be optimized by adopting the target optimization mode.

14. An electronic device comprising a processor and a memory; the memory stores a program that can be called by the processor; wherein the processor, when executing the program, implements the bandwidth distribution determination method of any one of claims 1 to 7 or the program optimization method of any one of claims 8 to 11.

Technical Field

The invention relates to the technical field of computers, in particular to a method, a device and equipment for determining bandwidth distribution and optimizing programs.

Background

In computer systems, especially embedded computer systems, the bus bandwidth is usually very limited, and some programs, such as those implementing bandwidth-intensive algorithms like deep learning and image encoding and decoding, are often constrained by insufficient bus bandwidth. Bus bandwidth usage is closely tied to memory access density: if a program accesses memory heavily while running, it occupies an excessive share of the bus bandwidth, and if the system's bus bandwidth is insufficient, problems such as slow program execution can result. It is therefore necessary to optimize the program, and before doing so, to determine the bandwidth distribution information required for calling each function in the program.

In a typical approach, the bandwidth distribution information is determined according to the number of memory accesses at the program logic level. However, a processor (e.g., a central processing unit, CPU) in a computer system is much faster than the memory, so if the processor had to access the memory for every data read, memory access would slow the processor down. Current computer systems therefore introduce a Cache (high-speed buffer memory), which caches part of the data in the memory. This reduces the number of times the processor directly reads and writes the memory while a program runs, and can greatly reduce the program's occupation of the system bandwidth.

In other words, the presence of the Cache can make the number of memory accesses a program actually generates at run time smaller than the number of memory accesses at the logic level. Caches in different computers often differ in structure, size, and/or caching policy, all of which directly affect the Cache miss rate; that is, the Cache affects a program's number of memory accesses differently on different computers. Bandwidth distribution information determined from the number of memory accesses at the program logic level is therefore inaccurate.

For example, when a processor calls a certain function of a program, the function may, at the logic level, repeatedly read and write a small region of the memory many times, so the memory access amount at the logic level is large; because of the Cache, however, the memory access amount actually generated may be small.

Disclosure of Invention

In view of this, the present invention provides a method, an apparatus, and a device for determining bandwidth distribution and optimizing a program, which can more accurately determine bandwidth distribution information required for invoking each target function in a designated program.

A first aspect of the present invention provides a bandwidth distribution determining method, including:

for each target function to be called in a designated program, determining, from a generated function tree, a target node corresponding to the target function, determining a first memory access amount and a second memory access amount, determining a third memory access amount corresponding to the target function according to the first memory access amount and the second memory access amount, and recording the third memory access amount to the target node; wherein the first memory access amount is the amount of memory already accessed when the call to the target function starts, the second memory access amount is the amount of memory already accessed when the call to the target function completes, and the third memory access amount is the memory access amount required for calling the target function;

and, when all the target functions of the designated program have been called, determining the bandwidth distribution information required for calling each target function in the designated program according to the third memory access amounts recorded by the nodes in the function tree.

According to an embodiment of the present invention, determining the third memory access amount corresponding to the target function according to the first memory access amount and the second memory access amount includes:

calculating the difference value of the second memory access amount and the first memory access amount;

and determining the third memory access amount according to the difference value.
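The measurement step can be sketched as follows. This is a minimal illustration, not the patented implementation: `AccessCounter` is a hypothetical stand-in for whatever platform counter reports the amount of memory actually accessed, and `FunctionNode` and `call_and_record` are illustrative names. Adding the difference to the node (rather than overwriting it) is an assumption made so that repeated calls accumulate.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class FunctionNode:
    """One node of the function tree (hypothetical layout)."""
    name: str
    memory_access: int = 0  # recorded third memory access amount
    children: List["FunctionNode"] = field(default_factory=list)


class AccessCounter:
    """Stand-in for a platform counter of actual memory accesses (assumption)."""

    def __init__(self) -> None:
        self.total = 0

    def touch(self, amount: int) -> None:
        self.total += amount

    def read(self) -> int:
        return self.total


def call_and_record(node: FunctionNode, counter: AccessCounter,
                    func: Callable[[], None]) -> None:
    first = counter.read()    # amount already accessed when the call starts
    func()
    second = counter.read()   # amount already accessed when the call completes
    node.memory_access += second - first  # third amount = second - first


counter = AccessCounter()
leaf = FunctionNode("decode_block")
root = FunctionNode("main", children=[leaf])


def decode_block() -> None:
    counter.touch(300)        # simulated memory traffic of the sub-function


def main() -> None:
    counter.touch(100)        # simulated memory traffic of main itself
    call_and_record(leaf, counter, decode_block)


call_and_record(root, counter, main)
print(root.memory_access, leaf.memory_access)
```

Because the two snapshots bracket the whole call, the amount recorded for a node includes the accesses made by the sub-functions it calls.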

According to an embodiment of the present invention, determining the bandwidth distribution information required for calling each target function in the designated program according to the third memory access amounts recorded by the nodes in the function tree includes:

for each node in the function tree, calculating the ratio of the third memory access amount recorded by the node to the third memory access amount recorded by the root node, and determining, according to the ratio, the bandwidth ratio information required for calling the target function corresponding to the node;

and determining the bandwidth distribution information based on the determined bandwidth ratio information required for calling each target function.
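As a rough illustration of the ratio step, the sketch below divides each node's recorded amount by the root node's recorded amount. The function name `bandwidth_shares`, the dictionary layout, and the sample figures are assumptions made for this example only.

```python
from typing import Dict


def bandwidth_shares(recorded: Dict[str, int], root: str) -> Dict[str, float]:
    """Ratio of each node's recorded access amount to the root node's amount."""
    root_amount = recorded[root]
    if root_amount == 0:
        # Nothing was measured; report a zero share for every function.
        return {name: 0.0 for name in recorded}
    return {name: amount / root_amount for name, amount in recorded.items()}


shares = bandwidth_shares(
    {"main": 400, "decode_block": 300, "log": 20}, root="main"
)
print(shares)
```

With these sample figures, `decode_block` accounts for three quarters of the memory traffic measured at the root, which makes it the obvious first candidate for inspection.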

According to one embodiment of the invention, the function tree is generated by:

determining a root node of the function tree, wherein the root node corresponds to a designated target function in the designated program;

and generating child nodes of the function tree according to the sub-functions of the target function corresponding to the root node in the designated program.

According to an embodiment of the present invention, generating child nodes of the function tree according to the sub-functions of the target function corresponding to the root node in the designated program includes:

taking the root node as a current search node, checking whether a sub-function of the target function corresponding to the current search node exists in the designated program, and if so, executing the following steps for each sub-function:

if the current search node has no parent node, generating in the function tree a child node of the current search node, the child node corresponding to the sub-function; and, when the number of layers of the current access node of the function tree has not reached a preset number of layers, taking the child node as the current search node and returning to the step of checking whether a sub-function of the target function corresponding to the current search node exists in the designated program;

if the current search node has a parent node, and the sub-function differs from the target function corresponding to every node on a designated path, the designated path being the path from the current search node to the root node in the function tree, generating in the function tree a child node of the current search node, the child node corresponding to the sub-function; and, when the number of layers of the current access node of the function tree has not reached the preset number of layers, taking the child node as the current search node and returning to the step of checking whether a sub-function of the target function corresponding to the current search node exists in the designated program.
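The tree generation steps above can be sketched as follows, under several assumptions: the program's call relationships are available as a dictionary `call_graph` (hypothetical), the root counts as layer 1, and a child is expanded further only while its layer is below `max_layers`. The names `Node`, `path_names`, and `build_function_tree` are illustrative.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set


@dataclass
class Node:
    name: str
    parent: Optional["Node"] = field(default=None, repr=False, compare=False)
    depth: int = 1  # the root is layer 1 in this sketch (assumption)
    children: List["Node"] = field(default_factory=list)


def path_names(node: Optional[Node]) -> Set[str]:
    """Names of the functions on the path from `node` up to the root."""
    names = set()
    while node is not None:
        names.add(node.name)
        node = node.parent
    return names


def build_function_tree(call_graph: Dict[str, List[str]], root_name: str,
                        max_layers: int) -> Node:
    root = Node(root_name)
    pending = [root]  # nodes still to be used as the current search node
    while pending:
        current = pending.pop()
        for callee in call_graph.get(current.name, []):
            # Below the root, skip a sub-function that already appears on the
            # path to the root; this keeps recursive calls from looping.
            if current.parent is not None and callee in path_names(current):
                continue
            child = Node(callee, parent=current, depth=current.depth + 1)
            current.children.append(child)
            if child.depth < max_layers:
                pending.append(child)
    return root


call_graph = {
    "main": ["decode", "render"],
    "decode": ["read_bits", "decode"],  # decode calls itself
    "render": ["blend"],
    "blend": ["render"],                # mutual recursion
}
tree = build_function_tree(call_graph, "main", max_layers=3)
print([child.name for child in tree.children])
```

In this example the self-call in `decode` is skipped by the path check, and `blend` is not expanded because it already sits on the last permitted layer.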

According to an embodiment of the invention, after generating the function tree, the method further comprises:

allocating node numbers to the nodes in the function tree according to a set numbering principle; the set numbering principle is as follows: the node numbers of the nodes on the same layer are consecutive, and the node number of the designated node on each layer is consecutive with the node number of its parent node; the designated node on each layer is the node on that layer with the smallest node number.

In accordance with one embodiment of the present invention,

the function tree is stored as an array, each element of the array corresponds to one node in the function tree, and any two nodes with consecutive node numbers occupy adjacent elements of the array.
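One possible reading of the numbering and array layout is a breadth-first numbering, sketched below. This is an assumption rather than the definitive scheme: breadth-first order gives consecutive numbers to nodes on the same layer, but the first node of a layer directly follows its own parent's number only when that parent is the last node of the previous layer. The tuple-based tree representation and the name `number_nodes` are illustrative.

```python
from collections import deque
from typing import List, Optional, Tuple

# A tree is written as (function name, list of child trees) in this sketch.
Tree = Tuple[str, list]


def number_nodes(tree: Tree) -> List[Tuple[str, Optional[int]]]:
    """Flatten the tree into an array; the index is the node number.

    Each element stores (function name, node number of the parent).
    """
    array: List[Tuple[str, Optional[int]]] = []
    queue = deque([(tree, None)])
    while queue:
        (name, children), parent_no = queue.popleft()
        node_no = len(array)  # next free array slot becomes the node number
        array.append((name, parent_no))
        for child in children:
            queue.append((child, node_no))
    return array


example = ("main", [("decode", [("read_bits", [])]),
                    ("render", [("blend", [])])])
nodes = number_nodes(example)
print(nodes)
```

Storing the nodes in one array means a node can be located directly by its number, and nodes with consecutive numbers sit next to each other in memory.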

A second aspect of the present invention provides a program optimization method, including:

determining a function to be optimized from a specified program according to bandwidth distribution information required by calling each target function in the specified program;

determining, according to the syntactic structure of the function to be optimized, a target optimization mode for optimizing the function to be optimized, wherein the target optimization mode is used for optimizing the memory access behavior when the function to be optimized is called;

and optimizing the function to be optimized by adopting the target optimization mode.

According to an embodiment of the invention, the method further comprises, beforehand: establishing a correspondence between syntactic structures and optimization modes;

determining the target optimization mode for optimizing the function to be optimized according to the syntactic structure of the function to be optimized includes:

finding, in the established correspondence between syntactic structures and optimization modes, the optimization mode corresponding to the syntactic structure of the function to be optimized, and determining the found optimization mode as the target optimization mode.

According to an embodiment of the present invention, establishing the correspondence between syntactic structures and optimization modes includes:

determining a target sample function from each obtained sample function group; the sample function group comprises at least a first sample function and a plurality of second sample functions, the second sample functions being obtained by optimizing the first sample function with different optimization modes; the target sample function is the second sample function, among the second sample functions in the sample function group, that meets a set requirement, the set requirement being that the bandwidth information required for calling is the smallest; the bandwidth information refers to the memory access amount per unit time;

and, for each sample function group, establishing the correspondence according to the syntactic structure of each sample function in the sample function group and the optimization mode adopted by the target sample function in the sample function group.
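Building and using the correspondence might look like the sketch below. It simplifies each sample function group to a single syntactic structure label, which is an assumption; the labels (`"nested_loop"`, `"struct_copy"`), the optimization mode names, and the bandwidth figures are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Variant:
    optimization: str  # optimization mode that produced this second sample function
    bandwidth: float   # measured memory access amount per unit time


@dataclass
class SampleGroup:
    structure: str           # syntactic structure label of the group (assumption)
    variants: List[Variant]  # the second sample functions of the group


def build_correspondence(groups: List[SampleGroup]) -> Dict[str, str]:
    """Map each syntactic structure to the mode of its lowest-bandwidth variant."""
    table: Dict[str, str] = {}
    for group in groups:
        target = min(group.variants, key=lambda v: v.bandwidth)
        table[group.structure] = target.optimization
    return table


groups = [
    SampleGroup("nested_loop", [Variant("loop_tiling", 420.0),
                                Variant("loop_interchange", 310.0)]),
    SampleGroup("struct_copy", [Variant("pass_by_reference", 90.0),
                                Variant("field_reordering", 150.0)]),
]
table = build_correspondence(groups)
# Later lookup for a function to be optimized with a known structure label.
target_mode = table.get("nested_loop")
print(table, target_mode)
```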

According to an embodiment of the present invention, the bandwidth distribution information includes: the bandwidth information required for calling each target function in the designated program, wherein the bandwidth information refers to the memory access amount per unit time;

determining the function to be optimized from the designated program according to the bandwidth distribution information required for calling each target function in the designated program includes:

acquiring the currently available bandwidth of a target platform for running the designated program;

and, for each target function, comparing the bandwidth information corresponding to the target function in the bandwidth distribution information with the currently available bandwidth, and determining the target function as a function to be optimized when that bandwidth information is larger than the currently available bandwidth.
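The selection rule reduces to a threshold comparison, sketched below. The function name `functions_to_optimize`, the dictionary layout, and the sample bandwidth figures (in arbitrary units per unit time) are assumptions for illustration.

```python
from typing import Dict, List


def functions_to_optimize(bandwidth_by_function: Dict[str, float],
                          available_bandwidth: float) -> List[str]:
    """Return the target functions whose required bandwidth exceeds what is available."""
    return [
        name
        for name, bandwidth in bandwidth_by_function.items()
        if bandwidth > available_bandwidth
    ]


candidates = functions_to_optimize(
    {"decode": 1200.0, "render": 800.0, "log": 5.0},
    available_bandwidth=1000.0,
)
print(candidates)
```

Only `decode` exceeds the available bandwidth in this example, so it is the only function selected for optimization.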

A third aspect of the present invention provides a bandwidth distribution determination apparatus, including:

a memory access amount determining module, configured to, for each target function to be called in a designated program, determine, from a generated function tree, a target node corresponding to the target function, determine a first memory access amount and a second memory access amount, determine a third memory access amount corresponding to the target function according to the first memory access amount and the second memory access amount, and record the third memory access amount to the target node; wherein the first memory access amount is the amount of memory already accessed when the call to the target function starts, the second memory access amount is the amount of memory already accessed when the call to the target function completes, and the third memory access amount is the memory access amount required for calling the target function;

and a bandwidth distribution information determining module, configured to, when all the target functions of the designated program have been called, determine the bandwidth distribution information required for calling each target function in the designated program according to the third memory access amounts recorded by the nodes in the function tree.

According to an embodiment of the present invention, when determining the third memory access amount corresponding to the target function according to the first memory access amount and the second memory access amount, the memory access amount determining module is specifically configured to perform:

calculating the difference value of the second memory access amount and the first memory access amount;

and determining the third memory access amount according to the difference value.

According to an embodiment of the present invention, when the bandwidth distribution information determining module determines bandwidth distribution information required for calling each target function in the designated program according to the third memory access amount recorded by each node in the function tree, the bandwidth distribution information determining module is specifically configured to:

for each node in the function tree, calculating the ratio of the third memory access amount recorded by the node to the third memory access amount recorded by the root node, and determining, according to the ratio, the bandwidth ratio information required for calling the target function corresponding to the node;

and determining the bandwidth distribution information based on the determined bandwidth ratio information required for calling each target function.

According to one embodiment of the invention, the function tree is generated by:

a root node determining module, configured to determine a root node of the function tree, where the root node corresponds to a target function specified in the specified program;

and a child node generating module, configured to generate child nodes of the function tree according to the sub-functions of the target function corresponding to the root node in the designated program.

According to an embodiment of the present invention, when generating the child nodes of the function tree according to the sub-functions of the target function corresponding to the root node in the designated program, the child node generating module is specifically configured to perform:

taking the root node as a current search node, checking whether a sub-function of the target function corresponding to the current search node exists in the designated program, and if so, executing the following steps for each sub-function:

if the current search node has no parent node, generating in the function tree a child node of the current search node, the child node corresponding to the sub-function; and, when the number of layers of the current access node of the function tree has not reached a preset number of layers, taking the child node as the current search node and returning to the step of checking whether a sub-function of the target function corresponding to the current search node exists in the designated program;

if the current search node has a parent node, and the sub-function differs from the target function corresponding to every node on a designated path, the designated path being the path from the current search node to the root node in the function tree, generating in the function tree a child node of the current search node, the child node corresponding to the sub-function; and, when the number of layers of the current access node of the function tree has not reached the preset number of layers, taking the child node as the current search node and returning to the step of checking whether a sub-function of the target function corresponding to the current search node exists in the designated program.

According to an embodiment of the invention, the apparatus further comprises:

a numbering module, configured to allocate node numbers to the nodes in the function tree according to a set numbering principle; the set numbering principle is as follows: the node numbers of the nodes on the same layer are consecutive, and the node number of the designated node on each layer is consecutive with the node number of its parent node; the designated node on each layer is the node on that layer with the smallest node number.

In accordance with one embodiment of the present invention,

the function tree is stored as an array, each element of the array corresponds to one node in the function tree, and any two nodes with consecutive node numbers occupy adjacent elements of the array.

A fourth aspect of the present invention provides a program optimization apparatus, including:

a to-be-optimized function determining module, configured to determine a function to be optimized from a designated program according to the bandwidth distribution information required for calling each target function in the designated program;

an optimization mode determining module, configured to determine, according to the syntactic structure of the function to be optimized, a target optimization mode for optimizing the function to be optimized, wherein the target optimization mode is used for optimizing the memory access behavior when the function to be optimized is called;

and the function optimization module is used for optimizing the function to be optimized by adopting the target optimization mode.

According to an embodiment of the invention, the apparatus further comprises: a correspondence establishing module, configured to establish a correspondence between syntactic structures and optimization modes;

when determining the target optimization mode for optimizing the function to be optimized according to the syntactic structure of the function to be optimized, the optimization mode determining module is specifically configured to perform:

finding, in the established correspondence between syntactic structures and optimization modes, the optimization mode corresponding to the syntactic structure of the function to be optimized, and determining the found optimization mode as the target optimization mode.

According to an embodiment of the present invention, when establishing the correspondence between syntactic structures and optimization modes, the correspondence establishing module is specifically configured to:

determine a target sample function from each obtained sample function group; each sample function group comprises at least a first sample function and a plurality of second sample functions, the second sample functions being obtained by optimizing the first sample function in different optimization modes; the target sample function is the second sample function, among those included in the sample function group, that meets a set requirement, the set requirement being that the bandwidth information required for calling it is minimal; the bandwidth information refers to the memory access amount per unit time;

and, for each sample function group, establish the correspondence according to the syntactic structure of each sample function in the sample function group and the optimization mode adopted by the target sample function in the sample function group.
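As a minimal sketch of how such a correspondence might be built, the following Python example picks, in each sample function group, the optimized sample with the smallest measured bandwidth and maps the group's syntactic structure to the optimization mode of that sample. The data layout, the structure labels, the mode names and the bandwidth figures are all hypothetical, and using one structure label per group is a simplifying assumption.

```python
def build_correspondence(sample_groups):
    """Map each group's syntactic structure to the mode of its best optimized sample."""
    correspondence = {}
    for group in sample_groups:
        # Target sample function: the optimized variant needing the least bandwidth.
        target = min(group["optimized"], key=lambda sample: sample["bandwidth"])
        correspondence[group["structure"]] = target["mode"]
    return correspondence


# Hypothetical sample function groups; bandwidth is memory access amount per unit time.
sample_groups = [
    {"structure": "nested_loop",
     "optimized": [{"mode": "loop_tiling", "bandwidth": 40.0},
                   {"mode": "loop_interchange", "bandwidth": 55.0}]},
    {"structure": "struct_array",
     "optimized": [{"mode": "field_reordering", "bandwidth": 30.0},
                   {"mode": "struct_of_arrays", "bandwidth": 22.0}]},
]
correspondence = build_correspondence(sample_groups)
```

Looking up `correspondence["nested_loop"]` then plays the role of finding the target optimization mode for a function with that structure.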

According to an embodiment of the present invention, the bandwidth distribution information includes: bandwidth information required for calling each target function in the designated program, the bandwidth information referring to the memory access amount per unit time;

when determining a function to be optimized from the designated program according to the bandwidth distribution information required for calling each target function in the designated program, the to-be-optimized function determining module is specifically configured to:

acquire the currently available bandwidth of a target platform for running the designated program;

and, for each target function, compare the bandwidth information corresponding to the target function in the bandwidth distribution information with the currently available bandwidth, and determine the target function as a function to be optimized when that bandwidth information is greater than the currently available bandwidth.
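The comparison against the currently available bandwidth can be illustrated with a short Python sketch. The function names and bandwidth values are hypothetical, and the unit (bytes per second) is an assumption made only for the example.

```python
def functions_to_optimize(bandwidth_by_function, available_bandwidth):
    """Return the target functions whose required bandwidth exceeds the available bandwidth."""
    return [name for name, bandwidth in bandwidth_by_function.items()
            if bandwidth > available_bandwidth]


# Hypothetical per-function bandwidth information, in bytes per second.
required = {"main": 900.0, "decode": 650.0, "filter": 120.0}
selected = functions_to_optimize(required, available_bandwidth=500.0)
```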

A fifth aspect of the present invention provides an electronic device, comprising a processor and a memory; the memory stores a program that can be called by the processor; wherein the processor implements the bandwidth distribution determining method or the program optimizing method according to the foregoing embodiments when executing the program.

A sixth aspect of the present invention provides a machine-readable storage medium on which a program is stored, which when executed by a processor, implements the bandwidth distribution determining method or the program optimizing method as described in the foregoing embodiments.

The embodiment of the invention has the following beneficial effects:

In the embodiment of the present invention, a first memory access amount and a second memory access amount may be determined for each target function in a designated program, where the first memory access amount is the accessed amount of the memory when the call to the target function starts, and the second memory access amount is the accessed amount of the memory when the call to the target function completes. A third memory access amount corresponding to the target function, that is, the memory access amount required for calling the target function, may then be determined from the first and second memory access amounts and recorded in a function tree. Bandwidth distribution information required for calling each target function in the designated program may be determined according to the third memory access amounts recorded by the nodes in the function tree. Because each third memory access amount is determined from the accessed amounts of the memory before and after the target function is called, and those accessed amounts are actually generated at run time, the third memory access amount reflects actual memory traffic rather than logic-level information. The bandwidth distribution information determined from the third memory access amounts is therefore more accurate than bandwidth distribution information determined from the number of memory accesses at the program logic level.

Drawings

FIG. 1 is a schematic flow chart of a bandwidth distribution determining method according to an embodiment of the present invention;

FIG. 2 is a diagram of a function tree according to an embodiment of the present invention;

FIG. 3 is a flowchart illustrating a method for program optimization according to an embodiment of the invention;

FIG. 4 is a block diagram of a bandwidth distribution determining apparatus according to an embodiment of the present invention;

FIG. 5 is a block diagram of a program optimization apparatus according to an embodiment of the present invention;

FIG. 6 is a block diagram of an electronic device according to an embodiment of the present invention.

Detailed Description

Reference will now be made in detail to the exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, like numbers in different drawings represent the same or similar elements unless otherwise indicated. The embodiments described in the following exemplary embodiments do not represent all embodiments consistent with the present invention. Rather, they are merely examples of apparatus and methods consistent with certain aspects of the invention, as detailed in the appended claims.

The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. As used in this specification and the appended claims, the singular forms "a", "an", and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used herein refers to and encompasses any and all possible combinations of one or more of the associated listed items.

It will be understood that, although the terms first, second, third, etc. may be used herein to describe various elements, these elements should not be limited by these terms. These terms are only used to distinguish one type of device from another. For example, a first device may also be referred to as a second device, and similarly, a second device may also be referred to as a first device, without departing from the scope of the present invention. The word "if" as used herein may be interpreted as "at … …" or "when … …" or "in response to a determination", depending on the context.

In order to make the description of the present invention clearer and more concise, some technical terms in the present invention are explained below:

bus bandwidth: in computer bus systems, the total amount of data transfer that is allowed to occur per unit time.

Bandwidth hot spot: the part of a running program that occupies more bandwidth, that is, the part that performs more memory accesses.

Cache: the cache memory is located between the CPU and the main memory and caches data read from and written to the main memory. Because a modern CPU is far faster than memory, the Cache can significantly reduce the waiting time of the CPU and improve the operating efficiency of the computer.

Cache Miss: when a processor runs a program, memory that the CPU accesses frequently or is expected to access is cached in the Cache to improve the running speed of the CPU. When data that the CPU needs to read is not cached in the Cache, the data must be read from memory, which is called a Cache Miss.

Function: in computer terminology, a function refers to a piece of independent code in a program that has a particular function.

Subfunction: a function that is called from within another function.

Abstract syntax tree: in computer science, an Abstract Syntax Tree (AST), or simply Syntax Tree (Syntax Tree), is an Abstract representation of the Syntax structure of source code. It represents the syntactic structure of the programming language in the form of a tree, each node on the tree representing a structure in the source code.

In the embodiment of the invention, the bandwidth distribution condition of the program in operation can be determined, namely the bandwidth distribution information required by each target function in the calling program. Furthermore, the program can be optimized based on the bandwidth distribution information required by each objective function in the calling program.

The program whose bandwidth distribution information is to be analyzed is referred to as the designated program. All functions in the designated program may be set as target functions, or a plurality of functions may be selected from the designated program as target functions, which is not specifically limited. A target function is a function in the designated program for which the bandwidth information required by its call is to be determined.

The following describes the bandwidth distribution determining method according to the embodiment of the present invention more specifically, but should not be limited thereto.

In one embodiment, referring to fig. 1, the bandwidth distribution determination method may include the steps of:

S100: determining a target node corresponding to each target function to be called in a designated program from a generated function tree, respectively determining a first memory access amount and a second memory access amount, determining a third memory access amount corresponding to the target function according to the first memory access amount and the second memory access amount, and recording the third memory access amount to the target node; the first memory access amount is the accessed amount of the memory when the call to the target function starts, the second memory access amount is the accessed amount of the memory when the call to the target function completes, and the third memory access amount is the memory access amount required for calling the target function;

S200: when all the target functions of the designated program have been called, determining bandwidth distribution information required for calling each target function in the designated program according to the third memory access amount recorded by each node in the function tree.

In the embodiment of the present invention, an execution subject of the bandwidth distribution determining method may be an electronic device, and the electronic device may be a computer device.

The method described above may be implemented by the electronic device by running the designated program. Before that, analysis code for determining bandwidth distribution information may be added to the program, for example, in the following manner:

determining the position information of each function in the designated program; finding the target functions in the designated program according to the position information of each function and the function tree, adding a corresponding first analysis code for each target function, and adding a second analysis code at a designated position of the initial program; wherein the first analysis code is used to implement step S100 when executed by the electronic device, and the second analysis code is used to implement step S200 when executed by the electronic device.

Optionally, when determining the location information of each function in the designated program, the function may be identified according to the function feature, and when identifying the function, the location information of the function in the initial program may be determined. The functional characteristics may include, for example: characters in the function representing the return value such as "return", and parameter list features of the function, etc. The specified program may be composed of 1 or more source code files, in which case, the location information of the function may include a file identifier of the source code file where the function is located, and an offset (offset with respect to the first line or the specified line of the source code file) of the function in the source code file, which is only an example and not a limitation.

Optionally, in order to determine the location information of each function as accurately as possible, before determining the location information of each function in the designated program, filtering processing may be performed on the designated program to remove interference information in the designated program.

The interference information may include information that does not conform to the grammar rule or may affect the function identification (such as information having function characteristics and the like that are easily mistaken for functions), for example, may include comments, preprocessed code segments, and the like, wherein when the preprocessed code segments are removed, such as macro definitions and the like, the preprocessed code segments may be pre-compiled and expanded to avoid affecting the program operation.

Optionally, under the condition that the designated program may be composed of 1 or more source code files, when the corresponding first analysis code is added to each target function, the function may be searched from the last function of each source code file in the direction of the first function, and when a node corresponding to the searched function exists in the function tree, the function is determined to be the target function, and the corresponding first analysis code is added to the target function.

The first analysis code may be divided into a first code segment and a second code segment. The first code segment may be located before the corresponding target function and after the function preceding the target function, and is used to obtain the first memory access amount; the second code segment may be located after the corresponding target function and before the function following the target function, and is used to obtain the second memory access amount, determine the third memory access amount according to the second memory access amount and the first memory access amount, and record the third memory access amount to the target node.
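The role of the two code segments can be sketched in Python. This is an illustration under stated assumptions, not the implementation described here: a decorator stands in for source-level code insertion, a module-level counter `accessed_bytes` simulates the accessed amount of the memory, and `read_accessed_amount`, `instrument`, `recorded` and the example functions are hypothetical names.

```python
import functools

accessed_bytes = 0   # simulated cumulative accessed amount of the memory (assumption)
recorded = {}        # node number -> third memory access amounts recorded for that node


def read_accessed_amount():
    """Stand-in for reading the platform's accessed amount of the memory."""
    return accessed_bytes


def instrument(node_number):
    """Wrap a target function with the first and second code segments."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            first = read_accessed_amount()        # first code segment
            try:
                return func(*args, **kwargs)
            finally:
                second = read_accessed_amount()   # second code segment
                recorded.setdefault(node_number, []).append(second - first)
        return wrapper
    return decorator


@instrument(node_number=1)
def copy_block(n):
    global accessed_bytes
    accessed_bytes += 2 * n   # simulate n bytes read plus n bytes written
    return n


@instrument(node_number=0)
def main():
    global accessed_bytes
    accessed_bytes += 16      # simulated traffic of main itself
    copy_block(100)
    copy_block(50)


main()
```

In this sketch the amount recorded for `main` includes the traffic of the subfunctions it calls, which matches the idea that the root node's amount covers the whole call.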

The second analysis code may be added at the end of the designated program, or at the code position reached once all the target functions have been called, and is used to determine the bandwidth distribution information required for calling each target function in the designated program according to the third memory access amount recorded by each node in the function tree when all the target functions of the designated program have been called.

Optionally, when the first analysis code is added, a corresponding analysis code identifier may be added to the first analysis code, so as to delete the corresponding first analysis code subsequently. When the second analysis code is added, a corresponding analysis code identification can be added to the second analysis code, so that the second analysis code can be deleted later. Correspondingly, when a code deleting instruction is received, the code corresponding to the analysis code identifier in the designated program can be deleted according to the analysis code identifier.
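Deleting analysis code by its identifier can be sketched as a simple line filter. The marker text `/* BW_ANALYSIS */`, the helper calls `bw_enter` and `bw_exit`, and the one-marker-per-line layout are hypothetical choices made only for this Python example.

```python
MARKER = "/* BW_ANALYSIS */"   # hypothetical analysis code identifier


def strip_analysis_code(source):
    """Drop every line that carries the analysis code identifier."""
    kept = [line for line in source.splitlines() if MARKER not in line]
    return "\n".join(kept)


# Hypothetical instrumented fragment of the designated program.
instrumented = "\n".join([
    "bw_enter(3); /* BW_ANALYSIS */",
    "result = decode(frame);",
    "bw_exit(3); /* BW_ANALYSIS */",
])
cleaned = strip_analysis_code(instrumented)
```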

After the designated program is determined based on the above manner, the electronic device may run the designated program to perform steps S100-S200 to determine bandwidth distribution information required to call each target function in the designated program.

In step S100, for each target function to be called in the designated program, a target node corresponding to the target function is determined from the generated function tree, a first memory access amount and a second memory access amount are respectively determined, a third memory access amount corresponding to the target function is determined according to the first memory access amount and the second memory access amount, and the third memory access amount is recorded to the target node.

A function tree may be generated before step S100 is performed. Each node in the function tree corresponds to a target function and is used to record relevant information of that target function. Optionally, the relationship between the nodes in the function tree may represent the calling relationship of the target functions in the designated program. For example, if node a1 in the function tree is a child node of node a2, the target function corresponding to node a1 is a subfunction of the target function corresponding to node a2.

Optionally, when the target node corresponding to the target function is determined in the generated function tree, a node identifier of the target node, such as a node number, may be obtained, and the target node may be found in the function tree according to the node number. The node number may be recorded in the first analysis code corresponding to the target function, so that the recorded node number is obtained when the first analysis code is run.

Of course, this method of determining the target node corresponding to the target function is only an example, and other methods exist in practice. For example, when the relationship between the nodes in the function tree represents the calling relationship of the target functions in the designated program, the corresponding target node may be found in the function tree according to the calling sequence of the target functions, which is not specifically limited.
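The node-number lookup described above can be sketched with the function tree stored as an array, so that the node number is simply the array index. The `Node` fields and the function names below are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str      # name of the corresponding target function
    parent: int    # node number of the calling function's node, -1 for the root
    amounts: list = field(default_factory=list)   # recorded third memory access amounts


# Function tree stored as an array: index == node number.
function_tree = [
    Node("main", -1),
    Node("decode", 0),
    Node("filter", 0),
    Node("idct", 1),
]


def target_node(node_number):
    """Find the target node from the node number carried by the analysis code."""
    return function_tree[node_number]


target_node(3).amounts.append(4096)
```

Because the lookup is a single index operation, the overhead added to each instrumented call stays small in this layout.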

In some devices, a memory read-write register is provided in the processor, and a wrapper interface may be provided for the memory read-write register so that the register can be used across multiple platforms. The wrapper interface may be used to provide the memory access count.

The first memory access amount and the second memory access amount may be determined according to the memory access count recorded in the memory read-write register of the device. Specifically, when the call to the target function starts, a first memory access count may be obtained from the memory read-write register and multiplied by the set bit width to obtain the first memory access amount. When the call to the target function completes, a second memory access count may be obtained from the memory read-write register and multiplied by the set bit width to obtain the second memory access amount.

Of course, the first memory access amount and the second memory access amount may also be determined in other manners, and the specific manner is not limited, and other manners capable of obtaining the memory access amount are all applicable.
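The count-to-amount conversion is a single multiplication, as the following Python sketch shows. The counter readings are made-up values, and a bit width of 8 bytes per access is an assumption chosen only for the example.

```python
def access_amount(access_count, bit_width=8):
    """Convert a memory access count into an access amount (count x set bit width)."""
    return access_count * bit_width


# Hypothetical register readings at the start and at the end of one call.
first_amount = access_amount(1_000)
second_amount = access_amount(1_250)
```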

The first memory access amount is the accessed amount of the memory when the call to the target function starts, that is, the accessed amount of the memory after the function preceding the target function has been called and before the target function is called. The second memory access amount is the accessed amount of the memory when the call to the target function completes, that is, the accessed amount of the memory after the target function has been called and before the function following the target function is called.

Since one is the accessed amount of the memory when the call to the target function starts and the other is the accessed amount of the memory when the call completes, the memory access amount required for calling the target function, that is, the third memory access amount, can be determined according to the first memory access amount and the second memory access amount.

In the embodiment of the invention, the memory access comprises reading data from the memory and writing data in the memory. The memory access amount refers to the amount of data generated by reading and writing the memory, and the memory access amount required for calling the target function is the amount of data generated by reading and writing the memory in the whole process of calling the target function.

And after the third memory access amount is determined, recording the third memory access amount into a target node corresponding to the target function in the function tree.

In step S200, when all the target functions of the specified program are called, bandwidth distribution information required for calling each target function in the specified program is determined according to the third memory access amount recorded by each node in the function tree.

When all the target functions of the specified program are called, in the manner in step S100, the third memory access amount corresponding to each target function of the specified program is determined, and the third memory access amount corresponding to each target function is recorded in each node of the function tree.

Therefore, the bandwidth distribution information required for calling each target function in the designated program can be determined according to the third memory access amount recorded by each node in the function tree.

The bandwidth distribution information may include, for example, the bandwidth information required for calling each target function in the designated program, where the bandwidth information refers to the memory access amount per unit time. And/or, the bandwidth distribution information may include bandwidth ratio information, that is, the proportion of the bandwidth information required for calling each target function in the bandwidth information required for calling the designated program. Of course, only two kinds of information are given here as examples, and the information is not specifically limited as long as the bandwidth distribution required for calling each target function in the designated program can be determined.

After the bandwidth distribution information is determined, a bandwidth hotspot in the program can be determined, a function to be optimized in the program can be further determined, and the function to be optimized is optimized to realize optimization of the program, so that the access amount of the program to the memory during running is reduced.

For example, when the bandwidth distribution information includes the bandwidth information corresponding to each target function, the N target functions with the largest bandwidth information may be selected as functions to be optimized, where N is greater than or equal to 1; and/or, when the bandwidth distribution information includes the bandwidth ratio information corresponding to each target function, the N target functions with the largest bandwidth ratio information may be selected as functions to be optimized. Of course, the specific manner of determining the function to be optimized is not limited.
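Selecting the N target functions with the largest bandwidth information can be sketched with Python's standard `heapq.nlargest`. The function names and values are hypothetical.

```python
import heapq


def top_n_hotspots(bandwidth_by_function, n=1):
    """Return the n target functions with the largest bandwidth information."""
    return heapq.nlargest(n, bandwidth_by_function, key=bandwidth_by_function.get)


# Hypothetical bandwidth information per target function.
bandwidth = {"decode": 650.0, "filter": 120.0, "idct": 400.0, "scale": 90.0}
hotspots = top_n_hotspots(bandwidth, n=2)
```

The same helper works unchanged when the dictionary holds bandwidth ratio information instead of absolute bandwidth.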

After the designated program has been optimized, steps S100-S200 may be executed again to determine the bandwidth distribution information of the designated program and thereby verify the optimization result. Optionally, when the optimization result meets the requirement, the optimized program may be obtained by deleting from the designated program the analysis code carrying an analysis code identifier, that is, the first analysis code and the second analysis code. Of course, if the optimization result does not meet the requirement, the optimization may be continued, and steps S100-S200 may be executed again on the optimized program until the optimization result meets the requirement.

When steps S100-S200 are executed multiple times, the function tree may record the third memory access amount required by each call of the corresponding target function; in addition, the call count of the target function may be incremented each time a third memory access amount is recorded.

Besides determining the bandwidth distribution information, the second analysis code may also determine other running information, for example: over all runs of the designated program, the number of times each target function was called, the maximum third memory access amount, and the average of the third memory access amounts; and, for the single run in which the main function of the designated program took the longest time, the memory access amount required by that run, the distribution of the bandwidth information required by each target function called in that run, the number of times each target function was called, the maximum third memory access amount, the average of the third memory access amounts, and the like.
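The per-node running information mentioned above (call count, maximum and average third memory access amount) can be accumulated incrementally. The class name `CallStats` and the recorded values in this Python sketch are hypothetical.

```python
class CallStats:
    """Running statistics kept in one function tree node (illustrative only)."""

    def __init__(self):
        self.calls = 0
        self.total = 0
        self.maximum = 0

    def record(self, third_amount):
        """Record one third memory access amount and update the call count."""
        self.calls += 1
        self.total += third_amount
        self.maximum = max(self.maximum, third_amount)

    @property
    def average(self):
        """Average third memory access amount per call, 0.0 if never called."""
        return self.total / self.calls if self.calls else 0.0


stats = CallStats()
for amount in (200, 100, 300):
    stats.record(amount)
```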

Optionally, the bandwidth distribution information and the above running information may be output in the form of a report. Specifically, in order to make the bandwidth distribution information more intuitively presented, the names of all the objective functions may still be listed in the report in the form of a tree structure, and the bandwidth distribution information composed of the objective functions may be correspondingly listed, which, of course, is not particularly limited.

In the embodiment of the present invention, a first memory access amount and a second memory access amount may be determined for each target function in a designated program, where the first memory access amount is the accessed amount of the memory when the call to the target function starts, and the second memory access amount is the accessed amount of the memory when the call to the target function completes. A third memory access amount corresponding to the target function, that is, the memory access amount required for calling the target function, may then be determined from the first and second memory access amounts and recorded in a function tree. Bandwidth distribution information required for calling each target function in the designated program may be determined according to the third memory access amounts recorded by the nodes in the function tree. Because each third memory access amount is determined from the accessed amounts of the memory before and after the target function is called, and those accessed amounts are actually generated at run time, the third memory access amount reflects actual memory traffic rather than logic-level information. The bandwidth distribution information determined from the third memory access amounts is therefore more accurate than bandwidth distribution information determined from the number of memory accesses at the program logic level.

In an embodiment, in step S100, determining a third memory access amount corresponding to the objective function according to the first memory access amount and the second memory access amount may include the following steps:

S101: calculating the difference between the second memory access amount and the first memory access amount;

S102: determining the third memory access amount according to the difference.

The difference between the second memory access amount and the first memory access amount is the change of the memory access amount before and after the target function is called, and the memory access amount required by calling the target function, namely, the third memory access amount, can be determined according to the difference.

Specifically, the difference between the second memory access amount and the first memory access amount may be directly determined as the third memory access amount; alternatively, the difference may be subjected to a setting operation, such as calculating a ratio of the difference to a set value (i.e., performing normalization), and the operation result is determined as the third memory access amount, which is not limited specifically.
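Steps S101 and S102 can be sketched as follows. The optional `scale` argument models the "set value" used for normalization; its name and the example value of 1024 are assumptions.

```python
def third_access_amount(first, second, scale=None):
    """S101: take the difference; S102: use it directly or normalize it by a set value."""
    difference = second - first
    return difference if scale is None else difference / scale


raw = third_access_amount(8000, 10000)                    # difference used directly
normalized = third_access_amount(8000, 10000, scale=1024)  # difference divided by the set value
```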

In an embodiment, in step S200, determining bandwidth distribution information required for calling each target function in the designated program according to the third memory access amount recorded by each node in the function tree may include:

S201: for each node other than the root node in the function tree, calculating the ratio of the third memory access amount recorded by the node to the third memory access amount recorded by the root node, and determining the bandwidth ratio information required for calling the target function corresponding to the node according to the ratio;

S202: determining the bandwidth distribution information based on the determined bandwidth ratio information required for calling each target function.

Taking the case where the target function corresponding to the root node is the main function of the designated program as an example, the third memory access amount required for calling the main function is in effect the total memory access amount required for running the whole designated program, so the proportion of the memory access amount required by each target function in that total can be calculated by using this third memory access amount as the reference.

In the same way, when the target function corresponding to the root node is a function other than the main function in the designated program, the proportion of the memory access amount required by each target function in the memory access amount required by the target function corresponding to the root node can be calculated by using the third memory access amount recorded by the root node as a reference.

Therefore, in this embodiment, a ratio between the third memory access amount recorded by each node and the third memory access amount recorded by the root node is calculated, and bandwidth proportion information required for invoking the target function corresponding to the node is determined according to the ratio. The bandwidth proportion information indicates the proportion of bandwidth information required by calling the target function corresponding to the node in bandwidth information required by calling the target function corresponding to the root node, and in the case that the target function corresponding to the root node is a main function of the specified program, the bandwidth proportion information indicates the proportion of the bandwidth information required by calling the target function in the bandwidth information required by calling the specified program, and the bandwidth information refers to the memory access amount in unit time.

Optionally, the ratio may be directly determined as the bandwidth ratio information required for calling the target function corresponding to the node; alternatively, the result of a certain operation on the ratio may be used as the bandwidth ratio information, for example the ratio multiplied by a specified value such as 100. In that case, the bandwidth ratio information of the target function corresponding to the root node is 100, the other bandwidth ratio information is less than 100, and the sum of the bandwidth ratios of the target functions called by the target function corresponding to the root node is less than or equal to 100. Of course, the specific manner is not limited.
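A minimal Python sketch of the ratio calculation, scaled by 100 so that the root node reads as 100, is given below. The node numbers and recorded amounts are hypothetical, and the zero check on the root amount is an added safeguard rather than something stated in the text.

```python
def bandwidth_ratios(amounts, root=0, scale=100):
    """Ratio of each node's third memory access amount to the root's, times a specified value."""
    root_amount = amounts[root]
    if root_amount == 0:
        raise ValueError("root node recorded no memory access")
    return {node: scale * amount / root_amount for node, amount in amounts.items()}


# Hypothetical third memory access amounts: node 0 is the root, nodes 1 and 2 are
# called by the root, and node 3 is called by node 1.
amounts = {0: 4000, 1: 2500, 2: 1000, 3: 1500}
ratios = bandwidth_ratios(amounts)
```

With these values the two functions called directly by the root sum to 87.5, which is consistent with the bound of 100 described above.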

After the bandwidth ratio information required for calling each target function is determined, the bandwidth distribution information may be determined based on the bandwidth ratio information; for example, the bandwidth ratio information required by each target function may be used as the bandwidth distribution information. The target function with the largest bandwidth ratio information is the bandwidth hot spot in the designated program.

In this case, when the bandwidth distribution information is output in the form of a report, the names of the target functions may be displayed as a tree, and the bandwidth proportion information of each target function may be displayed next to its name, so that the bandwidth distribution information is easier to read. Of course, if other information is needed in the report, such as the number of calls, it may also be displayed next to the name of the target function.

In the above embodiment of the present invention, a function tree is introduced. The function tree not only records information about each target function, such as the third memory access amount, in the corresponding node, but also helps to determine and present the bandwidth distribution information; it may of course have other uses as well.

The function tree may have been generated prior to step S100. In one embodiment, the function tree may be generated by the following steps:

S300: determining a root node of the function tree, wherein the root node corresponds to a specified target function in the designated program;

S400: generating child nodes of the function tree according to the sub-functions of the target function corresponding to the root node in the designated program.

It is understood that steps S300 and S400 may be performed before step S100.

In this embodiment, a target function may be specified in the designated program, and the root node of the function tree may be set to correspond to the specified target function. For example, the main function of the designated program may be used as the target function corresponding to the root node of the function tree. Of course, which function is used as the target function corresponding to the root node is not limited and may be determined according to optimization requirements.

Thereafter, child nodes of the function tree may be generated according to the sub-functions of the target function corresponding to the root node in the designated program. For example, when the target function has three sub-functions, three child nodes of the root node are generated in the function tree.

In one embodiment, in step S400, generating child nodes of the function tree according to the sub-functions of the target function corresponding to the root node in the designated program includes:

S401: taking the root node as the current search node, checking whether a sub-function of the target function corresponding to the current search node exists in the designated program, and if so, executing the following steps for each sub-function:

S402: if the current search node has no parent node, generating a child node of the current search node in the function tree, the child node corresponding to the sub-function; when the number of layers of the function tree at the currently accessed node has not reached a preset number of layers, determining the child node as the current search node, and returning to the step of checking whether a sub-function of the target function corresponding to the current search node exists in the designated program;

S403: if the current search node has a parent node, and the sub-function differs from the target function corresponding to every node on a designated path, the designated path being the path from the current search node to the root node in the function tree, generating a child node of the current search node in the function tree, the child node corresponding to the sub-function; when the number of layers of the function tree at the currently accessed node has not reached the preset number of layers, determining the child node as the current search node, and returning to the step of checking whether a sub-function of the target function corresponding to the current search node exists in the designated program.

In this embodiment, the search starts from the target function corresponding to the root node, with the root node as the current search node. When the designated program contains sub-functions of the target function corresponding to the current search node, there are as many search paths as there are sub-functions. For each sub-function, if the current search node has no parent node, that is, the current search node is the root node, a child node of the current search node corresponding to the sub-function can be generated in the function tree. If the current search node has a parent node, a child node of the current search node corresponding to the sub-function is generated in the function tree only when the sub-function differs from the target function corresponding to every node on the designated path, the designated path being the path from the current search node to the root node in the function tree.

That the sub-function differs from the target function corresponding to every node on the designated path means that the sub-function differs from the target functions corresponding to the current search node and all of its ancestor nodes (including the parent node). If the sub-function is the same as the target function corresponding to the current search node or any of its ancestor nodes, the same function appears twice on one search path, which means that a recursive call or a cyclic call has occurred. In this case, no node corresponding to the sub-function is generated in the function tree. If there are other sub-functions, execution continues with them; otherwise, the search ends and the required function tree is obtained.

In this embodiment, after the child node of the current search node is generated in the function tree, it is further determined whether the number of layers of the function tree at the currently accessed node has reached a preset number of layers. If not, the child node is determined as the current search node, and it is checked whether a sub-function of the target function corresponding to the current search node exists in the designated program. Thus, the number of node layers of the resulting function tree is limited to the preset number of layers. The preset number of layers can be determined according to optimization requirements, and the specific value is not limited.

For example, referring to fig. 2, the correspondence between the nodes of a function tree generated in the above manner and the functions is as follows: F11 is the target function specified in the designated program (e.g., the main function of the designated program) and corresponds to the root node of the function tree; F11 has three sub-functions, F12, F13 and F14, which correspond to the three child nodes of the root node; F12 has two sub-functions, F13 and F15, which correspond to the two child nodes of the node corresponding to F12, and F15 in turn has two sub-functions, F13 and F16, which correspond to the two child nodes of the node corresponding to F15; F14 has two sub-functions, F17 and F13, which correspond to the two child nodes of the node corresponding to F14. Of course, the function tree shown in fig. 2 is only an example for ease of understanding, and the calling relationships of the target functions in an actual program may be more complicated.

In the above manner, the designated program is statically analyzed, the functions in the designated program are searched according to the calling relationships among them, and a node is created for each function found. If a loop structure appears on a search path (a loop in the calling relationship may arise when a function is called recursively or cyclically), the loop structure is broken and the search along that path ends. A function tree containing no loop structure can thus be generated, which avoids the adverse effects of recursive calls and cyclic calls on the bandwidth distribution determination process.
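The search described in steps S401 to S403 can be sketched as follows. This is only an illustration under stated assumptions: the call graph is given as a plain dictionary (how it is extracted by static analysis is not shown), the names `Node`, `build_function_tree` and `CALL_GRAPH` are hypothetical, the root is counted as layer 1, and the edge from F16 back to F12 is added here purely to exercise the loop check.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)
class Node:
    func: str
    parent: "Node | None" = field(default=None, repr=False)
    children: list = field(default_factory=list)


def on_path_to_root(node, func):
    """True if func already appears on the path from node up to the root."""
    while node is not None:
        if node.func == func:
            return True
        node = node.parent
    return False


def build_function_tree(call_graph, root_func, max_layers):
    root = Node(root_func)

    def expand(node, layer):
        if layer >= max_layers:          # preset number of layers reached
            return
        for callee in call_graph.get(node.func, []):
            if on_path_to_root(node, callee):
                continue                 # recursive or cyclic call: no node
            child = Node(callee, parent=node)
            node.children.append(child)
            expand(child, layer + 1)

    expand(root, 1)
    return root


# Call relationships of fig. 2, plus an assumed cyclic call F16 -> F12.
CALL_GRAPH = {
    "F11": ["F12", "F13", "F14"],
    "F12": ["F13", "F15"],
    "F15": ["F13", "F16"],
    "F14": ["F17", "F13"],
    "F16": ["F12"],
}
tree = build_function_tree(CALL_GRAPH, "F11", max_layers=10)
shallow = build_function_tree(CALL_GRAPH, "F11", max_layers=2)
```

In this sketch the F16 node ends up with no children, because F12 already lies on its path to the root, so the resulting tree has the shape described for fig. 2.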

In one embodiment, after generating the function tree, the method further comprises the steps of:

S500: allocating node numbers to the nodes in the function tree according to a set numbering principle. The set numbering principle is as follows: the node numbers of the nodes on the same layer are consecutive, and the node number of the designated node on each layer is consecutive with the node number of its parent node; the designated node on each layer is the node with the smallest node number on that layer.

Under this numbering principle, numbering starts from the shallowest layer of the tree, and the next layer is entered only after all nodes on the current layer have been numbered. Thus, in the function tree, the node number of the root node is the smallest, the node numbers on each layer are consecutive, and the smallest node number on each layer is consecutive with the number of the parent node of that node.

For example, with continued reference to fig. 2, F11 is the specified target function (such as the main function of the program) and corresponds to the root node of the function tree, whose node number is 1; F11 has three sub-functions, F12, F13 and F14, which correspond to the three child nodes of the root node, with node numbers 2, 3 and 4 respectively; F12 has two sub-functions, F13 and F15, which correspond to the two child nodes of the node corresponding to F12, with node numbers 5 and 6 respectively; F14 has two sub-functions, F17 and F13, which correspond to the two child nodes of the node corresponding to F14, with node numbers 7 and 8 respectively; F15 has two sub-functions, F13 and F16, which correspond to the two child nodes of the node corresponding to F15, with node numbers 9 and 10 respectively.
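One way to obtain the node numbers listed for fig. 2 is a layer-by-layer (breadth-first) traversal. The sketch below assumes this interpretation of the numbering principle; the nested-tuple tree layout and the names `number_breadth_first` and `FIG2` are hypothetical.

```python
from collections import deque


def number_breadth_first(tree):
    """tree is (func_name, [child_trees]).

    Returns a list of (node_number, func_name, parent_number) tuples,
    using parent_number 0 for the root.
    """
    numbered = []
    queue = deque([(tree, 0)])
    next_number = 1
    while queue:
        (func, children), parent_number = queue.popleft()
        number = next_number
        next_number += 1
        numbered.append((number, func, parent_number))
        for child in children:           # children are queued side by side
            queue.append((child, number))
    return numbered


FIG2 = ("F11", [
    ("F12", [("F13", []), ("F15", [("F13", []), ("F16", [])])]),
    ("F13", []),
    ("F14", [("F17", []), ("F13", [])]),
])
numbered = number_breadth_first(FIG2)
```

Because the children of a node are queued together, they always receive consecutive numbers, which is the property the following paragraphs rely on.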

With this numbering, the node numbers of all child nodes of a node are consecutive, that is, the child nodes are arranged in sequence. Once recording for one child node is finished, the node number of the next child node can be obtained by adding the numbering interval (e.g., 1) to the node number of the current child node, without searching the whole tree again. This prevents the extra memory accesses generated by searching the function tree from significantly distorting the bandwidth distribution determination process.

For example, in fig. 2, after the target function F13 corresponding to the node with node number 5 has been called and the target function F15 needs to be called, node number 6 is obtained by adding 1 to node number 5, and the corresponding node is found by node number 6, so that the third memory access amount of the target function F15 can be recorded in the node with node number 6.

In one embodiment, the function tree is implemented as an array, each element of the array corresponds to one node of the function tree, and any two nodes with consecutive node numbers are adjacent elements of the array.

Thus, a root node with node number 1 corresponds to the 1 st array element in the array, a node with node number 2 corresponds to the 2 nd array element in the array, a node with node number 3 corresponds to the 3 rd array element in the array, and so on.

In this embodiment, the function tree is implemented as an array. Combined with the numbering described above, when the child nodes of a node are being recorded, the next array element, that is, the next child node, can be found by adding 1 to the array subscript, so the cost of searching the tree is small.
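A minimal sketch of such an array-backed tree is given below, using the fig. 2 numbering. The record layout (`TreeSlot`), the helper names, and the assumption that the third memory access amount is the second amount minus the first are illustrative only and are not prescribed by the text.

```python
from dataclasses import dataclass


@dataclass
class TreeSlot:
    func: str
    parent: int              # node number of the parent, 0 for the root
    access_amount: int = 0   # third memory access amount


# Node number n is stored at index n - 1 (numbers as in fig. 2).
TREE = [
    TreeSlot("F11", 0), TreeSlot("F12", 1), TreeSlot("F13", 1),
    TreeSlot("F14", 1), TreeSlot("F13", 2), TreeSlot("F15", 2),
    TreeSlot("F17", 4), TreeSlot("F13", 4), TreeSlot("F13", 6),
    TreeSlot("F16", 6),
]


def slot(number):
    return TREE[number - 1]


def record(number, first_amount, second_amount):
    """Assumption: third amount = second amount - first amount."""
    slot(number).access_amount += second_amount - first_amount


def next_sibling(number):
    """Next child of the same parent is the adjacent array element, if any."""
    candidate = number + 1
    if candidate <= len(TREE) and slot(candidate).parent == slot(number).parent:
        return candidate
    return None


record(5, 1000, 1600)            # F13 called from F12
following = next_sibling(5)      # node 6, i.e. F15
record(following, 1600, 4100)
```

Finding the node for F15 takes a single index increment rather than a tree search, which keeps the bookkeeping itself from adding many memory accesses.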

Optionally, a child node may be added below each leaf node of the function tree, and the node number of this child node may be set to a set number, such as 0, that indicates that determining bandwidth information is prohibited. In the subsequent bandwidth distribution determination process, if the node found for a function has node number 0, the bandwidth statistics for that function can be skipped directly; that is, the first memory access amount and the second memory access amount corresponding to the function are not obtained, and the third memory access amount is neither calculated nor recorded. This avoids the repeated counting of bandwidth information caused by recursive calls.
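The effect of the reserved node number can be sketched as follows. The memory access counter here is simulated with a dictionary, and the names `measure_call`, `read_counter` and `touch_memory` are hypothetical; how the accessed amount is actually read on real hardware is outside the scope of this sketch.

```python
DISABLED = 0  # reserved node number: bandwidth statistics are skipped


def measure_call(node_number, read_counter, call, recorded):
    """Run call(); record its memory access amount unless the node is disabled."""
    if node_number == DISABLED:
        return call()                    # no first/second amount is read
    first = read_counter()
    result = call()
    second = read_counter()
    recorded[node_number] = recorded.get(node_number, 0) + (second - first)
    return result


# Simulated counter of bytes accessed so far (an assumption for illustration).
counter = {"bytes": 0}


def read_counter():
    return counter["bytes"]


def touch_memory(n):
    counter["bytes"] += n
    return n


recorded = {}
measure_call(6, read_counter, lambda: touch_memory(300), recorded)
measure_call(DISABLED, read_counter, lambda: touch_memory(500), recorded)
```

The second call still runs and still touches memory, but nothing is recorded for it.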

Optionally, when the designated program does not need to determine the bandwidth distribution, the node numbers in the function tree may be all modified to 0, so that the problem of performing bandwidth information statistics when the function is called externally may be avoided.

The above is the content of the bandwidth distribution determining method in the embodiment of the present invention, and bandwidth distribution information required for calling each target function in the designated program can be determined.

As described in the background, computer hardware has been improved by introducing a Cache (high-speed buffer memory). The Cache holds part of the data in the memory, which reduces the number of times the processor directly reads and writes the memory while a program runs and can greatly reduce the bandwidth information required when the program is called.

However, adding a Cache in hardware only goes so far in relieving the excessive bandwidth demand caused by a large number of memory accesses during program execution; sometimes the required bandwidth still has to be reduced by optimizing the program itself.

For this reason, the present invention further provides a program optimization method, which may perform optimization based on the bandwidth distribution information, determined in the foregoing embodiment, required for calling each target function in the designated program. Of course, this is not a limitation: the optimization may also be based on bandwidth distribution information determined in other ways, or on other information.

Some ways to optimize a program already exist. For example, a compiler may have built-in optimization functions and optimize the program while compiling the code, to improve the running efficiency of the code (i.e., completing the same function in the fewest instruction cycles) or to reduce the code size (i.e., completing the same function with the shortest code). However, this optimization targets CPU performance and does not optimize the program's memory access.

Furthermore, in theory, the compiler cannot know at compile time the runtime factors that affect memory bandwidth usage, such as Cache size, CPU frequency and DDR channels. It is therefore very difficult to optimize a program's memory access with a compiler without changing how existing compilers work.

At present, the problem of excessive bandwidth required for calling a program can only be addressed by optimizing the program manually. The optimization a program needs differs from platform to platform, and a platform whose system bus bandwidth is extremely limited needs optimization in more respects. Manual optimization is slow and inefficient, so it does not solve the problem well.

In the embodiment of the invention, the function to be optimized can be determined from the designated program according to the bandwidth distribution information required for calling each target function in the designated program. A target optimization mode for the function to be optimized can then be determined automatically according to the syntax structure of that function, and the function is optimized in the target optimization mode. This optimizes the memory access behavior when the function to be optimized is called and reduces the bandwidth information required for calling it, so that the bandwidth information required by the designated program as a whole is optimized and the occupation of system bandwidth is reduced. The whole process requires no manual participation, is highly automated, and optimizes programs efficiently.

The embodiment of the invention can be applied wherever programs need to be optimized, in particular to programs with high bandwidth occupation, such as image processing, machine learning, and video encoding and decoding programs deployed on embedded systems with extremely limited bus bandwidth. It can help algorithm developers quickly locate the bandwidth hotspots in a program and optimize those parts, effectively reducing the occupation of system bandwidth while the program runs and improving the running efficiency of the program.

The program optimization method according to the embodiment of the present invention is described in more detail below, but should not be limited thereto.

In one embodiment, referring to FIG. 3, a program optimization method may include the steps of:

T100: determining a function to be optimized from a designated program according to bandwidth distribution information required for calling each target function in the designated program;

T200: determining a target optimization mode for optimizing the function to be optimized according to the syntax structure of the function to be optimized, wherein the target optimization mode is used for optimizing the memory access behavior when the function to be optimized is called;

T300: optimizing the function to be optimized in the target optimization mode.

In the embodiment of the present invention, the execution main body of the program optimization method may be an electronic device, and the electronic device may be a computer device.

The above steps T100 to T300 may be performed before the designated program is compiled. In other words, the designated program optimized by the present invention is an uncompiled program, which may also be referred to as a source code file. It may be a program written in a high-level language such as C or C++, or a program written in another language; these are not listed here.

The whole program optimization process is independent of the compiler and is not limited by the compiler's capabilities. It is applicable to any processing platform, does not depend on a specific hardware platform or compiler, and is highly general. It is particularly valuable in the development of embedded systems with extremely limited bus bandwidth. The user does not need any knowledge of bandwidth optimization, the program optimization is fully automatic, and the method is simple to use.

In step T100, a function to be optimized is determined from the designated program according to bandwidth distribution information required for calling each target function in the designated program.

The bandwidth distribution information may include, for example, bandwidth information required for each target function in the designated program to be called, bandwidth ratio information, and/or the like. Of course, the bandwidth distribution information may also include other information, and is not limited in particular.

The bandwidth information refers to the memory access amount in a unit time, that is, the ratio of the memory access amount required for calling the target function to the calling time of the target function. For example, on the basis of determining the third memory access amount corresponding to each target function in the foregoing embodiment, the bandwidth information required by the target function may be determined according to the third memory access amount corresponding to the target function and the call duration of the target function, and specifically, a ratio of the third memory access amount to the call duration is used as the bandwidth information.
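As a worked illustration of this ratio, the sketch below divides an assumed third memory access amount by an assumed call duration. The function name `bandwidth` and the numbers are hypothetical.

```python
def bandwidth(access_amount_bytes, call_duration_s):
    """Memory access amount per unit time, in bytes per second."""
    if call_duration_s <= 0:
        raise ValueError("call duration must be positive")
    return access_amount_bytes / call_duration_s


# Assumed values: 2,500,000 bytes accessed during a call lasting 0.25 s.
bw = bandwidth(2_500_000, 0.25)
```

With these assumed values the call requires 10,000,000 bytes per second.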

The bandwidth proportion information may be the ratio of the bandwidth information required for calling a target function to the bandwidth information required for calling the target function specified in the designated program (such as the main function). For example, where the third memory access amount corresponding to each target function has been recorded in the function tree as in the foregoing embodiment, the bandwidth proportion information required by each target function may be determined according to the third memory access amount recorded by each node and the third memory access amount recorded by the root node. Specifically, the ratio of the third memory access amount recorded by a node to the third memory access amount recorded by the root node may be used as the bandwidth proportion information required by the target function corresponding to that node.

Taking as an example the case where the bandwidth distribution information includes the bandwidth proportion information required by each target function, when the function to be optimized is determined from the designated program according to the bandwidth distribution information required for calling each target function in the designated program, the target functions whose bandwidth proportion information is greater than a set proportion can be selected as the functions to be optimized. Of course, this is only an example and should not be taken as limiting; other approaches are possible.
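Selecting by a set proportion can be sketched as below. Excluding the root function from the candidates is an assumption made here, since its proportion is always the maximum; the function name, the threshold value and the sample proportions are likewise hypothetical.

```python
def select_functions_to_optimize(proportions, root, threshold):
    """Return non-root functions whose proportion exceeds threshold, largest first."""
    candidates = [name for name, share in proportions.items()
                  if name != root and share > threshold]
    return sorted(candidates, key=lambda name: -proportions[name])


# Assumed bandwidth proportion information on a 0..100 scale.
proportions = {"main": 100.0, "decode": 62.5, "filter": 25.0, "output": 12.5}
to_optimize = select_functions_to_optimize(proportions, "main", threshold=20.0)
```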

In step T200, a target optimization manner for optimizing the function to be optimized is determined according to the syntax structure of the function to be optimized, where the target optimization manner is used to optimize a memory access condition when the function to be optimized is called.

The optimization mode may include the following: code in the function that reads and writes a large piece of data in the memory (such as, but not limited to, code implementing a loop) is optimized so that the large piece of data is divided into a plurality of smaller data blocks and is read and written one data block at a time.

For example, suppose a loop in a function processes image matrix data pixel by pixel. Because one frame of image matrix data is large and cannot fit in the Cache, a full frame of image matrix data has to be read from the memory each time, which is a very large amount. After optimization, one row of data can be used as a data block, so only one data block is read at a time. The data block fits in the Cache and can be fetched from the Cache when it is needed later, without accessing the memory, which can greatly reduce the bandwidth information required for calling the function.

The size of the data blocks may differ between optimization modes. For example, in one optimization mode each data block is 24x24 in size; in another optimization mode each data block is 32x32 in size.
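The block-splitting idea can be sketched with a blocked matrix transpose. The transpose is chosen here only as a convenient stand-in for "code that reads and writes a large piece of data"; the function names are hypothetical. Pure Python does not expose Cache behavior, so the sketch shows the traversal order and the unchanged result rather than any measured saving.

```python
def tile_ranges(height, width, block_h, block_w):
    """Yield (row_start, row_end, col_start, col_end) for each data block."""
    for r0 in range(0, height, block_h):
        for c0 in range(0, width, block_w):
            yield r0, min(r0 + block_h, height), c0, min(c0 + block_w, width)


def transpose_blocked(image, block=32):
    """Transpose a 2D list block by block instead of over the whole frame."""
    height, width = len(image), len(image[0])
    out = [[0] * height for _ in range(width)]
    for r0, r1, c0, c1 in tile_ranges(height, width, block, block):
        for r in range(r0, r1):          # work stays inside one small block
            for c in range(c0, c1):
                out[c][r] = image[r][c]
    return out


image = [[r * 50 + c for c in range(50)] for r in range(70)]
result_24 = transpose_blocked(image, block=24)
result_32 = transpose_blocked(image, block=32)
```

Both block sizes produce the same output as an unblocked transpose; which size performs best depends on the target platform, which is why the text treats different block sizes as different optimization modes.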

The optimization mode may further include the following: the inner loop is interchanged with the outer loop, that is, the original inner loop becomes the outer loop and the original outer loop becomes the inner loop.

Continuing with the image matrix data example, the original loop code reads one column of data first and then takes data from that column row by row for processing, so every iteration traverses all rows (that is, all data blocks) of the whole image matrix data and the Cache cannot be used well. After optimization, one row of data is read first and data is then taken from that row column by column for processing, so each iteration of the outer loop reads one data block, namely one row of data, and places it in the Cache, and the inner loop only needs to fetch data from the Cache.
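The interchange can be sketched on a simple accumulation over a row-major matrix. The function names are hypothetical, and the comments about Cache use describe the intended effect on a compiled, row-major layout rather than anything measurable in pure Python.

```python
def sum_column_first(image):
    """Original order: outer loop over columns, inner loop over rows."""
    total = 0
    for c in range(len(image[0])):
        for r in range(len(image)):      # every step jumps to a different row
            total += image[r][c]
    return total


def sum_row_first(image):
    """Interchanged order: outer loop over rows, inner loop over columns."""
    total = 0
    for row in image:                    # one row is one data block
        for value in row:                # inner loop stays within that row
            total += value
    return total


image = [[r * 4 + c for c in range(4)] for r in range(3)]
before = sum_column_first(image)
after = sum_row_first(image)
```

The two functions compute the same value, which is what makes the interchange an equivalent transformation.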

Of course, the above is only an example of the optimization method, and other optimization methods may be used, which are not listed here and are not limited specifically.

In this embodiment, a target optimization mode for optimizing the function to be optimized may be automatically matched according to the grammatical structure of the function to be optimized, where the target optimization mode is suitable for optimizing the function using the grammatical structure, and may optimize a memory access condition when the function to be optimized is called.

In step T300, the target optimization method is used to optimize the function to be optimized.

The target optimization mode may be used to optimize the memory access condition when the function to be optimized is called, for example, the number of accesses to the memory may be reduced, and the amount of data to be read and written to the memory may be reduced. Therefore, the function to be optimized can be optimized in a target optimization mode, so that bandwidth information required by calling the function to be optimized is reduced.

When a plurality of functions to be optimized exist in the designated program, the steps T200 to T300 may be executed for each function to be optimized to optimize each function to be optimized, thereby completing the optimization of the entire designated program.

In one embodiment, the method further includes, before step T100, a step T110: establishing a correspondence between syntax structures and optimization modes.

The correspondence between syntax structures and optimization modes may include a plurality of optimization modes, and each optimization mode may correspond to one or more syntax structures. Of course, the syntax structures in the correspondence are syntax structures that conform to the requirements of the code.

Generally, the same function can be implemented by functions with different syntax structures. Therefore, preferably, each optimization mode in the correspondence may correspond to a plurality of syntax structures, where the syntax structures corresponding to an optimization mode come from a plurality of functions that implement the same function, and the optimization mode may be the optimal one determined from the optimization modes used to optimize those functions. Thus, functions using these syntax structures can all be optimized in the same optimal mode, so that the memory access behavior is optimized to the greatest extent.

The specific optimization method can refer to the description in the foregoing embodiments, and is not repeated herein.

Optionally, if the designated program needs to be subsequently run on the target platform, when determining the correspondence between the syntax structure and the optimization mode, the plurality of functions optimized by the different optimization modes may be run on the target platform, so as to determine the optimal optimization mode from the different optimization modes of the plurality of functions.

After the corresponding relationship between the syntactic structure and the optimization mode is determined, the corresponding relationship may be set in the target platform as an optimization configuration file corresponding to the target platform.

Accordingly, in step T200, the determining a target optimization manner for optimizing the function to be optimized according to the syntax structure of the function to be optimized may include the following steps:

T201: finding the optimization mode corresponding to the syntax structure of the function to be optimized in the established correspondence between syntax structures and optimization modes, and determining the found optimization mode as the target optimization mode.

After the function to be optimized is determined, the optimization mode corresponding to its syntax structure is looked up in the established correspondence between syntax structures and optimization modes, and the optimization mode found is determined as the target optimization mode.

During the lookup, the syntax structures in the correspondence may be matched against the function to be optimized; when a syntax structure matches, the optimization mode corresponding to the matched syntax structure is the target optimization mode.

In an embodiment, in step T110, the establishing of the corresponding relationship between the syntax structure and the optimization mode may include the following steps:

T111: determining a target sample function from each obtained sample function group; each sample function group includes at least a first sample function and a plurality of second sample functions, the second sample functions being obtained by optimizing the first sample function in different optimization modes; the target sample function is the second sample function, selected from the second sample functions included in the sample function group, that meets a set requirement, the set requirement being that the bandwidth information required for calling is the smallest; the bandwidth information refers to the memory access amount per unit time;

T112: for each sample function group, establishing the correspondence according to the syntax structure of each sample function in the sample function group and the optimization mode adopted by the target sample function in the sample function group.

Multiple sample function groups may be obtained. Each sample function group includes a sample function that can be optimized, i.e., an unoptimized first sample function (there may of course be more than one; the specific number is not limited), and further includes a plurality of second sample functions obtained by optimizing the first sample function in different optimization modes. It will be appreciated that each sample function is an executable function.

Since each second sample function is obtained by optimizing the first sample function rather than by changing what it does, the second sample function implements the same function as the first sample function but has a different syntax structure.

The determination method of the first sample function may be the same as the determination method of the function to be optimized, that is, a function that can be optimized may be found from some programs as the first sample function. The optimization of the first sample function may be implemented manually or by other means, and is not limited in particular.

In each obtained sample function group, every second sample function (that is, every function other than the first sample function) has a corresponding optimization mode. These optimization modes can be called equivalent transformations, meaning that what the function does is unchanged after optimization. However, the optimization mode corresponding to a given second sample function is not necessarily optimal, so the optimal optimization mode needs to be determined.

First, a target sample function needs to be determined from the acquired sample function sets.

For each sample function group, the second sample functions in the group of sample functions are sequentially run on the target platform, and the running time consumption and the memory access amount required by calling of each second sample function are recorded. On the basis of recording the consumed time and the generated memory access amount of each second sample function, the bandwidth information of each second sample function, namely the memory access amount in unit time, can be calculated according to the memory access amount and the consumed time, and the second sample function with the minimum bandwidth information can be selected as the target sample function.

The correspondence may then be established based on the target sample function determined for each sample function group.

After determining the target sample function in each sample function set, the corresponding relationship may be established for each sample function set according to the syntax structure of each sample function in the sample function set and the optimization manner adopted by the target sample function in the sample function set. Specifically, the syntactic structure of each sample function in each sample function set is stored in correspondence with the optimization mode adopted by the target sample function in the sample function set.

Thus, the correspondence includes a plurality of optimization manners, each optimization manner corresponds to a plurality of syntax structures, and each optimization manner is the optimal optimization manner for functions having the corresponding syntax structures.
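One possible way to store such a correspondence is a lookup table keyed by syntax structure. The sketch below assumes that a syntax structure can be reduced to a hashable key; the keys `S1_first`, `S1_second_a`, and so on, and the mode names `mode_A` and `mode_B`, are placeholders rather than anything defined by the embodiments.

```python
def build_correspondence(groups):
    """Map the syntax structure of every sample function in a group to the
    optimization manner adopted by that group's target sample function."""
    table = {}
    for group in groups:
        for structure in group["structures"]:
            table[structure] = group["target_mode"]
    return table


# Placeholder groups: each lists the syntax structures of its sample functions
# and the optimization manner of its target sample function.
groups = [
    {"structures": ["S1_first", "S1_second_a", "S1_second_b"], "target_mode": "mode_A"},
    {"structures": ["S2_first", "S2_second_a"], "target_mode": "mode_B"},
]
table = build_correspondence(groups)

# Looking up the target optimization manner for a function to be optimized
# whose syntax structure matches a stored key.
target_mode = table.get("S2_first")
```

With this layout, several syntax structures share one optimization manner, which matches the many-to-one relationship described above.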

Optionally, when the correspondence is established, the syntax structure may be represented in other forms. For example, the syntax structure of a function may be represented as an abstract syntax tree, so that each syntax structure has a corresponding abstract syntax tree. The syntax structure of each sample function is abstracted into an abstract syntax tree, and a correspondence is established between the abstract syntax trees of each sample function group and the optimization manner corresponding to the target sample function in that group.

Correspondingly, when the target optimization mode for optimizing the function to be optimized is determined according to the syntax structure of the function to be optimized, the target abstract syntax tree matched with the syntax structure of the function to be optimized can be found from the corresponding relationship, and the optimization mode corresponding to the target abstract syntax tree in the corresponding relationship is determined as the target optimization mode.

Optionally, after the abstract syntax tree is established, the abstract syntax tree may be vectorized to obtain a syntax feature vector, in which case the correspondence is a correspondence between syntax feature vectors and optimization manners. Representing the syntax structure by a syntax feature vector facilitates matching of syntax structures.
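The text does not specify how the abstract syntax tree is vectorized or how vectors are matched. As one hedged illustration, the sketch below uses Python's standard `ast` module, counts a small fixed set of node types as the feature vector, and matches by smallest squared distance. The node-type list, the distance measure, and the mode names are all assumptions made for the example.

```python
import ast
from collections import Counter

# Assumed feature vocabulary: one vector component per AST node type.
NODE_TYPES = ("For", "While", "If", "Call", "BinOp", "Subscript")


def syntax_vector(source):
    """Parse source into an abstract syntax tree and count selected node types."""
    tree = ast.parse(source)
    counts = Counter(type(node).__name__ for node in ast.walk(tree))
    return tuple(counts[name] for name in NODE_TYPES)


def closest_mode(vector, table):
    """Return the optimization manner whose stored vector is nearest (assumed metric)."""
    def distance(other):
        return sum((a - b) ** 2 for a, b in zip(vector, other))
    return table[min(table, key=distance)]


SAMPLE = """
def total(values):
    s = 0
    for v in values:
        s = s + v
    return s
"""

QUERY = """
def product(xs):
    r = 1
    for x in xs:
        r = r * x
    return r
"""

# Correspondence between syntax feature vectors and placeholder mode names.
table = {syntax_vector(SAMPLE): "mode_A", (0, 0, 2, 1, 0, 0): "mode_B"}
matched = closest_mode(syntax_vector(QUERY), table)
```

The two functions differ in names and operator but share one loop and one binary operation, so their feature vectors coincide and the query is matched to the manner stored for the sample.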

In one embodiment, the bandwidth distribution information includes the bandwidth information required for calling each target function in the designated program, where the bandwidth information refers to the memory access amount per unit time.

As mentioned above, the bandwidth information may also be considered as the ratio of the amount of memory access required to call the target function to the call duration of the target function.

In step T100, the determining a function to be optimized from the designated program according to bandwidth distribution information required for calling each target function in the designated program may include the following steps:

T101: acquiring the current available bandwidth of a target platform for running the designated program;

T102: for each target function, comparing the bandwidth information corresponding to the target function in the bandwidth distribution information with the current available bandwidth, and determining the target function as a function to be optimized when the bandwidth information corresponding to the target function in the bandwidth distribution information is larger than the current available bandwidth.

In the case that the current available bandwidth of the target platform is large, the designated program can be allowed to occupy more bandwidth, and in the case that the current available bandwidth of the target platform is limited, the designated program can only be allowed to occupy limited bandwidth. The target platform is a platform for running the specified program.

Therefore, the function to be optimized needs to be determined from the designated program according to the bandwidth distribution information required for calling each target function in the designated program and the current available bandwidth of the target platform.

The current available bandwidth may be determined according to the total bandwidth of the target platform. For example, the product of the total memory bandwidth and a preset threshold may be calculated to obtain the current available bandwidth. The preset threshold may be determined according to the proportion of bandwidth that the target platform can reserve for the designated program, and is not particularly limited.

Of course, different target platforms may have different total bandwidths and may therefore allow a designated program to occupy different amounts of bandwidth. In practice, some functions need not be optimized on a platform with a large total bandwidth but do need to be optimized on a platform with a small total bandwidth, and the embodiments of the present invention are particularly suitable for platforms with a small total bandwidth.

The total bandwidth of the target platform can be obtained by analyzing the performance bottleneck of the target platform. For example, the maximum bandwidth of the target platform can be tested with a group of performance analysis use cases: a use case requiring a smaller bandwidth is run on the target platform first; if it runs normally, a use case requiring a larger bandwidth is run next, and so on, until some use case cannot run normally on the target platform. The bandwidth required by the preceding use case, that is, the last one that ran normally, is then taken as the total bandwidth of the target platform. Of course, the above method is only an example, and other methods for determining the total bandwidth of the target platform are also applicable.
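The stepwise test described above can be sketched as a simple loop. The function name `estimate_total_bandwidth` and the `runs_normally` callback are hypothetical; how a use case is judged to run normally on a real platform is not specified here and is simulated by a threshold.

```python
def estimate_total_bandwidth(required_bandwidths, runs_normally):
    """Run use cases in ascending order of required bandwidth and return the
    requirement of the last one that ran normally (None if even the first fails).

    If every use case runs normally, the result is only a lower bound.
    """
    last_ok = None
    for required in sorted(required_bandwidths):
        if not runs_normally(required):
            break
        last_ok = required
    return last_ok


# Simulated platform (assumption): use cases up to 12.0 units run normally.
def simulated_platform(required):
    return required <= 12.0


use_cases = [2.0, 4.0, 8.0, 16.0, 32.0]
total_bandwidth = estimate_total_bandwidth(use_cases, simulated_platform)
```

In this simulation the use case requiring 16.0 is the first to fail, so the estimate is the 8.0 required by the preceding use case. The resolution of the estimate depends on how finely the use cases are spaced.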

After the current available bandwidth of the target platform is determined, the bandwidth information corresponding to each target function in the bandwidth distribution information is compared with the current available bandwidth. If the bandwidth information of a target function is greater than the current available bandwidth, that target function cannot run normally on the target platform and is determined as a function to be optimized.
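Steps T101 and T102 amount to a threshold comparison, which can be sketched as follows. The function names and all numeric values are placeholders; the current available bandwidth is computed here as the product of the total bandwidth and a preset threshold, as in the example given earlier.

```python
def functions_to_optimize(bandwidth_by_function, total_bandwidth, threshold):
    """Return the target functions whose bandwidth information exceeds the
    current available bandwidth (total bandwidth multiplied by the threshold)."""
    available = total_bandwidth * threshold
    return [
        name
        for name, bandwidth in bandwidth_by_function.items()
        if bandwidth > available
    ]


# Placeholder bandwidth distribution information (memory access per unit time).
bandwidth_by_function = {"decode": 6.0, "resize": 2.5, "blend": 5.0}

# With a total bandwidth of 10.0 and a threshold of 0.5, the current
# available bandwidth is 5.0.
selected = functions_to_optimize(bandwidth_by_function, total_bandwidth=10.0, threshold=0.5)
```

Only `decode` is selected. `blend` sits exactly at the available bandwidth and is not selected, because the comparison in T102 is strictly "larger than".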

The present invention also provides a bandwidth distribution determining apparatus, and referring to fig. 4, the bandwidth distribution determining apparatus 100 includes:

a memory access amount determining module 101, configured to determine, for each target function to be called in a specified program, a target node corresponding to the target function from a generated function tree, and determine a first memory access amount and a second memory access amount respectively, determine a third memory access amount corresponding to the target function according to the first memory access amount and the second memory access amount, and record the third memory access amount to the target node; the first memory access amount is the accessed amount of the memory when the target function is started to be called, the second memory access amount is the accessed amount of the memory when the target function is called completely, and the third memory access amount is the memory access amount required by calling the target function;

and a bandwidth distribution information determining module 102, configured to determine, when all the target functions of the specified program are called, bandwidth distribution information required for calling all the target functions in the specified program according to the third memory access amount recorded by each node in the function tree.

In an embodiment, when the memory access amount determining module determines the third memory access amount corresponding to the target function according to the first memory access amount and the second memory access amount, the memory access amount determining module is specifically configured to:

calculating the difference value of the second memory access amount and the first memory access amount;

and determining the third memory access amount according to the difference value.
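The difference calculation can be illustrated with a wrapper that reads a cumulative counter when a call starts and again when it completes. The `AccessCounter` class is a stand-in for whatever platform facility reports the accessed amount of memory, and accumulating the difference over repeated calls of the same function is an assumption of this sketch.

```python
import functools


class AccessCounter:
    """Hypothetical stand-in for a cumulative memory access counter."""

    def __init__(self):
        self.total = 0

    def touch(self, amount):
        self.total += amount


counter = AccessCounter()
recorded = {}  # third memory access amount per function name


def measured(func):
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        first = counter.total            # accessed amount when the call starts
        try:
            return func(*args, **kwargs)
        finally:
            second = counter.total       # accessed amount when the call completes
            # Third amount = second - first, accumulated over calls (assumption).
            recorded[func.__name__] = recorded.get(func.__name__, 0) + (second - first)
    return wrapper


@measured
def leaf():
    counter.touch(64)


@measured
def root():
    counter.touch(16)
    leaf()
    leaf()


root()
```

After `root()` returns, `recorded` holds 128 for `leaf` (two calls of 64 each) and 144 for `root`, whose amount includes the accesses made by the functions it calls.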

In an embodiment, when the bandwidth distribution information determining module determines, according to the third memory access amount recorded in each node in the function tree, bandwidth distribution information required for calling each target function in the designated program, the bandwidth distribution information determining module is specifically configured to:

for each node in the function tree, calculating the ratio of the third memory access amount recorded by the node to the third memory access amount recorded by the root node, and determining, according to the ratio, the bandwidth ratio information required for calling the target function corresponding to the node;

and determining the bandwidth distribution information based on the determined bandwidth ratio information required for calling each target function.
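The ratio calculation can be sketched as below. The tree is flattened into a dictionary from node name to recorded third memory access amount, which is a simplification for the example; the node names and amounts are placeholders.

```python
def bandwidth_ratios(third_by_node, root):
    """Return, for each node, the ratio of its third memory access amount to
    the third memory access amount recorded by the root node."""
    root_amount = third_by_node[root]
    if root_amount == 0:
        raise ValueError("root node recorded no memory access")
    return {node: amount / root_amount for node, amount in third_by_node.items()}


# Placeholder amounts recorded at the nodes of a small function tree.
third_by_node = {"main": 200, "decode": 150, "resize": 50}
ratios = bandwidth_ratios(third_by_node, root="main")
```

Here `decode` accounts for 0.75 of the memory access recorded at the root and `resize` for 0.25, and the root itself has a ratio of 1.0.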

In one embodiment, the function tree is generated by:

a root node determining module, configured to determine a root node of the function tree, where the root node corresponds to a target function specified in the specified program;

and the child node generation module is used for generating child nodes of the function tree according to the child functions of the target functions corresponding to the root nodes in the designated program.

In an embodiment, when the child node generating module generates the child node of the function tree according to the child function of the target function corresponding to the root node in the designated program, the child node generating module is specifically configured to:

taking the root node as a current searching node, checking whether a subfunction of the target function corresponding to the current searching node exists in the designated program, and if so, executing the following steps for each subfunction:

if the current searching node has no parent node, generating a child node of the current searching node in the function tree, the child node corresponding to the sub-function; and, when the number of layers of the current access node of the function tree has not reached a preset number of layers, determining the child node as the current searching node and returning to the step of checking whether a sub-function of the target function corresponding to the current searching node exists in the designated program;

if the current searching node has a parent node, then, when the sub-function differs from the target function corresponding to every node on a designated path, the designated path being the path from the current searching node to the root node in the function tree, generating a child node of the current searching node in the function tree, the child node corresponding to the sub-function; and, when the number of layers of the current access node of the function tree has not reached the preset number of layers, determining the child node as the current searching node and returning to the step of checking whether a sub-function of the target function corresponding to the current searching node exists in the designated program.
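The child node generation steps can be sketched as a depth-limited recursive expansion. The sketch assumes the call relationships of the designated program are already available as a dictionary (`call_graph`), counts the root as layer 1, and stops expanding a child once its layer reaches `max_layers`; these conventions and all function names are assumptions made for the example.

```python
class Node:
    def __init__(self, function, parent=None):
        self.function = function
        self.parent = parent
        self.children = []


def on_path_to_root(node, function):
    """Check whether a function already appears on the path from node to the root."""
    while node is not None:
        if node.function == function:
            return True
        node = node.parent
    return False


def build_tree(call_graph, root_function, max_layers):
    root = Node(root_function)

    def expand(node, layer):
        for sub in call_graph.get(node.function, []):
            # For a node with a parent, skip sub-functions already on the path
            # to the root, so recursive calls do not grow the tree without end.
            if node.parent is not None and on_path_to_root(node, sub):
                continue
            child = Node(sub, node)
            node.children.append(child)
            if layer + 1 < max_layers:   # preset number of layers not yet reached
                expand(child, layer + 1)

    expand(root, 1)
    return root


def shape(node):
    """Nested (function, children) view of the tree, for inspection."""
    return (node.function, [shape(child) for child in node.children])


# Placeholder call relationships; "decode" calls itself recursively.
call_graph = {
    "main": ["decode", "render"],
    "decode": ["read", "decode"],
    "render": ["blend"],
    "blend": ["mix"],
}
tree = build_tree(call_graph, "main", max_layers=3)
```

In the resulting tree the recursive call from `decode` to itself produces no node, and `mix` is absent because `blend` already sits on the third layer and is not expanded further.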

In one embodiment, the apparatus further comprises:

a numbering module, configured to assign node numbers to the nodes in the function tree according to a set numbering principle; the set numbering principle is: the node numbers of the nodes on the same layer are consecutive, and the node number of the designated node on each layer is consecutive with the node number of its parent node; the designated node on each layer refers to the node on that layer with the smallest node number.

In one embodiment of the present invention, the function tree is composed of an array, each element in the array corresponds to a node in the function tree, and every two nodes with consecutive node numbers are adjacent array elements in the array.

The present invention also provides a program optimization apparatus, and referring to fig. 5, the program optimization apparatus 200 includes:

a function to be optimized determining module 201, configured to determine a function to be optimized from a specified program according to bandwidth distribution information required for invoking each target function in the specified program;

an optimization mode determining module 202, configured to determine, according to a syntax structure of the function to be optimized, a target optimization mode for optimizing the function to be optimized, where the target optimization mode is used to optimize a memory access condition when the function to be optimized is called;

and the function optimization module 203 is configured to optimize the function to be optimized by using the target optimization mode.

In one embodiment, the apparatus further comprises: the corresponding relation establishing module is used for establishing the corresponding relation between the grammar structure and the optimization mode;

when the optimization mode determining module determines the target optimization mode for optimizing the function to be optimized according to the syntactic structure of the function to be optimized, the optimization mode determining module is specifically configured to:

and finding out an optimization mode corresponding to the grammar structure of the function to be optimized from the corresponding relation between the set grammar structure and the optimization mode, and determining the found optimization mode as the target optimization mode.

In an embodiment, when the corresponding relationship establishing module establishes the corresponding relationship between the syntactic structure and the optimization method, the corresponding relationship establishing module is specifically configured to:

determining a target sample function from the obtained sample function groups; the sample function group at least comprises a first sample function and a plurality of second sample functions, and the second sample functions are obtained by optimizing the first sample function in different optimization modes; the target sample function is a second sample function which is selected from second sample functions included in the sample function group and meets a set requirement, and the set requirement means that bandwidth information required by calling is minimum; the bandwidth information refers to the memory access amount in unit time;

and, for each sample function group, establishing the correspondence according to the syntax structure of each sample function in the sample function group and the optimization manner adopted by the target sample function in the sample function group.

In one embodiment, the bandwidth distribution information includes: calling bandwidth information required by each target function in a designated program, wherein the bandwidth information refers to the memory access amount in unit time;

the function to be optimized determining module is specifically configured to, when determining a function to be optimized from the designated program according to bandwidth distribution information required for calling each target function in the designated program:

acquiring the current available bandwidth of a target platform for running the specified program;

and, for each target function, comparing the bandwidth information corresponding to the target function in the bandwidth distribution information with the current available bandwidth, and determining the target function as a function to be optimized when the bandwidth information corresponding to the target function in the bandwidth distribution information is larger than the current available bandwidth.

The implementation process of the functions and actions of each unit in the above device is specifically described in the implementation process of the corresponding step in the above method, and is not described herein again.

For the device embodiments, since they substantially correspond to the method embodiments, reference may be made to the partial description of the method embodiments for relevant points. The above-described embodiments of the apparatus are merely illustrative, wherein the units described as separate parts may or may not be physically separate, and the parts shown as units may or may not be physical units.

The invention also provides an electronic device, which comprises a processor and a memory; the memory stores a program that can be called by the processor; wherein the processor implements the bandwidth distribution determining method or the program optimizing method as described in the foregoing embodiments when executing the program.

The embodiments of the bandwidth distribution determining apparatus and the program optimizing apparatus can be applied to an electronic device. Taking a software implementation as an example, the apparatus, as a logical apparatus, is formed by the processor of the electronic device in which it is located reading the corresponding computer program instructions from the nonvolatile memory into the memory and running them. In terms of hardware, fig. 6 is a hardware structure diagram of an electronic device in which the bandwidth distribution determining apparatus 100 is located according to an exemplary embodiment of the present invention (the same applies to the program optimizing apparatus, which is not shown separately). In addition to the processor 510, the memory 530, the interface 520, and the nonvolatile memory 540 shown in fig. 6, the electronic device in which the apparatus 100 is located may also include other hardware according to its actual function, which is not described again.

The present invention also provides a machine-readable storage medium on which a program is stored, which when executed by a processor, implements the bandwidth distribution determining method or the program optimizing method as described in the foregoing embodiments.

The present invention may take the form of a computer program product embodied on one or more storage media including, but not limited to, disk storage, CD-ROM, optical storage, and the like, having program code embodied therein. Machine-readable storage media include both permanent and non-permanent, removable and non-removable media, and the storage of information may be accomplished by any method or technology. The information may be computer readable instructions, data structures, modules of a program, or other data. Examples of machine-readable storage media include, but are not limited to: phase change memory (PRAM), Static Random Access Memory (SRAM), Dynamic Random Access Memory (DRAM), other types of Random Access Memory (RAM), Read Only Memory (ROM), Electrically Erasable Programmable Read Only Memory (EEPROM), flash memory or other memory technology, compact disc read only memory (CD-ROM), Digital Versatile Discs (DVD) or other optical storage, magnetic cassettes, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device.

The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents, improvements and the like made within the spirit and principle of the present invention should be included in the scope of the present invention.
