Network model quantization and inference method and device, electronic device, and storage medium

Document No.: 380292    Publication date: 2021-12-10

Reading note: This technique, "Network model quantization and inference method and device, electronic device, and storage medium", was designed and created by Meng Zemin on 2020-06-10. Its main content is as follows: the embodiments of the present application provide a network model quantization and inference method, apparatus, electronic device, and storage medium, relating to the technical field of deep learning. The method comprises: obtaining a network model to be quantized; in the process of quantizing the network model to be quantized along its data processing flow direction, determining whether the current processing object is a quantization intermediate layer; if so, quantizing the next network subgraph along the data processing flow direction with the output quantization mode indicated by the output quantization mode identifier in the parameters of the quantization intermediate layer; if not, quantizing the current processing object with a quantization mode supported by the model inference platform. Applying the scheme provided by the embodiments of the present application thus improves the efficiency of model inference on the quantized model.

1. A network model quantization method, the method comprising:

obtaining a network model to be quantized, wherein the network model to be quantized comprises a quantization intermediate layer, the quantization intermediate layer is connected to an input end and an output end of a first-class network subgraph respectively, and parameters of the quantization intermediate layer comprise an input quantization mode identifier, an output quantization mode identifier, an input data type, and an output data type; the quantization intermediate layer is configured to: dequantize input data of the input data type with reference to the input quantization mode corresponding to the input quantization mode identifier to obtain dequantized data, quantize the dequantized data according to the output quantization mode corresponding to the output quantization mode identifier, and output output data of the output data type; and the first-class network subgraph is a network subgraph that needs to be quantized with a third-party quantization mode;

determining, in the process of quantizing the network model to be quantized along the data processing flow direction of the network model to be quantized, whether a current processing object is a quantization intermediate layer;

if so, quantizing the next network subgraph along the data processing flow direction with the output quantization mode indicated by the output quantization mode identifier in the parameters of the quantization intermediate layer;

if not, quantizing the current processing object with a quantization mode supported by a model inference platform.

2. The method of claim 1, wherein obtaining the network model to be quantized comprises:

obtaining an original network model;

detecting, in the original network model, a network subgraph that needs to be quantized with a third-party quantization mode, as a first-class network subgraph;

adding a quantization intermediate layer at the input end and at the output end of each detected first-class network subgraph, and, for each quantization intermediate layer, setting parameters of the quantization intermediate layer according to quantization information of the network subgraphs connected to the quantization intermediate layer, to obtain the network model to be quantized.

3. The method of claim 1, wherein:

for each first-class network subgraph, in the case that the input end or the output end of the first-class network subgraph is connected to one second-class network subgraph, a quantization intermediate layer is located between the input end or the output end of the first-class network subgraph and that second-class network subgraph, wherein the second-class network subgraph is a network subgraph that needs to be quantized with a quantization mode supported by the model inference platform; and/or

for each first-class network subgraph, in the case that the input end or the output end of the first-class network subgraph is connected to a plurality of second-class network subgraphs, a quantization intermediate layer is located between the input end or the output end of the first-class network subgraph and the plurality of second-class network subgraphs.

4. The method of claim 1, wherein:

the first-class network subgraph is a network subgraph that needs to be quantized with a quantization mode not supported by the model inference platform; and/or

the first-class network subgraph is a network subgraph containing a network layer whose quantization is not supported locally; and/or

the first-class network subgraph is a network subgraph that needs to be quantized with a third-party quantization mode specified by a user.

5. The method according to any one of claims 1-4, wherein quantizing the next network subgraph along the data processing flow direction with the output quantization mode indicated by the output quantization mode identifier in the parameters of the quantization intermediate layer comprises:

identifying the output quantization mode identifier included in the parameters of the quantization intermediate layer;

searching, among quantization modes configured by the user and locally supported quantization modes, for the output quantization mode corresponding to the identified output quantization mode identifier, and quantizing the next network subgraph along the data processing flow direction with the found output quantization mode.

6. A network model inference method, the method comprising:

obtaining a network model to be inferred, wherein the network model to be inferred is obtained by quantizing a network model to be quantized, the network model to be quantized comprises a quantization intermediate layer, the quantization intermediate layer is connected to an input end and an output end of a first-class network subgraph respectively, and parameters of the quantization intermediate layer comprise an input quantization mode identifier, an output quantization mode identifier, an input data type, and an output data type; during quantization, the quantization intermediate layer is used to indicate that the next network subgraph along the data processing flow direction of the network model to be quantized is to be quantized with the output quantization mode corresponding to the output quantization mode identifier; and the first-class network subgraph is a network subgraph that needs to be quantized with a third-party quantization mode;

determining, in the process of performing inference along the data processing flow direction based on data to be inferred, whether a current inference object is a quantization intermediate layer;

if so, dequantizing input data of the input data type with reference to the input quantization mode corresponding to the input quantization mode identifier to obtain dequantized data, quantizing the dequantized data according to the output quantization mode corresponding to the output quantization mode identifier, outputting output data of the output data type, and performing inference on the next network subgraph along the data processing flow direction based on the output data.

7. The method of claim 6, further comprising:

determining, in the process of performing inference along the data processing flow direction based on the data to be inferred, whether a current inference object is a first-class network subgraph;

if so, performing inference on the first-class network subgraph based on the data to be inferred according to inference information configured by the user.

8. A network model quantization apparatus, the apparatus comprising:

a first model obtaining module, configured to obtain a network model to be quantized, wherein the network model to be quantized comprises a quantization intermediate layer, the quantization intermediate layer is connected to an input end and an output end of a first-class network subgraph respectively, and parameters of the quantization intermediate layer comprise an input quantization mode identifier, an output quantization mode identifier, an input data type, and an output data type; the quantization intermediate layer is configured to: dequantize input data of the input data type with reference to the input quantization mode corresponding to the input quantization mode identifier to obtain dequantized data, quantize the dequantized data according to the output quantization mode corresponding to the output quantization mode identifier, and output output data of the output data type; and the first-class network subgraph is a network subgraph that needs to be quantized with a third-party quantization mode;

a first model judging module, configured to determine, in the process of quantizing the network model to be quantized along the data processing flow direction of the network model to be quantized, whether a current processing object is a quantization intermediate layer, and to trigger a first model quantization module if so and a second model quantization module if not;

the first model quantization module, configured to quantize the next network subgraph along the data processing flow direction with the output quantization mode indicated by the output quantization mode identifier in the parameters of the quantization intermediate layer;

the second model quantization module, configured to quantize the current processing object with a quantization mode supported by a model inference platform.

9. The apparatus of claim 8, wherein the first model obtaining module is specifically configured to:

obtain an original network model;

detect, in the original network model, a network subgraph that needs to be quantized with a third-party quantization mode, as a first-class network subgraph;

add a quantization intermediate layer at the input end and at the output end of each detected first-class network subgraph, and, for each quantization intermediate layer, set parameters of the quantization intermediate layer according to quantization information of the network subgraphs connected to the quantization intermediate layer, to obtain the network model to be quantized.

10. The apparatus of claim 8, wherein:

for each first-class network subgraph, in the case that the input end or the output end of the first-class network subgraph is connected to one second-class network subgraph, a quantization intermediate layer is located between the input end or the output end of the first-class network subgraph and that second-class network subgraph, wherein the second-class network subgraph is a network subgraph that needs to be quantized with a quantization mode supported by the model inference platform; and/or

for each first-class network subgraph, in the case that the input end or the output end of the first-class network subgraph is connected to a plurality of second-class network subgraphs, a quantization intermediate layer is located between the input end or the output end of the first-class network subgraph and the plurality of second-class network subgraphs.

11. The apparatus of claim 8, wherein:

the first-class network subgraph is a network subgraph that needs to be quantized with a quantization mode not supported by the model inference platform; and/or

the first-class network subgraph is a network subgraph containing a network layer whose quantization is not supported locally; and/or

the first-class network subgraph is a network subgraph that needs to be quantized with a third-party quantization mode specified by a user.

12. The apparatus according to any one of claims 8-11, wherein the first model quantization module is specifically configured to:

identify the output quantization mode identifier included in the parameters of the quantization intermediate layer;

search, among quantization modes configured by the user and locally supported quantization modes, for the output quantization mode corresponding to the identified output quantization mode identifier, and quantize the next network subgraph along the data processing flow direction with the found output quantization mode.

13. A network model inference apparatus, the apparatus comprising:

a second model obtaining module, configured to obtain a network model to be inferred, wherein the network model to be inferred is obtained by quantizing a network model to be quantized, the network model to be quantized comprises a quantization intermediate layer, the quantization intermediate layer is connected to an input end and an output end of a first-class network subgraph respectively, and parameters of the quantization intermediate layer comprise an input quantization mode identifier, an output quantization mode identifier, an input data type, and an output data type; during quantization, the quantization intermediate layer is used to indicate that the next network subgraph along the data processing flow direction of the network model to be quantized is to be quantized with the output quantization mode corresponding to the output quantization mode identifier; and the first-class network subgraph is a network subgraph that needs to be quantized with a third-party quantization mode;

the second model judgment module is used for judging whether the current inference object is a quantitative middle layer or not in the process of reasoning along the data processing flow direction based on the data to be inferred, and if so, the data conversion module is triggered;

and the data conversion module is used for carrying out inverse quantization on the input data of the input data type by referring to the input quantization mode corresponding to the input quantization mode identification to obtain inverse quantization data, quantizing the inverse quantization data according to the output quantization mode corresponding to the output quantization mode identification, outputting the output data of the output data type, and reasoning the next network subgraph along the data processing flow direction based on the output data.

14. The apparatus according to claim 13, further comprising a third model judging module, specifically configured to:

determine, in the process of performing inference along the data processing flow direction based on the data to be inferred, whether a current inference object is a first-class network subgraph;

if so, perform inference on the first-class network subgraph based on the data to be inferred according to inference information configured by the user.

15. An electronic device, comprising a processor, a communication interface, a memory, and a communication bus, wherein the processor, the communication interface, and the memory communicate with one another through the communication bus;

the memory is configured to store a computer program;

the processor is configured to implement the method steps of any one of claims 1-5 when executing the program stored in the memory.

16. An electronic device, comprising a processor, a communication interface, a memory, and a communication bus, wherein the processor, the communication interface, and the memory communicate with one another through the communication bus;

the memory is configured to store a computer program;

the processor is configured to implement the method steps of any one of claims 6-7 when executing the program stored in the memory.

17. A computer-readable storage medium having a computer program stored therein, wherein the computer program, when executed by a processor, implements the method steps of any one of claims 1-5.

18. A computer-readable storage medium having a computer program stored therein, wherein the computer program, when executed by a processor, implements the method steps of any one of claims 6-7.

Technical Field

The present application relates to the field of deep learning technologies, and in particular to a network model quantization and inference method and apparatus, an electronic device, and a storage medium.

Background

With the rapid development of deep learning technology, network model structures have become increasingly complex. When inference is performed on a network model on a model inference platform, the network model generally needs to be quantized in order to improve the platform's inference efficiency and save computing resources.

When a network model is quantized, a user generally sets a quantization mode for each network subgraph in the network model, and each network subgraph is then quantized according to its set quantization mode; the set quantization mode may be a quantization mode supported by the model inference platform or a third-party quantization mode. Because the quantization modes differ, the data types supported by the quantized network subgraphs may differ. A network subgraph consists of network layers that are consecutive in the network model and need to be quantized with the same quantization mode.

When inference is performed on the quantized network model, data interaction occurs between the network subgraphs. The data type supported by the model inference platform is a fixed data type, while the data types supported by the quantized network subgraphs differ. In the prior art, therefore, when inference is performed on a model quantized in the above manner, data output by the previous network subgraph must be sent to a data conversion unit in the model inference platform; the data conversion unit converts the data, based on the quantization coefficients, into the fixed data type, then converts the result again into the data type supported by the next network subgraph, and finally sends the converted data to the next network subgraph. For example, suppose the output data of one quantized network subgraph has a bit width of 4 bits, the next network subgraph supports data with a bit width of 8 bits, and the fixed data bit width is 32 bits. To realize data interaction between the two, the 4-bit data must be sent to the data conversion unit, converted into 32-bit data, and then converted into 8-bit data before being transmitted to the next network subgraph.
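To make the overhead concrete, the following minimal Python sketch walks through the prior-art hand-off described above. It is an illustration only, not code from the patent; the helper names, scales, and values are hypothetical.

```python
# Prior-art hand-off: a 4-bit output is first dequantized to the platform's
# fixed 32-bit float type by an external conversion unit, then requantized
# to the 8-bit type the next subgraph expects. Scales are hypothetical.

def dequantize(q_values, scale):
    """Map quantized integers back to 32-bit floating point."""
    return [float(q) * scale for q in q_values]

def quantize(values, scale, num_bits):
    """Map floats to signed integers of the given bit width, with clamping."""
    lo, hi = -(2 ** (num_bits - 1)), 2 ** (num_bits - 1) - 1
    return [max(lo, min(hi, round(v / scale))) for v in values]

# Output of the previous subgraph: 4-bit integers with an assumed scale of 0.5.
prev_output_4bit = [3, -2, 7, -8]

# Step 1: external unit converts to the fixed 32-bit float type.
fp32_data = dequantize(prev_output_4bit, scale=0.5)

# Step 2: external unit converts again to the 8-bit type of the next subgraph.
next_input_8bit = quantize(fp32_data, scale=0.1, num_bits=8)

print(next_input_8bit)  # two full conversions for a single hand-off
```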

It can be seen that when model inference is performed with a network model quantized in the prior-art manner, the data conversion unit of the model inference platform must type-convert the data exchanged between network subgraphs. Because the data conversion unit lies outside the network model, the type conversion involves data interaction between the network subgraphs and the outside, and the exchanged data must be converted into the fixed data type; the interaction is therefore time-consuming, and model inference efficiency is low.

Disclosure of Invention

An object of the embodiments of the present application is to provide a network model quantization and inference method and apparatus, an electronic device, and a storage medium, so as to improve the efficiency of model inference on a quantized model. The specific technical solutions are as follows:

In a first aspect, an embodiment of the present application provides a network model quantization method, the method comprising:

obtaining a network model to be quantized, wherein the network model to be quantized comprises a quantization intermediate layer, the quantization intermediate layer is connected to an input end and an output end of a first-class network subgraph respectively, and parameters of the quantization intermediate layer comprise an input quantization mode identifier, an output quantization mode identifier, an input data type, and an output data type; the quantization intermediate layer is configured to: dequantize input data of the input data type with reference to the input quantization mode corresponding to the input quantization mode identifier to obtain dequantized data, quantize the dequantized data according to the output quantization mode corresponding to the output quantization mode identifier, and output output data of the output data type; and the first-class network subgraph is a network subgraph that needs to be quantized with a third-party quantization mode;

determining, in the process of quantizing the network model to be quantized along the data processing flow direction of the network model to be quantized, whether a current processing object is a quantization intermediate layer;

if so, quantizing the next network subgraph along the data processing flow direction with the output quantization mode indicated by the output quantization mode identifier in the parameters of the quantization intermediate layer;

if not, quantizing the current processing object with a quantization mode supported by a model inference platform.

In an embodiment of the present application, obtaining the network model to be quantized comprises:

obtaining an original network model;

detecting, in the original network model, a network subgraph that needs to be quantized with a third-party quantization mode, as a first-class network subgraph;

adding a quantization intermediate layer at the input end and at the output end of each detected first-class network subgraph, and, for each quantization intermediate layer, setting parameters of the quantization intermediate layer according to quantization information of the network subgraphs connected to the quantization intermediate layer, to obtain the network model to be quantized.

In an embodiment of the present application, for each first-class network subgraph, in the case that the input end or the output end of the first-class network subgraph is connected to one second-class network subgraph, a quantization intermediate layer is located between the input end or the output end of the first-class network subgraph and that second-class network subgraph, wherein the second-class network subgraph is a network subgraph that needs to be quantized with a quantization mode supported by the model inference platform; and/or

for each first-class network subgraph, in the case that the input end or the output end of the first-class network subgraph is connected to a plurality of second-class network subgraphs, a quantization intermediate layer is located between the input end or the output end of the first-class network subgraph and the plurality of second-class network subgraphs.

In an embodiment of the present application, the first-class network subgraph is a network subgraph that needs to be quantized with a quantization mode not supported by the model inference platform; and/or

the first-class network subgraph is a network subgraph containing a network layer whose quantization is not supported locally; and/or

the first-class network subgraph is a network subgraph that needs to be quantized with a third-party quantization mode specified by a user.

In an embodiment of the present application, quantizing the next network subgraph along the data processing flow direction with the output quantization mode indicated by the output quantization mode identifier in the parameters of the quantization intermediate layer comprises:

identifying the output quantization mode identifier included in the parameters of the quantization intermediate layer;

searching, among quantization modes configured by the user and locally supported quantization modes, for the output quantization mode corresponding to the identified output quantization mode identifier, and quantizing the next network subgraph along the data processing flow direction with the found output quantization mode.

In a second aspect, an embodiment of the present application provides a network model inference method, the method comprising:

obtaining a network model to be inferred, wherein the network model to be inferred is obtained by quantizing a network model to be quantized, the network model to be quantized comprises a quantization intermediate layer, the quantization intermediate layer is connected to an input end and an output end of a first-class network subgraph respectively, and parameters of the quantization intermediate layer comprise an input quantization mode identifier, an output quantization mode identifier, an input data type, and an output data type; during quantization, the quantization intermediate layer is used to indicate that the next network subgraph along the data processing flow direction of the network model to be quantized is to be quantized with the output quantization mode corresponding to the output quantization mode identifier; and the first-class network subgraph is a network subgraph that needs to be quantized with a third-party quantization mode;

determining, in the process of performing inference along the data processing flow direction based on data to be inferred, whether a current inference object is a quantization intermediate layer;

if so, dequantizing input data of the input data type with reference to the input quantization mode corresponding to the input quantization mode identifier to obtain dequantized data, quantizing the dequantized data according to the output quantization mode corresponding to the output quantization mode identifier, outputting output data of the output data type, and performing inference on the next network subgraph along the data processing flow direction based on the output data.

In one embodiment of the present application, the method further comprises:

determining, in the process of performing inference along the data processing flow direction based on the data to be inferred, whether a current inference object is a first-class network subgraph;

if so, performing inference on the first-class network subgraph based on the data to be inferred according to inference information configured by the user.

In a third aspect, an embodiment of the present application provides a network model quantization apparatus, the apparatus comprising:

a first model obtaining module, configured to obtain a network model to be quantized, wherein the network model to be quantized comprises a quantization intermediate layer, the quantization intermediate layer is connected to an input end and an output end of a first-class network subgraph respectively, and parameters of the quantization intermediate layer comprise an input quantization mode identifier, an output quantization mode identifier, an input data type, and an output data type; the quantization intermediate layer is configured to: dequantize input data of the input data type with reference to the input quantization mode corresponding to the input quantization mode identifier to obtain dequantized data, quantize the dequantized data according to the output quantization mode corresponding to the output quantization mode identifier, and output output data of the output data type; and the first-class network subgraph is a network subgraph that needs to be quantized with a third-party quantization mode;

a first model judging module, configured to determine, in the process of quantizing the network model to be quantized along the data processing flow direction of the network model to be quantized, whether a current processing object is a quantization intermediate layer, and to trigger a first model quantization module if so and a second model quantization module if not;

the first model quantization module, configured to quantize the next network subgraph along the data processing flow direction with the output quantization mode indicated by the output quantization mode identifier in the parameters of the quantization intermediate layer;

the second model quantization module, configured to quantize the current processing object with a quantization mode supported by a model inference platform.

In an embodiment of the application, the first model obtaining module is specifically configured to:

obtain an original network model;

detect, in the original network model, a network subgraph that needs to be quantized with a third-party quantization mode, as a first-class network subgraph;

add a quantization intermediate layer at the input end and at the output end of each detected first-class network subgraph, and, for each quantization intermediate layer, set parameters of the quantization intermediate layer according to quantization information of the network subgraphs connected to the quantization intermediate layer, to obtain the network model to be quantized.

In an embodiment of the present application, for each first-class network subgraph, in the case that the input end or the output end of the first-class network subgraph is connected to one second-class network subgraph, a quantization intermediate layer is located between the input end or the output end of the first-class network subgraph and that second-class network subgraph, wherein the second-class network subgraph is a network subgraph that needs to be quantized with a quantization mode supported by the model inference platform; and/or

for each first-class network subgraph, in the case that the input end or the output end of the first-class network subgraph is connected to a plurality of second-class network subgraphs, a quantization intermediate layer is located between the input end or the output end of the first-class network subgraph and the plurality of second-class network subgraphs.

In an embodiment of the present application, the first-class network subgraph is a network subgraph that needs to be quantized with a quantization mode not supported by the model inference platform; and/or

the first-class network subgraph is a network subgraph containing a network layer whose quantization is not supported locally; and/or

the first-class network subgraph is a network subgraph that needs to be quantized with a third-party quantization mode specified by a user.

In an embodiment of the application, the first model quantization module is specifically configured to:

identify the output quantization mode identifier included in the parameters of the quantization intermediate layer;

search, among quantization modes configured by the user and locally supported quantization modes, for the output quantization mode corresponding to the identified output quantization mode identifier, and quantize the next network subgraph along the data processing flow direction with the found output quantization mode.

In a fourth aspect, an embodiment of the present application provides a network model inference apparatus, the apparatus comprising:

a second model obtaining module, configured to obtain a network model to be inferred, wherein the network model to be inferred is obtained by quantizing a network model to be quantized, the network model to be quantized comprises a quantization intermediate layer, the quantization intermediate layer is connected to an input end and an output end of a first-class network subgraph respectively, and parameters of the quantization intermediate layer comprise an input quantization mode identifier, an output quantization mode identifier, an input data type, and an output data type; during quantization, the quantization intermediate layer is used to indicate that the next network subgraph along the data processing flow direction of the network model to be quantized is to be quantized with the output quantization mode corresponding to the output quantization mode identifier; and the first-class network subgraph is a network subgraph that needs to be quantized with a third-party quantization mode;

a second model judging module, configured to determine, in the process of performing inference along the data processing flow direction based on data to be inferred, whether a current inference object is a quantization intermediate layer, and to trigger a data conversion module if so;

the data conversion module, configured to dequantize input data of the input data type with reference to the input quantization mode corresponding to the input quantization mode identifier to obtain dequantized data, quantize the dequantized data according to the output quantization mode corresponding to the output quantization mode identifier, output output data of the output data type, and perform inference on the next network subgraph along the data processing flow direction based on the output data.

In an embodiment of the present application, the apparatus further comprises a third model judging module, specifically configured to:

determine, in the process of performing inference along the data processing flow direction based on the data to be inferred, whether a current inference object is a first-class network subgraph;

if so, perform inference on the first-class network subgraph based on the data to be inferred according to inference information configured by the user.

In a fifth aspect, an embodiment of the present application provides an electronic device, comprising a processor, a communication interface, a memory, and a communication bus, wherein the processor, the communication interface, and the memory communicate with one another through the communication bus;

the memory is configured to store a computer program;

the processor is configured to implement the method steps of any one of the first aspect when executing the program stored in the memory.

In a sixth aspect, an embodiment of the present application provides an electronic device, comprising a processor, a communication interface, a memory, and a communication bus, wherein the processor, the communication interface, and the memory communicate with one another through the communication bus;

the memory is configured to store a computer program;

the processor is configured to implement the method steps of the second aspect when executing the program stored in the memory.

In a seventh aspect, an embodiment of the present application provides a computer-readable storage medium having a computer program stored therein, wherein the computer program, when executed by a processor, implements the method steps of any one of the first aspect.

In an eighth aspect, an embodiment of the present application provides a computer-readable storage medium having a computer program stored therein, wherein the computer program, when executed by a processor, implements the method steps of any one of the second aspect.

An embodiment of the present application further provides a computer program product containing instructions which, when run on a computer, cause the computer to perform any one of the network model quantization methods described above.

An embodiment of the present application further provides a computer program product containing instructions which, when run on a computer, cause the computer to perform any one of the network model inference methods described above.

The embodiments of the present application have the following beneficial effects:

When the scheme provided by the embodiments of the present application is applied to quantize a network model, a network model to be quantized is first obtained, wherein the network model to be quantized comprises a quantization intermediate layer, the quantization intermediate layer is connected to the input end and the output end of a first-class network subgraph respectively, and the parameters of the quantization intermediate layer comprise an input quantization mode identifier, an output quantization mode identifier, an input data type, and an output data type; the quantization intermediate layer is configured to dequantize input data of the input data type with reference to the input quantization mode corresponding to the input quantization mode identifier to obtain dequantized data, quantize the dequantized data according to the output quantization mode corresponding to the output quantization mode identifier, and output output data of the output data type; and the first-class network subgraph is a network subgraph that needs to be quantized with a third-party quantization mode. In the process of quantizing the network model to be quantized along its data processing flow direction, whether the current processing object is a quantization intermediate layer is determined; if so, the next network subgraph along the data processing flow direction is quantized with the output quantization mode indicated by the output quantization mode identifier in the parameters of the quantization intermediate layer; if not, the current processing object is quantized with a quantization mode supported by the model inference platform. Because the quantized network model contains the quantization intermediate layer, during model inference the quantization intermediate layer can type-convert the data exchanged between network subgraphs that adopt different quantization modes; no data interaction with units outside the network model is needed, and the exchanged data need not be converted into a fixed data type, which reduces the time consumed by type conversion. Therefore, applying the network model quantization scheme provided by the embodiments of the present application improves the efficiency of model inference on the quantized model.

Drawings

To describe the technical solutions in the embodiments of the present application or in the prior art more clearly, the drawings required for describing the embodiments or the prior art are briefly introduced below. Obviously, the drawings in the following description are only some embodiments of the present application, and those skilled in the art may obtain other drawings from these drawings without creative effort.

Fig. 1 is a schematic flowchart of a network model quantization method according to an embodiment of the present application;

Figs. 2a and 2b are schematic diagrams of the position of a first quantization intermediate layer according to an embodiment of the present application;

Figs. 3a and 3b are schematic diagrams of the position of a second quantization intermediate layer according to an embodiment of the present application;

Figs. 4a and 4b are schematic diagrams of the position of a third quantization intermediate layer according to an embodiment of the present application;

Fig. 5 is a schematic flowchart of a network model inference method according to an embodiment of the present application;

Fig. 6 is a schematic structural diagram of a network model quantization apparatus according to an embodiment of the present application;

Fig. 7 is a schematic structural diagram of a network model inference apparatus according to an embodiment of the present application;

Fig. 8 is a schematic structural diagram of an electronic device according to an embodiment of the present application.

Detailed Description

The technical solutions in the embodiments of the present application will be described clearly and completely below with reference to the drawings in the embodiments of the present application. Obviously, the described embodiments are only some, not all, of the embodiments of the present application. All other embodiments obtained by those skilled in the art based on the embodiments of the present application without creative effort fall within the protection scope of the present application.

To improve the efficiency of model inference on a quantized model, embodiments of the present application provide a network model quantization method and apparatus, an electronic device, and a storage medium. Correspondingly, embodiments of the present application also provide a network model inference method and apparatus, an electronic device, and a storage medium. These are described in detail below.

Referring to fig. 1, fig. 1 is a schematic flowchart of a network model quantization method provided by an embodiment of the present application; the method includes the following steps 101 to 104.

Step 101: obtain a network model to be quantized.

The network model to be quantized may be a deep neural network model, a recurrent neural network model, a convolutional neural network model, or the like.

The network model to be quantized includes a quantization intermediate layer. The quantization intermediate layer is a preset special network layer with a quantization function; when a network subgraph for which the user adopts a third-party quantization scheme appears in the network, the quantization intermediate layer serves as a quantization transition. The parameters of the quantization intermediate layer include: an input quantization mode identifier, an output quantization mode identifier, an input data type, and an output data type. The quantization intermediate layer may be configured to dequantize input data of the input data type with reference to the input quantization mode corresponding to the input quantization mode identifier to obtain dequantized data, quantize the dequantized data according to the output quantization mode corresponding to the output quantization mode identifier, and output output data of the output data type.

The quantization mode may be linear quantization, bit quantization, weight-activation quantization, or the like. A quantization mode identifier may be a preset number, letter, or the like. The input quantization mode identifier indicates the quantization mode that needed to be adopted when the previous network subgraph along the data processing flow direction was quantized; the output quantization mode identifier indicates the quantization mode with which the next network subgraph along the data processing flow direction is to be quantized. The input quantization mode and the output quantization mode may each be a user-configured quantization mode or a locally supported quantization mode. It should be noted that a quantization mode identifier may also indicate that the next network subgraph is not to be quantized.

The input data type refers to the type of input data supported by the quantization intermediate layer, for example integer, long integer, or single-precision floating point. The bit width of the input data supported by the quantization intermediate layer can be determined from the input data type; for example, when the input data type is integer, the bit width of the data supported by the quantization intermediate layer is 8 bits.

The output data type refers to the type of the data that the quantization intermediate layer outputs after processing its input data, for example integer, long integer, or single-precision floating point. The bit width of the data output by the quantization intermediate layer can be determined from the output data type.

Specifically, the quantization intermediate layer may receive input data output by the previous network subgraph that satisfies the input data type, and dequantize the input data according to the input quantization mode indicated by the input quantization mode identifier to obtain dequantized data. It then quantizes the dequantized data according to the output quantization mode corresponding to the output quantization mode identifier, outputs output data satisfying the output data type, and sends the output data to the next network subgraph. Therefore, during model inference the quantization intermediate layer can dequantize input data and requantize the dequantized data according to the output quantization mode; the exchanged data need not be converted into a fixed data type, data interaction operations are reduced, and the consumption of computing resources is further reduced. Moreover, the next network subgraph can process the output data directly, which improves data processing efficiency.
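The forward behaviour just described can be sketched as follows. This is a minimal illustration assuming simple linear (scale-only) quantization and a hypothetical identifier registry; the patent leaves the concrete quantization modes and identifiers open.

```python
# Minimal sketch of the quantization intermediate layer's forward pass:
# dequantize with the mode named by the input identifier, requantize with
# the mode named by the output identifier. Registry entries are hypothetical.

QUANT_MODES = {
    # identifier -> (scale, bit width)
    "local_int8": (0.1, 8),
    "thirdparty_int4": (0.5, 4),
}

class QuantIntermediateLayer:
    def __init__(self, input_quant_id, output_quant_id, input_dtype, output_dtype):
        self.input_quant_id = input_quant_id    # input quantization mode identifier
        self.output_quant_id = output_quant_id  # output quantization mode identifier
        self.input_dtype = input_dtype          # input data type, e.g. "int4"
        self.output_dtype = output_dtype        # output data type, e.g. "int8"

    def forward(self, input_data):
        # Dequantize the input with the mode named by the input identifier ...
        in_scale, _ = QUANT_MODES[self.input_quant_id]
        dequantized = [q * in_scale for q in input_data]
        # ... then requantize with the mode named by the output identifier.
        out_scale, out_bits = QUANT_MODES[self.output_quant_id]
        lo, hi = -(2 ** (out_bits - 1)), 2 ** (out_bits - 1) - 1
        return [max(lo, min(hi, round(v / out_scale))) for v in dequantized]

layer = QuantIntermediateLayer("thirdparty_int4", "local_int8", "int4", "int8")
print(layer.forward([3, -2, 7, -8]))  # hand-off without leaving the model
```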

In addition, the parameters of the quantization intermediate layer may further include: the data arrangement mode and memory information of the supported input data, and the data arrangement mode and memory information of the output data produced after the quantization intermediate layer processes the input data. The data arrangement mode may be, for example, concentrated distribution or independent distribution. The memory information may characterize the size of the memory occupied by the data. The parameters may further include an input quantization coefficient, which characterizes the quantization coefficient adopted by the input quantization mode; dequantization of the input data can be realized according to the input quantization coefficient. In this way the quantization intermediate layer can convert the data exchanged between a connected first-class network subgraph and second-class network subgraph, unifying the conversion of quantization mode, data type, data arrangement mode, and so on, without requiring the data conversion unit of the model inference platform; this reduces the resource consumption the data conversion unit would otherwise incur and improves data conversion efficiency.
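A fuller parameter record covering the optional fields mentioned in this paragraph might look like the following sketch; all field names and default values are assumptions for illustration.

```python
# Hypothetical parameter record for a quantization intermediate layer,
# including the optional layout, memory, and coefficient fields.

from dataclasses import dataclass

@dataclass
class QuantIntermediateLayerParams:
    input_quant_id: str                 # input quantization mode identifier
    output_quant_id: str                # output quantization mode identifier
    input_dtype: str                    # e.g. "int4"
    output_dtype: str                   # e.g. "int8"
    input_layout: str = "concentrated"  # data arrangement mode of the input
    output_layout: str = "concentrated" # data arrangement mode of the output
    input_mem_bytes: int = 0            # memory occupied by the input data
    output_mem_bytes: int = 0           # memory occupied by the output data
    input_quant_scale: float = 1.0      # input quantization coefficient, used for dequantization

params = QuantIntermediateLayerParams("thirdparty_int4", "local_int8",
                                      "int4", "int8", input_quant_scale=0.5)
print(params)
```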

The quantization intermediate layer is connected to the input end and the output end of the first-class network subgraph respectively; that is, quantization intermediate layers are located on the input side and the output side of the first-class network subgraph, the layer on the input side being connected to the subgraph's input end and the layer on the output side to its output end. A quantization intermediate layer is added to the network model in the form of a network layer; it converts the data exchanged between the first-class network subgraph and the second-class network subgraph and indicates the quantization mode of the connected first-class network subgraph, but performs no additional operation on the exchanged data and does not change the data processing logic of the network model, so the accuracy of model inference is not affected.

A first-class network subgraph is a network subgraph that needs to be quantized with a third-party quantization mode; each network subgraph consists of network layers that are consecutive in the network model and need to be quantized with the same quantization mode. A third-party quantization mode is a quantization mode not supported by the model inference platform, for example a user-defined quantization mode or a quantization mode provided by a third-party platform. The quantization modes supported by the model inference platform are the same as the quantization modes available locally when model quantization is performed, so a quantization mode supported by the model inference platform can also be understood as a local quantization mode. The network layers included in a first-class network subgraph are usually user-defined private network layers; a private network layer is usually a new type of network layer obtained by combining basic operations, and its quantization mode can be configured by the user. Correspondingly, a second-class network subgraph is a network subgraph that needs to be quantized with a quantization mode supported by the model inference platform; the network layers it includes are usually layers whose inference the model inference platform supports, such as convolutional layers and fully connected layers.

For example, suppose the network model includes five network layers connected in sequence, where the first and second network layers need to be quantized with local quantization mode A1, the third and fourth network layers with third-party quantization mode B1, and the fifth network layer with local quantization mode A2. Then the first and second network layers can form one second-class network subgraph, the third and fourth network layers a first-class network subgraph, and the fifth network layer another second-class network subgraph.
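The partition in this example can be reproduced with a short sketch that groups consecutive layers sharing a quantization mode; the list representation follows the example, while the grouping code itself is an illustrative assumption.

```python
# Group consecutive layers that share a quantization mode into subgraphs.
from itertools import groupby

# (layer name, required quantization mode); B1 is the third-party mode.
layers = [("L1", "A1"), ("L2", "A1"), ("L3", "B1"), ("L4", "B1"), ("L5", "A2")]
LOCAL_MODES = {"A1", "A2"}

subgraphs = [
    {"layers": [name for name, _ in group],
     "mode": mode,
     "kind": "second-class" if mode in LOCAL_MODES else "first-class"}
    for mode, group in groupby(layers, key=lambda item: item[1])
]
print(subgraphs)
# [{'layers': ['L1', 'L2'], 'mode': 'A1', 'kind': 'second-class'},
#  {'layers': ['L3', 'L4'], 'mode': 'B1', 'kind': 'first-class'},
#  {'layers': ['L5'], 'mode': 'A2', 'kind': 'second-class'}]
```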

Referring to figs. 2a and 2b, which are schematic diagrams of the position of a first quantization intermediate layer provided by an embodiment of the present application. Fig. 2a is a schematic structural diagram of a first original network model. In each network subgraph, the end marked 'In' is the input end and the end marked 'Out' is the output end. It can be seen that the original network model includes two second-class network subgraphs and one first-class network subgraph, the first-class network subgraph lying between the two second-class network subgraphs; a quantization intermediate layer therefore needs to be added at the input end and at the output end of the first-class network subgraph, yielding the network model structure with quantization intermediate layers shown in fig. 2b.

When the next network subgraph after a quantization intermediate layer is a first-class network subgraph, the quantization mode identifier carried in the parameters of the quantization intermediate layer is the identifier of a third-party quantization mode; when the next network subgraph is a second-class network subgraph, the quantization mode identifier carried in the parameters is the identifier of a local quantization mode.

In one embodiment of the present application, the quantization intermediate layers are added directly to the network model when the network model is generated. A user can add a quantization intermediate layer to the network model by modifying the model file, or a transition-layer adding tool can be used to add a quantization intermediate layer at the input end and at the output end of each first-class network subgraph.

Step 102: in the process of quantizing the network model to be quantized along the data processing flow direction of the network model to be quantized, determine whether the current processing object is a quantization intermediate layer; if so, execute step 103; if not, execute step 104.

The current processing object may be a quantization intermediate layer, a first-class network subgraph, or a second-class network subgraph. The data processing flow direction is the direction indicated by the order in which the network model processes data.

Specifically, when quantizing the network model to be quantized, each network subgraph can be quantized in turn following the data processing flow direction, and the current processing object is judged during the quantization process.

In one embodiment of the present application, each quantization intermediate layer may be marked in advance when it is added, so that when a network layer carrying the mark is detected during model quantization, that network layer can be treated as a quantization intermediate layer.

In another embodiment of the present application, the reference positions of the quantization intermediate layers in the network model to be quantized may be obtained in advance, so that whether the position of the current processing object belongs to a reference position can be judged; if so, the current processing object can be treated as a quantization intermediate layer.

Step 103: quantize the next network subgraph along the data processing flow direction with the output quantization mode indicated by the output quantization mode identifier in the parameters of the quantization intermediate layer.

Specifically, if the current processing object is detected to be a quantization intermediate layer, the output quantization mode identifier carried in the parameters of the quantization intermediate layer can be identified, and the output quantization mode corresponding to the identified identifier is then selected to quantize the next network subgraph along the data processing flow direction.

When the next network subgraph after the quantization intermediate layer is a first-class network subgraph, the output quantization mode identifier carried in the parameters of the quantization intermediate layer is the identifier of a third-party quantization mode, so the next network subgraph can be quantized with the third-party quantization mode indicated by that identifier;

when the next network subgraph after the quantization intermediate layer is a second-class network subgraph, the output quantization mode identifier carried in the parameters of the quantization intermediate layer is the identifier of a local quantization mode, so the next network subgraph can be quantized with the local quantization mode indicated by that identifier.

Step 104: quantize the current processing object with a quantization mode supported by the model inference platform.

Specifically, a quantization mode supported by the model inference platform and specified by the user for quantizing the second-class network subgraphs can be obtained in advance. If the current processing object is detected not to be a quantization intermediate layer, it can be quantized according to this preset quantization mode. Because the quantization mode adopted by a second-class network subgraph can be an existing local quantization mode, the preset quantization mode can be obtained directly from the local platform to quantize the current processing object.

When the scheme provided by this embodiment is applied to quantize a network model, a network model to be quantized containing a quantization intermediate layer is first obtained; in the process of quantizing it along its data processing flow direction, whether the current processing object is a quantization intermediate layer is determined; if so, the next network subgraph along the data processing flow direction is quantized with the output quantization mode indicated by the output quantization mode identifier in the parameters of the quantization intermediate layer; if not, the current processing object is quantized with a quantization mode supported by the model inference platform. Because the quantized network model contains the quantization intermediate layer, during model inference the quantization intermediate layer can type-convert the data exchanged between network subgraphs that adopt different quantization modes; no data interaction with units outside the network model is needed, and the exchanged data need not be converted into a fixed data type, which reduces the time consumed by type conversion. Therefore, applying the network model quantization scheme provided by this embodiment improves the efficiency of model inference on the quantized model.

In an embodiment of the present application, for step 101, the network model to be quantized may be obtained as follows. An original network model is obtained first, and the network subgraphs in it that need to be quantized in a third-party quantization mode are detected as first-class network subgraphs. Quantization middle layers are then added at the input end and the output end of each detected first-class network subgraph, and for each quantization middle layer, its parameters are set according to the quantization information of the network subgraphs connected to it, yielding the network model to be quantized.

The quantization information may include the quantization mode adopted by a network subgraph connected to the quantization middle layer, the type of its input or output data, and the like.

Specifically, an original network model may be obtained first, where the original network model is a network model that does not contain a quantization middle layer. The first-class network subgraphs contained in the model are detected, and quantization middle layers are added at the input end and the output end of each detected first-class network subgraph. The parameters of each added quantization middle layer are then set according to the quantization information of the network subgraphs connected to it. Specifically, the input quantization mode identifier in the quantization middle layer parameters may be determined from the quantization mode adopted by the previous network subgraph connected to the quantization middle layer along the data processing flow, and the output quantization mode identifier from the quantization mode adopted by the next network subgraph connected to it. Likewise, the input data type in the quantization middle layer parameters is determined from the type of the data output by the previous network subgraph, and the output data type from the type of the data accepted by the next network subgraph.
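A minimal sketch of this parameter-setting rule, with hypothetical attribute names standing in for the quantization information of the neighbouring subgraphs:

```python
def configure_middle_layer(layer, prev_subgraph, next_subgraph):
    # Input side mirrors the previous subgraph along the data processing flow.
    layer.in_quant_id = prev_subgraph.quant_mode_id   # mode used upstream
    layer.in_dtype = prev_subgraph.output_dtype       # type the upstream outputs
    # Output side mirrors the next subgraph.
    layer.out_quant_id = next_subgraph.quant_mode_id  # mode needed downstream
    layer.out_dtype = next_subgraph.input_dtype       # type the downstream accepts
```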

In an embodiment of the present application, the position of the quantization middle layer in the network model to be quantized may include the following two cases.

Case one: for each first-class network subgraph, when the input end or the output end of the first-class network subgraph is connected with a second-class network subgraph, a quantization middle layer is located between that input end or output end and the second-class network subgraph.

Specifically, the quantization middle layer converts input data of the input data type into output data of the output data type, and the third-party quantization mode adopted by each first-class network subgraph may differ. Therefore, whenever an input end or output end of a first-class network subgraph is connected with a second-class network subgraph, a quantization middle layer needs to be added between them, which makes data interaction between each first-class network subgraph and the connected second-class network subgraph straightforward.

Referring to fig. 3a and 3b, fig. 3a and 3b are schematic diagrams of quantization middle layer positions in a second example provided in an embodiment of the present application, where fig. 3a shows the structure of a second original network model. This original network model includes four second-class network subgraphs and two first-class network subgraphs: the output end of one second-class network subgraph is connected with the input ends of the two first-class network subgraphs and of another second-class network subgraph, and the output ends of the two first-class network subgraphs are connected with the input ends of two second-class network subgraphs. For the two first-class network subgraphs, quantization middle layers need to be added at their input ends and output ends, giving the network model structure with quantization middle layers shown in fig. 3b.

Case two: for each first-class network subgraph, when the input end or the output end of the first-class network subgraph is connected with multiple second-class network subgraphs, a quantization middle layer is located between that input end or output end and the multiple second-class network subgraphs.

Specifically, all second-class network subgraphs adopt the same quantization mode, and after quantization they also support the same data types. Therefore, when the input end or output end of a first-class network subgraph is connected with multiple second-class network subgraphs, a single quantization middle layer can convert the data that the second-class network subgraphs feed into the first-class network subgraph, and a single quantization middle layer can convert the data output by the first-class network subgraph and pass the converted data to each second-class network subgraph.

Referring to fig. 4a and 4b, fig. 4a and 4b are schematic diagrams of quantization middle layer positions in a third example provided in an embodiment of the present application, where fig. 4a shows the structure of a third original network model. This original network model includes four second-class network subgraphs and two first-class network subgraphs: the input end of one first-class network subgraph is connected with one second-class network subgraph, its output end is connected with two second-class network subgraphs and with the input end of the other first-class network subgraph, and the output end of the other first-class network subgraph is connected with the input end of one second-class network subgraph. According to case two, one quantization middle layer is added between the output end of the first first-class network subgraph and the two second-class network subgraphs, one quantization middle layer is added at the input end of that first-class network subgraph, and one quantization middle layer is added at the output end of the other first-class network subgraph, giving the network model structure with quantization middle layers shown in fig. 4b.

In an embodiment of the application, when determining the first-class network subgraphs, the topology of the original network model can be regarded as a directed acyclic graph, and each directed acyclic subgraph formed by consecutive network layers that need to be quantized with the same third-party quantization scheme is selected from it as a first-class network subgraph.
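For example, this selection can be sketched as a breadth-first walk over the directed acyclic graph that groups consecutive layers sharing one third-party scheme; `graph.layers`, `graph.successors`, and `layer.quant_scheme` below are assumed accessors for illustration, not a real API:

```python
from collections import deque

def collect_first_class_subgraphs(graph, third_party_schemes):
    visited, subgraphs = set(), []
    for start in graph.layers:
        scheme = start.quant_scheme
        if start in visited or scheme not in third_party_schemes:
            continue
        # Gather the run of consecutive layers needing the same third-party
        # scheme, following forward edges only for simplicity.
        group, queue = [], deque([start])
        visited.add(start)
        while queue:
            layer = queue.popleft()
            group.append(layer)
            for nxt in graph.successors(layer):
                if nxt not in visited and nxt.quant_scheme == scheme:
                    visited.add(nxt)
                    queue.append(nxt)
        subgraphs.append(group)
    return subgraphs
```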

In an embodiment of the present application, the first-class network subgraph may be: a network subgraph that needs to be quantized in a quantization mode not supported by the model inference platform.

Specifically, the quantization modes supported by the model inference platform can be obtained in advance. When determining the first-class network subgraphs, the quantization mode required by each network subgraph in the original network model is identified in turn; if the required mode is not among the modes obtained in advance, it is a quantization mode that the model inference platform does not support, so the network subgraph is taken as a first-class network subgraph.

Besides, the first-class network subgraph may also be: a network subgraph that needs to be quantized in a quantization mode not supported locally.

Specifically, the locally supported quantization modes can be obtained in advance. When determining the first-class network subgraphs, the quantization mode required by each network subgraph in the original network model is identified in turn; if the required mode does not belong to the modes obtained in advance, it is a quantization mode that is not supported locally, so the network subgraph is taken as a first-class network subgraph.

In another embodiment of the present application, the first-class network subgraph may be: a network subgraph containing a network layer for which quantization is not supported locally.

Specifically, the network layers for which quantization is supported locally, such as convolutional layers, pooling layers, and activation layers, may be obtained in advance. If all the network layers contained in a network subgraph support local quantization, the subgraph can be taken as a second-class network subgraph; if the subgraph contains a network layer for which quantization is not supported locally, it can be taken as a first-class network subgraph.
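A hedged sketch of this classification; the set of locally quantizable layer types below is illustrative only, not the platform's actual list:

```python
# Assumed, for illustration: layer types the platform can quantize locally.
LOCALLY_QUANTIZABLE = {"convolution", "pooling", "activation"}

def is_first_class(subgraph_layers):
    # First-class if any contained layer type lacks local quantization support,
    # second-class if every layer type is locally quantizable.
    return any(layer.type not in LOCALLY_QUANTIZABLE for layer in subgraph_layers)
```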

In another embodiment of the present application, the first-class network subgraph may further be: a network subgraph that the user has specified needs to be quantized in a third-party quantization mode. Specifically, the network subgraphs specified by the user as requiring a third-party quantization mode can be obtained directly and taken as first-class network subgraphs.

In an embodiment of the present application, for step 103, when quantizing the next network subgraph, the output quantization mode identifier contained in the quantization middle layer parameters may be identified, the output quantization mode corresponding to the identifier may be looked up among the quantization modes configured by the user and the locally supported quantization modes, and the next network subgraph along the data processing flow direction may be quantized with the output quantization mode found.

Specifically, because the quantization modes required by the network model to be quantized include modes that are not supported locally, the quantization modes configured by the user need to be obtained in advance. When the next network subgraph after the quantization middle layer is quantized along the data processing flow, the output quantization mode corresponding to the output quantization mode identifier carried in the quantization middle layer parameters is looked up among the user-configured quantization modes and the locally available quantization modes, and the next network subgraph is quantized accordingly.

If the output quantization mode specified by the output quantization mode identifier is a third-party quantization mode, it is looked up among the quantization modes configured by the user; if it is a locally supported quantization mode, it can be looked up among the local quantization modes.
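This two-registry lookup can be sketched as follows; the registry names are assumptions for illustration:

```python
def find_output_quantizer(out_quant_id, user_configured, locally_supported):
    """Map an output quantization mode identifier to a quantization routine."""
    if out_quant_id in user_configured:      # third-party mode configured by the user
        return user_configured[out_quant_id]
    if out_quant_id in locally_supported:    # mode the platform supports natively
        return locally_supported[out_quant_id]
    raise KeyError(f"no quantization mode registered for {out_quant_id!r}")
```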

In an embodiment of the application, an original network model can be obtained in advance, quantization middle layers are added at the input end and the output end of each first-class network subgraph of the original network model, and model quantization is then performed on the original network model with the quantization middle layers added. Alternatively, during quantization of the original network model, whenever a first-class network subgraph that needs to be quantized in a third-party quantization mode is detected, quantization middle layers are added at its input end and output end.

In an embodiment of the present application, the parameters of the quantization middle layer may be set directly according to the user's experience, or by identifying the quantization information of the network subgraphs connected by the quantization middle layer.

Corresponding to the network model quantization method above, an embodiment of the present application further provides a network model inference method. Referring to fig. 5, fig. 5 is a schematic flow diagram of the network model inference method provided in an embodiment of the present application; the method includes the following steps 501 to 503.

Step 501, obtaining a network model to be inferred.

The network model to be inferred is a network model obtained by quantizing a network model to be quantized that comprises a quantization middle layer. The quantization middle layer is connected respectively with the input end and the output end of a first-class network subgraph, and its parameters comprise an output quantization mode identifier. The quantization middle layer is used to indicate that the next network subgraph along the data processing flow direction of the network model is quantized in the output quantization mode corresponding to the output quantization mode identifier, where the first-class network subgraph is: a network subgraph that needs to be quantized in a third-party quantization mode.

Specifically, the network model to be inferred is a network model obtained by quantization with the model quantization scheme described above; its structure, quantization modes, and the like are not described again here.

Step 502, in the process of performing inference along the data processing flow direction based on the data to be inferred, judging whether the current inference object is a quantization middle layer; if so, executing step 503.

The current inference object can be a quantization middle layer, a first-class network subgraph, or a second-class network subgraph.

Specifically, the data to be inferred can be obtained in advance, and inference is then performed on the network model to be inferred along the data processing flow direction based on that data. During inference, if the current inference object is detected to be a quantization middle layer, step 503 may be executed. The network model to be inferred can be run with an inference framework in the model inference platform.

If the current inference object is not the quantization middle layer, the current network subgraph can be inferred based on the obtained data to be inferred according to a preset inference mode.

In an embodiment of the present application, if the added quantization middle layers are marked in advance, then during model inference a network layer carrying such a mark can be regarded as a quantization middle layer when detected.

Step 503, dequantizing the input data of the input data type according to the input quantization mode corresponding to the input quantization mode identifier to obtain dequantized data, quantizing the dequantized data according to the output quantization mode corresponding to the output quantization mode identifier, outputting output data of the output data type, and performing inference on the next network subgraph along the data processing flow direction based on the output data.

Specifically, during model inference the quantization middle layer receives input data that is output by the previous network subgraph and matches the input data type, and dequantizes it according to the input quantization mode indicated by the input quantization mode identifier, obtaining dequantized data. It then quantizes the dequantized data according to the output quantization mode corresponding to the output quantization mode identifier, outputs data matching the output data type, and sends the output data to the next network subgraph. Because the quantization middle layer first dequantizes the input data and only then requantizes it in the output quantization mode, the output data is more accurate, and the next network subgraph can process it directly, improving data processing efficiency. When dequantizing the input data, the quantization middle layer can determine, from the input quantization mode identifier, the input quantization mode that was used when the input data was quantized, and thus the quantization coefficient that was used, and dequantize the input data according to that coefficient.
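As a minimal numeric sketch of this dequantize-then-requantize behaviour, assuming simple scale-based (linear) quantization; real quantization modes may also involve zero points or per-channel coefficients, and the scales would be looked up from the mode identifiers:

```python
import numpy as np

def middle_layer_forward(x, in_scale, out_scale, out_dtype):
    # Dequantize with the coefficient used when the input was quantized ...
    real_valued = x.astype(np.float32) * in_scale
    # ... then requantize in the output quantization mode and emit the
    # output data type the next subgraph expects.
    return np.round(real_valued / out_scale).astype(out_dtype)

# e.g. int8 activations quantized at scale 0.05, requantized to int16 at scale 0.002
y = middle_layer_forward(np.array([10, -20], dtype=np.int8), 0.05, 0.002, np.int16)
# y == array([ 250, -500], dtype=int16)
```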

When the network model inference scheme provided by this embodiment is applied to model inference, the quantization middle layers in the network model type-convert the interactive data between network subgraphs that adopt different quantization modes; the subgraphs need not exchange data with units outside the network model, and the interactive data need not be converted into a fixed data type, which reduces the time consumed by type conversion. Therefore, applying the network model inference scheme provided by this embodiment improves the efficiency of model inference on the quantized model.

In an embodiment of the present application, for step 502, when performing model inference, the quantization mode adopted by each network subgraph of the network model to be inferred may be identified, a quantization mode supported by the model inference platform may be selected from the identified modes, and the data to be inferred may be quantized with the selected mode. Then, while performing inference along the data processing flow direction based on the quantized data to be inferred, it is judged whether the current inference object is a quantization middle layer.

Specifically, the network model to be inferred is a quantized network model. By identifying the quantization modes it adopts and quantizing the data to be inferred accordingly, the quantized data can be fed into the network model to be inferred, realizing inference on it.
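A sketch of this input-side handling under assumed helper names: among the modes the subgraphs adopt, one the platform supports is selected to quantize the raw input before inference starts:

```python
def run_inference(model, raw_input, platform_modes):
    # Pick, among the modes the subgraphs adopt, one the platform supports.
    mode = next(m for m in model.subgraph_quant_modes if m in platform_modes)
    quantized_input = mode.quantize(raw_input)  # quantize the data to be inferred
    return model.forward(quantized_input)       # infer along the data flow
```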

In an embodiment of the application, when the original network model is quantized, each network subgraph can be marked with the quantization mode it adopts, so that when identifying the quantization modes adopted by the network subgraphs of the network model to be inferred, the mode of each subgraph can be determined by reading the mark it carries.

In an embodiment of the application, in the process of performing inference along the data processing flow direction based on the data to be inferred, it can be judged whether the current inference object is a first-class network subgraph; if so, inference is performed on the first-class network subgraph based on the data to be inferred according to the inference information configured by the user.

Specifically, when the network model to be inferred was quantized, some of the adopted quantization modes are not supported by the model inference platform, so inference information configured by the user needs to be obtained in advance; it includes the formulas, functions, and the like needed to infer a first-class network subgraph. When a first-class network subgraph is inferred, this inference information is consulted.
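For illustration (hypothetical registry and attributes), the user-configured inference information can be kept per third-party quantization mode and consulted whenever the current inference object is a first-class network subgraph:

```python
# quantization mode identifier -> user-supplied inference routine
user_inference_info = {}

def infer_subgraph(subgraph, data):
    if subgraph.is_first_class:
        # First-class subgraph: use the formulas/functions the user configured
        # for its third-party quantization mode.
        return user_inference_info[subgraph.quant_mode_id](subgraph, data)
    # Second-class subgraph: the platform's built-in inference suffices.
    return subgraph.forward(data)
```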

Referring to fig. 6, fig. 6 is a schematic structural diagram of a network model quantization apparatus provided in an embodiment of the present application, where the apparatus includes:

a first model obtaining module 601, configured to obtain a network model to be quantized, where the network model to be quantized includes a quantization intermediate layer, the quantization intermediate layer is connected to an input end and an output end of a first class of network subgraphs, and parameters of the quantization intermediate layer include: the quantization intermediate layer is used for: the input data of the input data type is dequantized by referring to the input quantization mode corresponding to the input quantization mode identifier to obtain dequantized data, the dequantized data is quantized according to the output quantization mode corresponding to the output quantization mode identifier, and the output data of the output data type is output, wherein the first-class network subgraph is as follows: a network subgraph which needs to be quantized in a third-party quantization mode;

a first model determining module 602, configured to determine, in a process of quantizing the network model to be quantized along a data processing flow direction of the network model to be quantized, whether a current processing object is a quantization middle layer, if so, trigger a first model quantizing module, and if not, trigger a second model quantizing module;

the first model quantization module 603 is configured to quantize a next network subgraph along the data processing flow direction in an output quantization manner indicated by the output quantization manner identifier in the quantization intermediate layer parameter;

the second model quantization module 604 is configured to quantize the current processed object in a quantization manner supported by a model inference platform.

In an embodiment of the application, the first model obtaining module 601 is specifically configured to:

obtaining an original network model;

detecting a network subgraph which needs to be quantized in a third-party quantization mode in the original network model to serve as a first-class network subgraph;

and adding quantization middle layers at the input end and the output end of the detected first-class network subgraph respectively, and setting parameters of the quantization middle layers according to the quantization information of the network subgraph connected with the quantization middle layers aiming at each quantization middle layer to obtain the network model to be quantized.

In one embodiment of the present application,

for each network subgraph of the first type, under the condition that the input end or the output end of the network subgraph of the first type is connected with a network subgraph of the second type, a quantization middle layer is positioned: the input end or the output end of the first class of network subgraph and the second class of network subgraph are connected, wherein the second class of network subgraph is as follows: a network subgraph which needs to be quantized by adopting a quantization mode supported by the model reasoning platform; and/or

for each first-class network subgraph, when the input end or the output end of the first-class network subgraph is connected with multiple second-class network subgraphs, a quantization middle layer is located between that input end or output end and the multiple second-class network subgraphs.

In one embodiment of the present application,

the first-class network subgraph is: a network subgraph that needs to be quantized in a quantization mode not supported by the model inference platform; and/or

the first-class network subgraph is: a network subgraph containing a network layer for which quantization is not supported locally; and/or

the first-class network subgraph is: a network subgraph that the user has specified needs to be quantized in a third-party quantization mode.

In an embodiment of the present application, the first model quantization module 603 is specifically configured to:

identifying an output quantization mode identifier included in the quantization intermediate layer parameter;

and searching the output quantization mode corresponding to the identified output quantization mode identification from the quantization mode configured by the user and the locally supported quantization mode, and quantizing the next network subgraph along the data processing flow direction according to the searched output quantization mode.

When the apparatus provided by this embodiment is applied to quantize a network model, the quantized network model contains quantization middle layers which, during model inference, type-convert the interactive data between network subgraphs that adopt different quantization modes; no data interaction with units outside the network model and no conversion to a fixed data type are needed, which reduces the time consumed by type conversion and thus improves the efficiency of model inference on the quantized model.

Referring to fig. 7, fig. 7 is a schematic structural diagram of a network model inference device provided in an embodiment of the present application, where the device includes:

a second model obtaining module 701, configured to obtain a network model to be inferred, where the network model to be inferred is a network model obtained by quantizing a network model to be quantized that comprises a quantization middle layer; the quantization middle layer is connected respectively with the input end and the output end of a first-class network subgraph, the parameters of the quantization middle layer comprise an output quantization mode identifier, and the quantization middle layer is used to indicate that the next network subgraph along the data processing flow direction of the network model is quantized in the output quantization mode corresponding to the output quantization mode identifier, where the first-class network subgraph is: a network subgraph that needs to be quantized in a third-party quantization mode;

a second model judgment module 702, configured to judge whether a current inference object is a quantization middle layer in the process of performing inference along the data processing flow direction based on data to be inferred, and if so, trigger the data conversion module;

the data conversion module 703 is configured to dequantize the input data of the input data type according to the input quantization mode corresponding to the input quantization mode identifier to obtain dequantized data, quantize the dequantized data according to the output quantization mode corresponding to the output quantization mode identifier, output the output data of the output data type, and perform inference on the next network subgraph along the data processing flow direction based on the output data.

In an embodiment of the present application, the apparatus further includes a third model determining module, specifically configured to:

judging whether a current inference object is a first-class network subgraph or not in the process of reasoning along the data processing flow direction based on data to be inferred;

if so, reasoning the first-class network subgraph based on the data to be inferred according to reasoning information configured by the user.

When the network model inference apparatus provided by this embodiment is applied to model inference, the quantization middle layers likewise type-convert the interactive data between network subgraphs inside the model, which reduces the time consumed by type conversion and improves the efficiency of model inference on the quantized model.

The embodiment of the present application further provides an electronic device, as shown in fig. 8, which includes a processor 801, a communication interface 802, a memory 803, and a communication bus 804, where the processor 801, the communication interface 802, and the memory 803 complete mutual communication through the communication bus 804,

a memory 803 for storing a computer program;

the processor 801 is configured to implement the network model quantization method described above when executing the program stored in the memory 803.

The embodiment of the application also provides another electronic device, which comprises a processor, a communication interface, a memory and a communication bus, wherein the processor, the communication interface and the memory complete mutual communication through the communication bus;

a memory for storing a computer program; and the processor is used for realizing the steps of the network model reasoning method when executing the program stored in the memory.

The communication bus mentioned in the electronic device may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The communication bus may be divided into an address bus, a data bus, a control bus, etc. For ease of illustration, only one thick line is shown, but this does not mean that there is only one bus or one type of bus.

The communication interface is used for communication between the electronic equipment and other equipment.

The memory may include a Random Access Memory (RAM) or a Non-Volatile Memory (NVM), such as at least one disk memory. Optionally, the memory may also be at least one memory device located remotely from the aforementioned processor.

The processor may be a general-purpose processor, including a Central Processing Unit (CPU), a Network Processor (NP), and the like; it may also be a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.

In yet another embodiment provided by the present application, a computer-readable storage medium is further provided, in which a computer program is stored, and the computer program, when executed by a processor, implements the steps of any one of the above network model quantification and inference methods.

In yet another embodiment provided by the present application, there is also provided a computer program product containing instructions which, when run on a computer, cause the computer to perform any of the network model quantification and inference methods described in the above embodiments.

In the above embodiments, the implementation may be wholly or partially realized by software, hardware, firmware, or any combination thereof. When implemented in software, it may be realized wholly or partially in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer program instructions are loaded and executed on a computer, the processes or functions described in the embodiments of the application are produced in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable device. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another, for example, from one website, computer, server, or data center to another via wired (e.g., coaxial cable, optical fiber, Digital Subscriber Line (DSL)) or wireless (e.g., infrared, radio, microwave) means. The computer-readable storage medium can be any available medium that a computer can access, or a data storage device such as a server or data center that integrates one or more available media. The available media may be magnetic media (e.g., floppy disk, hard disk, magnetic tape), optical media (e.g., DVD), or semiconductor media (e.g., Solid State Disk (SSD)), among others.

It is noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.

All the embodiments in the present specification are described in a related manner, and the same and similar parts among the embodiments may be referred to each other, and each embodiment focuses on the differences from the other embodiments. In particular, apparatus embodiments, electronic device embodiments, computer-readable storage medium embodiments, and computer program product embodiments are substantially similar to method embodiments and therefore are described with relative ease, as appropriate, with reference to the partial description of the method embodiments.

The above description is only for the preferred embodiment of the present application and is not intended to limit the scope of the present application. Any modification, equivalent replacement, improvement and the like made within the spirit and principle of the present application are included in the protection scope of the present application.
