Processing method and inference method for a neural network model, apparatus thereof, and electronic device


Description: This application, entitled "Processing method and inference method for a neural network model, apparatus thereof, and electronic device", was created by Xiao Zhenpeng and Wan Haipeng on 2020-03-23. Abstract: Embodiments of the present application provide a processing method, an inference method, an apparatus, and an electronic device for a neural network model, which can improve the processing performance for the neural network model. The processing method of the neural network model comprises: obtaining a graph model of a neural network model, wherein the graph model comprises a plurality of nodes and each node in the plurality of nodes comprises an operator; dividing the plurality of nodes into at least two types according to the characteristics of the plurality of nodes; merging the plurality of nodes according to the division result to form at least two types of sub-models; and compiling the at least two types of sub-models to obtain at least two types of executable sub-programs, wherein the at least two types of executable sub-programs are configured to run on at least two types of processors. By performing neural network processing on multiple types of processors, the method of the embodiments of the present application can make full use of the characteristics of those processors and improve the overall processing performance for the neural network.

1. A method for processing a neural network model, comprising:

obtaining a graph model of a neural network model, wherein the graph model comprises a plurality of nodes, and each node in the plurality of nodes comprises an operator;

dividing the plurality of nodes into at least two types according to the characteristics of the plurality of nodes;

merging the plurality of nodes according to the division result to form at least two types of sub-models;

compiling the at least two types of sub-models to obtain at least two types of executable sub-programs, wherein the at least two types of executable sub-programs are configured to run on at least two types of processors.

2. The processing method according to claim 1, wherein said dividing the plurality of nodes into at least two classes according to the characteristics of the plurality of nodes comprises:

dividing the plurality of nodes into two types according to the characteristics of the plurality of nodes, wherein one type comprises nodes suitable for running on a neural network dedicated processor, and the other type comprises nodes not suitable for running on the neural network dedicated processor.

3. The processing method according to claim 2, wherein said merging the plurality of nodes according to the division result to form at least two types of sub-models comprises:

merging nodes of the plurality of nodes that are suitable for running on the neural network dedicated processor to form at least one first sub-model;

merging nodes of the plurality of nodes that are not suitable for running on the neural network dedicated processor to form at least one second sub-model;

dividing the at least one first sub-model and the at least one second sub-model into two types of sub-models.

4. The processing method of claim 3, wherein said dividing the at least one first sub-model and the at least one second sub-model into two types of sub-models comprises:

taking the at least one first sub-model as a first type of sub-model of the two types of sub-models, and taking the at least one second sub-model as a second type of sub-model of the two types of sub-models.

5. The processing method of claim 3, wherein said dividing the at least one first sub-model and the at least one second sub-model into two types of sub-models comprises:

calculating the computation amount of each first sub-model in the at least one first sub-model; classifying the first target sub-model with the largest computation amount as a first type of sub-model of the two types of sub-models; and classifying the first sub-models other than the first target sub-model, together with the at least one second sub-model, as a second type of sub-model of the two types of sub-models.

6. The processing method of claim 3, wherein said dividing the at least one first sub-model and the at least one second sub-model into two types of sub-models comprises:

calculating the computation amount of each first sub-model in the at least one first sub-model; classifying first target sub-models whose computation amount is greater than a preset threshold as a first type of sub-model of the two types of sub-models; and classifying the other first sub-models in the at least one first sub-model, together with the at least one second sub-model, as a second type of sub-model of the two types of sub-models.

7. The processing method of claim 3, wherein said dividing the at least one first sub-model and the at least one second sub-model into two types of sub-models comprises:

calculating the computation amount of each first sub-model in the at least one first sub-model; classifying the first target sub-model with the largest computation amount, together with second target sub-models whose computation amount is greater than a preset threshold, as a first type of sub-model of the two types of sub-models; and classifying the other first sub-models in the at least one first sub-model, together with the at least one second sub-model, as a second type of sub-model of the two types of sub-models.

8. The processing method of any of claims 5 to 7, wherein said calculating the computation amount of each first sub-model in the at least one first sub-model comprises:

calculating the computation amount of each first sub-model according to the number of target nodes in the first sub-model, wherein a target node is a node comprising a computation-type operator.

9. The processing method according to any one of claims 4 to 7, wherein said compiling the at least two types of sub-models to obtain at least two types of executable sub-programs comprises:

compiling the first type of sub-model to form a first type of executable sub-program, wherein the first type of executable sub-program is configured to run on the neural network dedicated processor;

compiling the second type of sub-model to form a second type of executable sub-program, wherein the second type of executable sub-program is configured to run on a non-neural network dedicated processor.

10. The processing method according to any one of claims 1 to 7, wherein said dividing the plurality of nodes into at least two types according to the characteristics of the plurality of nodes comprises:

setting different flags on the plurality of nodes to distinguish different categories;

and wherein said merging the plurality of nodes according to the division result to form at least two types of sub-models comprises:

merging the plurality of nodes according to the flags on the plurality of nodes to form the at least two types of sub-models.

11. The processing method according to claim 10, wherein said setting different flags on the plurality of nodes to distinguish different categories comprises:

setting a first flag on nodes of the plurality of nodes that are suitable for running on the neural network dedicated processor, and setting a second flag on nodes of the plurality of nodes that are not suitable for running on the neural network dedicated processor;

and wherein said merging the plurality of nodes according to the flags on the plurality of nodes comprises:

merging adjacent nodes with the first flag among the plurality of nodes to form at least one first initial sub-model;

merging adjacent nodes with the second flag among the plurality of nodes to form at least one second initial sub-model;

performing a merging determination on the at least one first initial sub-model and the at least one second initial sub-model to form at least one first sub-model and at least one second sub-model;

dividing the at least one first sub-model and the at least one second sub-model into two types of sub-models.

12. The processing method according to claim 11, wherein the adjacent nodes with the first flag include a first parent node and a first child node, and wherein merging the adjacent nodes with the first flag among the plurality of nodes comprises:

if the first parent node and the first child node form a ring structure between themselves, merging the first parent node and the first child node;

if no ring structure is formed between the first parent node and the first child node, merging the first parent node and the first child node;

if the first parent node, the first child node and other nodes form a ring structure together, merging the first parent node, the first child node and the other nodes in sequence according to a preset rule.

13. The processing method of claim 11, wherein no ring structure is formed within any of the at least one first initial sub-model;

and no ring structure is formed within any of the at least one second initial sub-model.

14. The processing method according to claim 11, wherein said performing the merging determination on the at least one first initial sub-model and the at least one second initial sub-model to form at least one first sub-model and at least one second sub-model comprises:

if the output node of a first specific initial sub-model in the at least one first initial sub-model is a specific node, merging the first specific initial sub-model with a second specific initial sub-model, wherein the second specific initial sub-model is the second initial sub-model in which a child node of that output node is located, and taking the merged first specific initial sub-model and second specific initial sub-model as one second sub-model;

if the output node of a second specific initial sub-model in the at least one second initial sub-model is a specific node, merging the second specific initial sub-model with a first specific initial sub-model, wherein the first specific initial sub-model is the first initial sub-model in which a child node of that output node is located, and taking the merged first specific initial sub-model and second specific initial sub-model as one second sub-model.

15. The processing method of claim 14, wherein said performing the merging determination on the at least one first initial sub-model and the at least one second initial sub-model to form at least one first sub-model and at least one second sub-model further comprises:

if the output node of a first initial sub-model in the at least one first initial sub-model is a non-specific node, taking that first initial sub-model as one first sub-model;

if the output node of a second initial sub-model in the at least one second initial sub-model is a non-specific node, taking that second initial sub-model as one second sub-model.

16. The processing method according to any one of claims 1 to 7, further comprising:

concatenating the at least two types of executable sub-programs to form an executable program of the neural network model;

establishing an interface between the executable program and a user;

acquiring data through the interface;

executing the executable program to perform inference on the data to obtain an inference result.

17. The processing method of claim 16, wherein a first type of executable sub-program of the executable program runs on a neural network dedicated processor, and a second type of executable sub-program of the executable program runs on a non-neural network dedicated processor.

18. The processing method of claim 16, wherein the data is image, speech or text data, and said executing the executable program to perform inference on the data to obtain an inference result comprises:

executing the executable program to perform target detection on the image, speech or text data to obtain a detection result.

19. An inference method for a neural network model, comprising:

acquiring data input by a user;

executing at least two types of executable sub-programs in an executable program to perform inference on the data and obtain an inference result;

wherein the at least two types of executable sub-programs are configured to run on at least two types of processors, the at least two types of executable sub-programs are obtained by classifying and merging a plurality of nodes in a graph model of a neural network model according to the characteristics of the plurality of nodes, and the graph model of the neural network model is used for processing the data.

20. The inference method of claim 19, wherein a first type of executable sub-program of the at least two types of executable sub-programs runs on a neural network dedicated processor, and a second type of executable sub-program of the at least two types of executable sub-programs runs on a non-neural network dedicated processor.

21. The inference method of claim 20, wherein the first type of executable sub-program is compiled from a first type of sub-model in the graph model of the neural network model, and the second type of executable sub-program is compiled from a second type of sub-model in the graph model of the neural network model;

wherein the first type of sub-model and the second type of sub-model are obtained by respectively merging two types of nodes among the plurality of nodes, the first type of nodes being nodes suitable for running on the neural network dedicated processor, and the second type of nodes being nodes not suitable for running on the neural network dedicated processor.

22. The inference method of claim 21, wherein the first type of sub-model is obtained by merging the first type of nodes;

and the second type of sub-model is obtained by merging the second type of nodes.

23. The inference method according to claim 21, wherein the first type of sub-model is the first target sub-model with the largest computation amount among at least one first sub-model formed after the first type of nodes are merged;

and the second type of sub-model comprises the other first sub-models in the at least one first sub-model, other than the first target sub-model, and at least one second sub-model formed after the second type of nodes are merged.

24. The inference method according to claim 21, wherein the first type of sub-model comprises first target sub-models whose computation amount is greater than a preset threshold, among at least one first sub-model formed after the first type of nodes are merged;

and the second type of sub-model comprises the other first sub-models in the at least one first sub-model, other than the first target sub-models, and at least one second sub-model formed after the second type of nodes are merged.

25. The inference method according to claim 21, wherein the first type of sub-model comprises, among at least one first sub-model formed after the first type of nodes are merged, the first target sub-model with the largest computation amount and second target sub-models whose computation amount is greater than a preset threshold;

and the second type of sub-model comprises the other first sub-models in the at least one first sub-model, other than the first target sub-model and the second target sub-models, and at least one second sub-model formed after the second type of nodes are merged.

26. The inference method according to any of claims 23 to 25, wherein the computation amount of each first sub-model in the at least one first sub-model is proportional to the number of target nodes in that first sub-model, wherein a target node is a node comprising a computation-type operator.

27. The inference method according to any one of claims 19 to 25, wherein the executable program is obtained by concatenating each of the at least two types of executable sub-programs according to the topology of the graph model.

28. The inference method according to any one of claims 19 to 25, wherein the data is image, speech or text data, and said performing inference on the data to obtain an inference result comprises:

performing target detection on the image, speech or text data to obtain a detection result.

29. An apparatus for processing a neural network model, comprising:

a first obtaining unit configured to obtain a graph model of a neural network model, wherein the graph model comprises a plurality of nodes, and each node in the plurality of nodes comprises an operator;

and a first processing unit configured to: divide the plurality of nodes into at least two types according to the characteristics of the plurality of nodes;

merge the plurality of nodes according to the division result to form at least two types of sub-models;

and compile the at least two types of sub-models to obtain at least two types of executable sub-programs, wherein the at least two types of executable sub-programs are configured to run on at least two types of processors.

30. The processing apparatus of claim 29, wherein the first processing unit is configured to: divide the plurality of nodes into two types according to the characteristics of the plurality of nodes, wherein one type comprises nodes suitable for running on a neural network dedicated processor, and the other type comprises nodes not suitable for running on the neural network dedicated processor.

31. The processing apparatus of claim 30, wherein the first processing unit is configured to: merge nodes of the plurality of nodes that are suitable for running on the neural network dedicated processor to form at least one first sub-model;

merge nodes of the plurality of nodes that are not suitable for running on the neural network dedicated processor to form at least one second sub-model;

and divide the at least one first sub-model and the at least one second sub-model into two types of sub-models.

32. The processing apparatus of claim 31, wherein the first processing unit is configured to: take the at least one first sub-model as a first type of sub-model of the two types of sub-models, and take the at least one second sub-model as a second type of sub-model of the two types of sub-models.

33. The processing apparatus of claim 31, wherein the first processing unit is configured to: calculate the computation amount of each first sub-model in the at least one first sub-model, classify the first target sub-model with the largest computation amount as a first type of sub-model of the two types of sub-models, and classify the first sub-models other than the first target sub-model, together with the at least one second sub-model, as a second type of sub-model of the two types of sub-models.

34. The processing apparatus of claim 31, wherein the first processing unit is configured to: calculate the computation amount of each first sub-model in the at least one first sub-model, classify first target sub-models whose computation amount is greater than a preset threshold as a first type of sub-model of the two types of sub-models, and classify the other first sub-models in the at least one first sub-model, together with the at least one second sub-model, as a second type of sub-model of the two types of sub-models.

35. The processing apparatus of claim 31, wherein the first processing unit is configured to: calculate the computation amount of each first sub-model in the at least one first sub-model, classify the first target sub-model with the largest computation amount and second target sub-models whose computation amount is greater than a preset threshold as a first type of sub-model of the two types of sub-models, and classify the other first sub-models in the at least one first sub-model, together with the at least one second sub-model, as a second type of sub-model of the two types of sub-models.

36. The processing apparatus according to any one of claims 33 to 35, wherein the first processing unit is configured to: calculate the computation amount of each first sub-model according to the number of target nodes in the first sub-model, wherein a target node is a node comprising a computation-type operator.

37. The processing apparatus according to any one of claims 32 to 35, wherein the first processing unit is configured to: compile the first type of sub-model to form a first type of executable sub-program, wherein the first type of executable sub-program is configured to run on the neural network dedicated processor;

and compile the second type of sub-model to form a second type of executable sub-program, wherein the second type of executable sub-program is configured to run on a non-neural network dedicated processor.

38. The processing apparatus according to any one of claims 29 to 35, wherein the first processing unit is configured to: set different flags on the plurality of nodes to distinguish different categories;

and merge the plurality of nodes according to the flags on the plurality of nodes to form the at least two types of sub-models.

39. The processing apparatus of claim 38, wherein the first processing unit is configured to: set a first flag on nodes of the plurality of nodes that are suitable for running on the neural network dedicated processor, and set a second flag on nodes of the plurality of nodes that are not suitable for running on the neural network dedicated processor;

merge adjacent nodes with the first flag among the plurality of nodes to form at least one first initial sub-model;

merge adjacent nodes with the second flag among the plurality of nodes to form at least one second initial sub-model;

perform a merging determination on the at least one first initial sub-model and the at least one second initial sub-model to form at least one first sub-model and at least one second sub-model;

and divide the at least one first sub-model and the at least one second sub-model into two types of sub-models.

40. The processing apparatus of claim 39, wherein the adjacent nodes with the first flag include a first parent node and a first child node, and the first processing unit is configured to:

if the first parent node and the first child node form a ring structure between themselves, merge the first parent node and the first child node;

if no ring structure is formed between the first parent node and the first child node, merge the first parent node and the first child node;

and if the first parent node, the first child node and other nodes form a ring structure together, merge the first parent node, the first child node and the other nodes in sequence according to a preset rule.

41. The processing apparatus of claim 39, wherein no ring structure is formed within any of the at least one first initial sub-model;

and no ring structure is formed within any of the at least one second initial sub-model.

42. The processing apparatus of claim 39, wherein the first processing unit is configured to: if the output node of a first specific initial sub-model in the at least one first initial sub-model is a specific node, merge the first specific initial sub-model with a second specific initial sub-model, wherein the second specific initial sub-model is the second initial sub-model in which a child node of that output node is located, and take the merged first specific initial sub-model and second specific initial sub-model as one second sub-model;

and if the output node of a second specific initial sub-model in the at least one second initial sub-model is a specific node, merge the second specific initial sub-model with a first specific initial sub-model, wherein the first specific initial sub-model is the first initial sub-model in which a child node of that output node is located, and take the merged first specific initial sub-model and second specific initial sub-model as one second sub-model.

43. The processing apparatus of claim 42, wherein the first processing unit is further configured to:

if the output node of a first initial sub-model in the at least one first initial sub-model is a non-specific node, take that first initial sub-model as one first sub-model;

and if the output node of a second initial sub-model in the at least one second initial sub-model is a non-specific node, take that second initial sub-model as one second sub-model.

44. The processing apparatus according to any one of claims 29 to 35, further comprising: a second obtaining unit and a second processing unit;

wherein the first processing unit is further configured to: concatenate the at least two types of executable sub-programs to form an executable program of the neural network model; and establish an interface between the executable program and a user;

the second obtaining unit is configured to: acquire data through the interface;

and the first processing unit is further configured to execute one of the at least two types of executable sub-programs, and the second processing unit is configured to execute another of the at least two types of executable sub-programs, so as to perform inference on the data and obtain an inference result.

45. The processing apparatus of claim 44, wherein the second processing unit is a processing unit on a neural network dedicated processor, and the first processing unit is a processing unit on a non-neural network dedicated processor.

46. The processing apparatus of claim 44, wherein the data is image, speech or text data, and the first processing unit and the second processing unit are configured to perform target detection on the image, speech or text data to obtain a detection result.

47. An inference apparatus for a neural network model, comprising:

an obtaining unit configured to obtain data input by a user;

and at least two processing units configured to respectively execute at least two types of executable sub-programs in an executable program, so as to perform inference on the data and obtain an inference result;

wherein the at least two processing units are processing units in at least two types of processors, the at least two types of executable sub-programs are obtained by classifying and merging a plurality of nodes in a graph model of a neural network model according to the characteristics of the plurality of nodes, and the graph model of the neural network model is used for processing the data.

48. The inference apparatus of claim 47, wherein a first processing unit of the at least two processing units is a processing unit of a neural network dedicated processor, the first processing unit being configured to execute a first type of executable sub-program of the at least two types of executable sub-programs;

and a second processing unit of the at least two processing units is a processing unit of a non-neural network dedicated processor, the second processing unit being configured to execute a second type of executable sub-program of the at least two types of executable sub-programs.

49. The inference apparatus of claim 48, wherein the first type of executable sub-program is compiled from a first type of sub-model in the graph model, and the second type of executable sub-program is compiled from a second type of sub-model in the graph model;

wherein the first type of sub-model and the second type of sub-model are obtained by respectively merging two types of nodes among the plurality of nodes, the first type of nodes being nodes suitable for running on the neural network dedicated processor, and the second type of nodes being nodes not suitable for running on the neural network dedicated processor.

50. The inference apparatus according to claim 49, wherein the first type of sub-model is obtained by merging the first type of nodes;

and the second type of sub-model is obtained by merging the second type of nodes.

51. The inference apparatus according to claim 49, wherein the first type of sub-model is the first sub-model with the largest computation amount among at least one first sub-model formed by merging the first type of nodes;

and the second type of sub-model comprises the other first sub-models in the at least one first sub-model, other than the first sub-model with the largest computation amount, and at least one second sub-model formed after the second type of nodes are merged.

52. The inference apparatus according to claim 49, wherein the first type of sub-model comprises first sub-models whose computation amount is greater than a preset threshold, among at least one first sub-model formed by merging the first type of nodes;

and the second type of sub-model comprises the other first sub-models in the at least one first sub-model, other than the first sub-models whose computation amount is greater than the preset threshold, and at least one second sub-model formed after the second type of nodes are merged.

53. The inference apparatus according to claim 49, wherein the first type of sub-model comprises, among at least one first sub-model formed after the first type of nodes are merged, the first target sub-model with the largest computation amount and second target sub-models whose computation amount is greater than a preset threshold;

and the second type of sub-model comprises the other first sub-models in the at least one first sub-model, other than the first target sub-model and the second target sub-models, and at least one second sub-model formed after the second type of nodes are merged.

54. The inference apparatus according to any of claims 51 to 53, wherein the computation amount of each first sub-model in the at least one first sub-model is proportional to the number of target nodes in that first sub-model, wherein a target node is a node comprising a computation-type operator.

55. The inference apparatus according to any of claims 47 to 53, wherein the executable program is obtained by concatenating each of the at least two types of executable sub-programs according to the topology of the graph model.

56. The inference apparatus according to any one of claims 47 to 53, wherein the data is image, speech or text data, and the at least two processing units are configured to:

perform target detection on the image, speech or text data to obtain a detection result.

57. An electronic device, comprising a processor and a memory, wherein the memory stores program code, and the processor is configured to invoke the program code to perform the processing method of a neural network model according to any one of claims 1 to 18.

58. An electronic device, comprising a processor and a memory, wherein the memory stores program code, and the processor is configured to invoke the program code to perform the inference method of a neural network model according to any one of claims 19 to 28.

59. A computer-readable storage medium storing program code for performing the processing method of a neural network model according to any one of claims 1 to 18.

60. A computer-readable storage medium storing program code for performing the inference method of a neural network model according to any one of claims 19 to 28.

Technical Field

The present application relates to the field of computer technologies, and in particular, to a processing method and an inference method for a neural network model, an apparatus thereof, and an electronic device.

Background

In recent years, Deep Learning has become one of the hottest research directions in the field of Artificial Intelligence (AI). It has developed rapidly in application fields such as vision, speech and natural language, and has been put to use across a wide range of industries.

At present, most researchers and enterprises engaged in deep learning research and applications use a Graphics Processing Unit (GPU) to train and run inference on neural network models. However, as neural network models continue to grow, the GPU can no longer meet practical requirements in terms of performance and power consumption. Academia and industry have therefore been actively researching neural network dedicated processors, such as the Neural network Processing Unit (NPU) and the Tensor Processing Unit (TPU), in order to improve a processor's performance in processing neural network models. Not all operations and data in a neural network model are suitable for processing by a neural network dedicated processor, however, and in some cases running a neural network model entirely on a neural network dedicated processor does not yield optimal processing performance.

Therefore, how to improve a processor's performance in processing neural network models is a technical problem that urgently needs to be solved.

Disclosure of Invention

Embodiments of the present application provide a processing method, an inference method, an apparatus, and an electronic device for a neural network model, which can improve a processor's performance in processing the neural network model.

In a first aspect, a processing method of a neural network model is provided, comprising: obtaining a graph model of a neural network model, wherein the graph model comprises a plurality of nodes, and each node in the plurality of nodes comprises an operator; dividing the plurality of nodes into at least two types according to the characteristics of the plurality of nodes; merging the plurality of nodes according to the division result to form at least two types of sub-models; and compiling the at least two types of sub-models to obtain at least two types of executable sub-programs, wherein the at least two types of executable sub-programs are configured to run on at least two types of processors.

In the solution of the present application, the graph model of the neural network model is split into at least two types of sub-models according to the characteristics of its nodes; after the at least two types of sub-models are compiled separately, they can run on at least two types of processors respectively. Compared with processing a neural network on a single type of processor, processing it across multiple types of processors makes full use of the characteristics of each processor type and improves the overall processing performance for the neural network.
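
For illustration only (this is not part of the claimed method), the following Python sketch outlines the partition-and-compile flow described above. All names (Node, GraphModel, classify, merge, compile_for) are hypothetical assumptions introduced for this sketch.

```python
# Hypothetical sketch of the first-aspect processing flow; every name here
# is an illustrative assumption, not an API defined by this application.
from dataclasses import dataclass, field

@dataclass
class Node:
    name: str
    op_type: str                                 # the operator carried by this node
    inputs: list = field(default_factory=list)   # names of parent nodes

@dataclass
class GraphModel:
    nodes: list                                  # nodes in topological order

def process(graph, classify, merge, compile_for):
    """Partition a graph model and compile each class of sub-model.

    classify(node)                -> class label, e.g. "npu" or "cpu"
    merge(nodes, labels)          -> {label: [sub_model, ...]}
    compile_for(label, sub_model) -> an executable sub-program
    """
    # Step 1: divide the nodes into at least two classes by their characteristics.
    labels = {n.name: classify(n) for n in graph.nodes}
    # Step 2: merge the nodes per class into sub-models.
    sub_models = merge(graph.nodes, labels)
    # Step 3: compile each sub-model for its target processor class.
    return {label: [compile_for(label, sm) for sm in sms]
            for label, sms in sub_models.items()}
```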

In a possible embodiment, the dividing the plurality of nodes into at least two types according to the characteristics of the plurality of nodes includes: dividing the plurality of nodes into two types according to the characteristics of the plurality of nodes, wherein one type comprises nodes suitable for running on a neural network dedicated processor, and the other type comprises nodes not suitable for running on the neural network dedicated processor.

According to the solution of this embodiment, the nodes of the graph model of the neural network model are divided into nodes suitable for running on the neural network dedicated processor and nodes not suitable for running on it. This makes full use of the characteristics of the neural network dedicated processor: only the suitable part of the operations in the neural network model runs on the dedicated processor, while the operations not suited to it are placed on other processors. This improves the dedicated processor's processing capability for the neural network model and, in turn, the combined processing capability of the multiple processors.
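
A minimal sketch of one plausible classification rule follows. The whitelist of NPU-supported operators is an assumption made purely for illustration; the application itself does not fix a particular operator set.

```python
# Assumed whitelist of operators the neural network dedicated processor
# handles natively; the set below is purely illustrative.
NPU_SUPPORTED_OPS = {"Conv2D", "MatMul", "Relu", "MaxPool", "Add"}

def classify(node):
    # A node is "suitable" for the dedicated processor if its operator is
    # natively supported there; otherwise it falls back to, e.g., the CPU.
    return "npu" if node.op_type in NPU_SUPPORTED_OPS else "cpu"
```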

In a possible embodiment, the merging the plurality of nodes according to the division result to form at least two types of sub-models includes: merging nodes of the plurality of nodes that are suitable for running on the neural network dedicated processor to form at least one first sub-model; merging nodes of the plurality of nodes that are not suitable for running on the neural network dedicated processor to form at least one second sub-model; and dividing the at least one first sub-model and the at least one second sub-model into two types of sub-models.

In a possible embodiment, the dividing the at least one first sub-model and the at least one second sub-model into two types of sub-models includes: taking the at least one first sub-model as a first type of sub-model of the two types of sub-models, and taking the at least one second sub-model as a second type of sub-model of the two types of sub-models.

In a possible embodiment, the dividing the at least one first sub-model and the at least one second sub-model into two types of sub-models includes: calculating the computation amount of each first sub-model in the at least one first sub-model, classifying the first target sub-model with the largest computation amount as the first type of sub-model, and classifying the other first sub-models, together with the at least one second sub-model, as the second type of sub-model.

In a possible embodiment, the dividing the at least one first sub-model and the at least one second sub-model into two types of sub-models includes: calculating the computation amount of each first sub-model in the at least one first sub-model, classifying first target sub-models whose computation amount is greater than a preset threshold as the first type of sub-model, and classifying the other first sub-models, together with the at least one second sub-model, as the second type of sub-model.

In a possible embodiment, the dividing the at least one first sub-model and the at least one second sub-model into two types of sub-models includes: calculating the computation amount of each first sub-model in the at least one first sub-model, classifying the first target sub-model with the largest computation amount and second target sub-models whose computation amount is greater than a preset threshold as the first type of sub-model, and classifying the other first sub-models, together with the at least one second sub-model, as the second type of sub-model.

Through these embodiments, part of the first sub-models are classified into the second type of sub-model and run on a non-neural network dedicated processor. Only the first sub-model with the largest computation amount, or first sub-models whose computation amount exceeds a preset threshold, run on the neural network dedicated processor. This reduces the time spent transferring data between processors, makes maximal use of the computing resources of both the dedicated and non-dedicated processors, and improves the processors' overall processing capability for the neural network.

In a possible embodiment, the calculating the computation amount of each first sub-model in the at least one first sub-model includes: calculating the computation amount of each first sub-model according to the number of target nodes in that first sub-model, wherein a target node is a node comprising a computation-type operator. A sketch of this cost heuristic and of the selection policies above follows.
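
The sketch below counts computation-type operators as the cost of a first sub-model and then applies the selection policies described above (largest sub-model and/or sub-models above a threshold go to the dedicated processor). The COMPUTE_OPS set and all other names are illustrative assumptions.

```python
# Illustrative cost heuristic and NPU-selection policies; the set of
# computation-type operators below is an assumption, not fixed by the text.
from dataclasses import dataclass

COMPUTE_OPS = {"Conv2D", "MatMul", "Add"}

@dataclass
class SubModel:
    nodes: list   # Node objects (see the earlier sketch)

def computation_amount(sub_model):
    # Cost is proportional to the number of target (computation-type) nodes.
    return sum(1 for n in sub_model.nodes if n.op_type in COMPUTE_OPS)

def select_for_npu(first_sub_models, threshold=None, largest_only=False):
    costs = [computation_amount(sm) for sm in first_sub_models]
    chosen_idx = set()
    if largest_only and costs:
        chosen_idx.add(max(range(len(costs)), key=costs.__getitem__))
    if threshold is not None:
        chosen_idx |= {i for i, c in enumerate(costs) if c > threshold}
    # Sub-models not chosen join the second type and run off the NPU.
    chosen = [first_sub_models[i] for i in sorted(chosen_idx)]
    rest = [sm for i, sm in enumerate(first_sub_models) if i not in chosen_idx]
    return chosen, rest
```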

In a possible embodiment, the compiling the at least two types of sub-models to obtain at least two types of executable sub-programs includes: compiling the first type of sub-model to form a first type of executable sub-program, wherein the first type of executable sub-program is configured to run on the neural network dedicated processor; and compiling the second type of sub-model to form a second type of executable sub-program, wherein the second type of executable sub-program is configured to run on the non-neural network dedicated processor.

In a possible embodiment, the dividing the plurality of nodes into at least two types according to the characteristics of the plurality of nodes includes: setting different flags on the plurality of nodes to distinguish different categories; and the merging the plurality of nodes according to the division result to form at least two types of sub-models includes: merging the plurality of nodes according to the flags on the plurality of nodes to form the at least two types of sub-models.

In one possible embodiment, the setting different flags on the plurality of nodes to distinguish different categories includes: setting a first flag on nodes of the plurality of nodes that are suitable for running on the neural network dedicated processor, and setting a second flag on nodes that are not suitable for running on it. The merging the nodes according to the flags then includes: merging adjacent nodes with the first flag among the plurality of nodes to form at least one first initial sub-model; merging adjacent nodes with the second flag to form at least one second initial sub-model; performing a merging determination on the at least one first initial sub-model and the at least one second initial sub-model to form at least one first sub-model and at least one second sub-model; and dividing the at least one first sub-model and the at least one second sub-model into two types of sub-models. A sketch of the flag-then-merge step follows.
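
The following sketch groups adjacent same-flag nodes into initial sub-models with a union-find sweep over the graph edges. All names are assumptions made for illustration.

```python
# Illustrative flag-then-merge step: adjacent nodes carrying the same flag
# are grouped into one initial sub-model via union-find.
from collections import defaultdict

def merge_adjacent(nodes, labels):
    """nodes: list of Node (see earlier sketch); labels: {node_name: flag}."""
    parent_of = {}

    def find(x):
        while parent_of.setdefault(x, x) != x:
            parent_of[x] = parent_of[parent_of[x]]   # path halving
            x = parent_of[x]
        return x

    def union(a, b):
        parent_of[find(a)] = find(b)

    for node in nodes:
        for parent in node.inputs:
            if labels[parent] == labels[node.name]:  # same flag: merge them
                union(parent, node.name)

    groups = defaultdict(list)
    for node in nodes:
        groups[find(node.name)].append(node)
    return list(groups.values())   # each group is one initial sub-model
```

Note that a naive sweep like this can introduce a ring structure between sub-models when a differently flagged node sits on a parallel path between two merged nodes; the merging rules below address exactly that case.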

In one possible embodiment, the adjacent nodes with the first flag include a first parent node and a first child node, and the merging the adjacent nodes with the first flag among the plurality of nodes includes: if the first parent node and the first child node form a ring structure between themselves, merging the first parent node and the first child node; if no ring structure is formed between the first parent node and the first child node, likewise merging the first parent node and the first child node; and if the first parent node, the first child node and other nodes form a ring structure together, merging the first parent node, the first child node and the other nodes in sequence according to a preset rule.

In a possible embodiment, no ring structure is formed within any of the at least one first initial sub-model, and no ring structure is formed within any of the at least one second initial sub-model. In other words, after merging, the condensed graph whose vertices are the sub-models must remain schedulable; one way to check this is sketched below.
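
A minimal sketch of that acyclicity check follows, assuming the condensed graph over sub-models must stay a DAG so the sub-models can be executed one after another. The function names and representation are illustrative.

```python
# Illustrative check that merging has not created a ring structure among
# sub-models: Kahn's algorithm over the condensed graph.
from collections import defaultdict, deque

def condensed_is_acyclic(nodes, group_of):
    """nodes: list of Node; group_of: {node_name: sub-model id}."""
    succ, indeg = defaultdict(set), defaultdict(int)
    ids = set(group_of.values())
    for node in nodes:
        for parent in node.inputs:
            a, b = group_of[parent], group_of[node.name]
            if a != b and b not in succ[a]:
                succ[a].add(b)          # edge between different sub-models
                indeg[b] += 1
    queue = deque(i for i in ids if indeg[i] == 0)
    seen = 0
    while queue:
        i = queue.popleft()
        seen += 1
        for j in succ[i]:
            indeg[j] -= 1
            if indeg[j] == 0:
                queue.append(j)
    return seen == len(ids)             # True iff no ring structure remains
```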

In a possible embodiment, the performing the merging determination on the at least one first initial sub-model and the at least one second initial sub-model to form at least one first sub-model and at least one second sub-model includes: if the output node of a first specific initial sub-model in the at least one first initial sub-model is a specific node, merging the first specific initial sub-model with a second specific initial sub-model, wherein the second specific initial sub-model is the second initial sub-model in which a child node of that output node is located, and taking the merged result as one second sub-model; and if the output node of a second specific initial sub-model in the at least one second initial sub-model is a specific node, merging the second specific initial sub-model with a first specific initial sub-model, wherein the first specific initial sub-model is the first initial sub-model in which a child node of that output node is located, and taking the merged result as one second sub-model.

In a possible embodiment, the performing the merging determination further includes: if the output node of a first initial sub-model in the at least one first initial sub-model is a non-specific node, taking that first initial sub-model as one first sub-model; and if the output node of a second initial sub-model in the at least one second initial sub-model is a non-specific node, taking that second initial sub-model as one second sub-model. A sketch of this determination follows.
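
The sketch below captures both branches of the merging determination. The predicate is_specific and the helpers (output_node, flag, merge_with, downstream_of) are hypothetical: the summary states the consequence of the "specific node" test but does not fix its definition, so it is left abstract here.

```python
# Illustrative merging determination over initial sub-models. is_specific,
# downstream_of and the sub-model attributes are assumed helpers; the
# bookkeeping for already-merged partners is omitted for brevity.
def merging_determination(initial_sub_models, is_specific, downstream_of):
    first_sms, second_sms = [], []
    for sm in initial_sub_models:
        if is_specific(sm.output_node):
            # Merge with the initial sub-model of the other class that holds
            # a child of this output node; the merged result is classified
            # as a second sub-model.
            second_sms.append(sm.merge_with(downstream_of(sm)))
        elif sm.flag == "first":
            first_sms.append(sm)    # non-specific output: keep as a first sub-model
        else:
            second_sms.append(sm)   # non-specific output: keep as a second sub-model
    return first_sms, second_sms
```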

In one possible implementation, the processing method further includes: concatenating the at least two types of executable sub-programs to form an executable program of the neural network model; establishing an interface between the executable program and a user; acquiring data through the interface; and executing the executable program to perform inference on the data to obtain an inference result.

Through the solution of this embodiment, a user can input commands and data directly into the executable program of the neural network model in a specific user language. The at least two types of executable sub-programs in the executable program run on different processors, which improves the processing capability for the neural network model while allowing the inference result to be obtained quickly, making operation convenient for the user and improving the user experience. A sketch of this final assembly follows.
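
The sketch below chains the compiled sub-programs in topological order behind a single user-facing entry point. The Executable class, the backend labels and run_on are assumptions for illustration.

```python
# Illustrative final assembly: sub-programs concatenated in topological
# order, exposed through one inference interface.
class Executable:
    def __init__(self, stages):
        # stages: [(backend, sub_program), ...] in topological order,
        # e.g. backend in {"npu", "cpu"}.
        self.stages = stages

    def infer(self, data):
        """User-facing interface: feed data in, get the inference result."""
        x = data
        for backend, sub_program in self.stages:
            x = sub_program.run_on(backend, x)  # hand off between processors
        return x
```

For target detection, for example, infer would be called with image data and would return the detection result, mirroring the usage described below.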

In one possible implementation, a first type of executable sub-program of the executable program runs on the neural network dedicated processor, and a second type of executable sub-program of the executable program runs on the non-neural network dedicated processor.

In one possible embodiment, the data is image, speech or text data, and the executing the executable program to perform inference on the data to obtain an inference result includes: executing the executable program to perform target detection on the image, speech or text data to obtain a detection result.

In a second aspect, an inference method for a neural network model is provided, comprising: acquiring data input by a user; and executing at least two types of executable sub-programs in an executable program to perform inference on the data and obtain an inference result; wherein the at least two types of executable sub-programs are configured to run on at least two types of processors, the at least two types of executable sub-programs are obtained by classifying and merging a plurality of nodes in a graph model of a neural network model according to the characteristics of the plurality of nodes, and the graph model of the neural network model is used for processing the data.

According to this solution, a user can input data directly into the executable program of the neural network model in a specific user language. The at least two types of executable sub-programs in the executable program run on different processors, which improves the processing capability for the neural network model while allowing the inference result to be obtained quickly, making operation convenient for the user and improving the user experience.

In one possible implementation, a first type of executable sub-program of the at least two types of executable sub-programs runs on the neural network dedicated processor, and a second type of executable sub-program runs on the non-neural network dedicated processor.

With this embodiment, the first type of executable sub-program in the executable program is executed on the neural network dedicated processor, which can further improve the processing capability for the neural network model.

In a possible implementation, the first type of executable sub-program is compiled from a first type of sub-model in the graph model of the neural network model, and the second type of executable sub-program is compiled from a second type of sub-model in the graph model; the first type of sub-model and the second type of sub-model are obtained by respectively merging two types of nodes among the plurality of nodes, wherein the first type of nodes are nodes suitable for running on the neural network dedicated processor, and the second type of nodes are nodes not suitable for running on the neural network dedicated processor.

In a possible implementation, the first type of sub-model is obtained by merging the first type of nodes, and the second type of sub-model is obtained by merging the second type of nodes.

In a possible implementation, the first type of sub-model is the first target sub-model with the largest computation amount among at least one first sub-model formed after the first type of nodes are merged; the second type of sub-model comprises the other first sub-models in the at least one first sub-model and at least one second sub-model formed after the second type of nodes are merged.

In a possible implementation, the first type of sub-model comprises first target sub-models whose computation amount is greater than a preset threshold, among at least one first sub-model formed after the first type of nodes are merged; the second type of sub-model comprises the other first sub-models in the at least one first sub-model and at least one second sub-model formed after the second type of nodes are merged.

In a possible implementation, the first type of sub-model comprises, among at least one first sub-model formed after the first type of nodes are merged, the first target sub-model with the largest computation amount and second target sub-models whose computation amount is greater than a preset threshold; the second type of sub-model comprises the other first sub-models in the at least one first sub-model and at least one second sub-model formed after the second type of nodes are merged.

In one possible embodiment, the computation amount of each first sub-model in the at least one first sub-model is proportional to the number of target nodes in that first sub-model, wherein a target node is a node comprising a computation-type operator.

In one possible embodiment, the executable program is obtained by concatenating each executable sub-program of the at least two types of executable sub-programs according to the topology of the graph model.

In a possible implementation, the data is image, speech or text data, and the performing inference on the data to obtain an inference result includes: performing target detection on the image, speech or text data to obtain a detection result.

In a third aspect, an apparatus for processing a neural network model is provided, comprising: a first obtaining unit configured to obtain a graph model of a neural network model, wherein the graph model comprises a plurality of nodes, and each node in the plurality of nodes comprises an operator; and a first processing unit configured to divide the plurality of nodes into at least two types according to the characteristics of the plurality of nodes, merge the plurality of nodes according to the division result to form at least two types of sub-models, and compile the at least two types of sub-models to obtain at least two types of executable sub-programs, wherein the at least two types of executable sub-programs are configured to run on at least two types of processors.

In a fourth aspect, an inference apparatus for a neural network model is provided, comprising: an obtaining unit configured to obtain data input by a user; and at least two processing units configured to respectively execute at least two types of executable sub-programs in an executable program, so as to perform inference on the data and obtain an inference result; wherein the at least two processing units are processing units in at least two types of processors, the at least two types of executable sub-programs are obtained by classifying and merging a plurality of nodes in a graph model of a neural network model according to the characteristics of the plurality of nodes, and the graph model of the neural network model is used for processing the data.

In a fifth aspect, an electronic device is provided, comprising a processor and a memory, wherein the memory is configured to store program code, and the processor is configured to invoke the program code to perform the processing method of the neural network model in the first aspect or any one of its possible implementations.

In a sixth aspect, an electronic device is provided, comprising a processor and a memory, wherein the memory is configured to store program code, and the processor is configured to invoke the program code to perform the inference method of the neural network model in the second aspect or any one of its possible implementations.

In a seventh aspect, a computer-readable storage medium is provided for storing program code for performing the processing method of the neural network model in the first aspect or any one of its possible implementations.

In an eighth aspect, a computer-readable storage medium is provided for storing program code for performing the inference method of the neural network model in the second aspect or any one of its possible implementations.

Drawings

FIG. 1 is a schematic diagram of an artificial intelligence technology architecture based on deep learning according to an embodiment of the present application;

FIG. 2 is a schematic diagram of a processing system for a neural network model in accordance with an embodiment of the present application;

FIG. 3 is a schematic diagram of a processing system for another neural network model in accordance with an embodiment of the present application;

FIG. 4 is a schematic flow chart diagram of a method of processing a neural network model in accordance with an embodiment of the present application;

FIG. 5 is a schematic diagram of a graph model of a neural network model according to an embodiment of the present application;

FIG. 6 is a schematic flow chart diagram of another neural network model processing method in accordance with an embodiment of the present application;

FIGS. 7 and 8 are schematic structural diagrams of a graph model of a neural network model after node division and node merging according to an embodiment of the present application;

FIGS. 9 to 11 are schematic diagrams of several ways of merging parent and child nodes according to embodiments of the present application;

FIGS. 12 and 13 are two schematic diagrams of initial sub-models merged to form a sub-model according to an embodiment of the present application;

FIG. 14 is a schematic flow chart diagram of another neural network model processing method in accordance with an embodiment of the present application;

FIG. 15 is a schematic flow chart diagram of an inference method of a neural network model in accordance with an embodiment of the present application;

FIG. 16 is a schematic flow chart diagram of another neural network model inference method in accordance with an embodiment of the present application;

FIG. 17 is a schematic block diagram of a processing device of a neural network model in accordance with an embodiment of the present application;

FIG. 18 is a schematic block diagram of an inference apparatus of a neural network model according to an embodiment of the present application.

Detailed Description

The technical solutions in the embodiments of the present application will be described below with reference to the accompanying drawings.

For ease of understanding, related concepts involved in the embodiments of the present application, such as deep learning and the neural network model, together with related terms, are first described below.

Deep learning, also known as the Deep Neural Network (DNN), is essentially a multi-layer Artificial Neural Network (ANN) algorithm: it models the working mechanism of the human brain structurally, building up from the brain's most basic units. Deep learning comprises two phases, training and inference. Training requires massive data input in order to fit a complex deep neural network model. Inference refers to using the trained model to "infer" conclusions from new data to be judged.

FIG. 1 shows a schematic diagram of an artificial intelligence technology architecture based on deep learning.

As shown in FIG. 1, deep-learning-based artificial intelligence algorithms rely on a computing technology stack that comprises a hardware layer (Hardware), a neural network model compiler (Compiler), a software framework (Software Framework), and, at the top, basic applications (Application).

In particular, the hardware layer provides the underlying computing power for the algorithms. Besides the Central Processing Unit (CPU) and the GPU, the hardware layer covers computing chips customized for specific application scenarios, such as Application Specific Integrated Circuit (ASIC) chips and Field Programmable Gate Array (FPGA) chips. Optionally, the hardware layer may further include other computing chips usable for processing neural networks, such as the neural network processor (NPU), the tensor processor (TPU) and the Deep-learning Processing Unit (DPU). Optionally, beyond the computing chips themselves, the hardware layer also includes servers customized around those chips, GPU server clusters, or various mobile terminal devices, computers, and the like.

The neural network model compiler above the hardware layer is a bridge between the underlying hardware and the software frameworks, and between different software frameworks. Its aim is to provide a hardware calling interface for upper-layer applications and to resolve incompatibilities that may arise when different upper-layer applications use different underlying hardware computing chips. Its scope covers deep neural network model compilers optimized for artificial intelligence computing chips, as well as specifications and formats for representing different neural network models. As shown in fig. 1, the neural network model compiler may be created based on the Low Level Virtual Machine (LLVM), and fig. 1 shows various existing neural network compilers, such as the nGraph compiler, the NNVM/TVM compiler, the ONNX compiler, the NNEF compiler, and so on.

On top of the neural network model compiler, the deep learning algorithm (the neural network model) is packaged into a software framework. A massive data set and a training strategy are also fed into the software framework to train the parameters of the neural network model; the trained neural network model can then be used to predict the attributes of unknown data. As shown in fig. 1, the currently mainstream software frameworks include, but are not limited to, deep learning frameworks such as TensorFlow, MXNet, Caffe or PyTorch.

At present, commercial implementations of artificial intelligence are mainly built on basic application technologies such as computer vision, intelligent speech and natural language processing, forming corresponding products or services. Different application requirements are realized through the software framework, and data operations and processing are further carried out by the underlying neural network model compiler and the hardware layer.

In general, in the deep learning architecture of fig. 1, for a specific basic application, the processor chip in the hardware layer that processes the neural network is one of a CPU, a GPU, an ASIC or an NPU; multiple hardware chips are not used simultaneously to process the same neural network model. Accordingly, the neural network model compiler is typically also of a single type, corresponding and adapted to the chip type in the hardware layer. Apart from the hardware chip and the compiler, the neural network model, that is, the deep learning algorithm, may be packaged, trained and used for inference by any of the software frameworks in fig. 1.

In the hardware layer, a neural network dedicated processor such as an NPU or a TPU is usually used to process the neural network model so as to improve processing efficiency and performance. However, there are many different neural network architectures, such as the Deep Neural Network (DNN), the Convolutional Neural Network (CNN), the Recurrent Neural Network (RNN) and the graph neural network. Different neural network models have different model architectures and operation methods; therefore, not all neural network models are suited to the neural network dedicated processor.

Moreover, any neural network model contains various types of arithmetic processing, simple and complex, and not all operations and data are suitable for processing by a neural network dedicated processor. If only a neural network dedicated processor, or only a single other type of processor, is used for the operations and processing of the neural network, the processing performance for the neural network cannot be optimal.

Based on the above problems, the present application provides a processing method and apparatus for a neural network model, and an electronic device: the neural network model in the software framework is split to form a plurality of submodels, and after the submodels are compiled separately, they can run on a CPU, a neural network dedicated processor chip or other processor chips respectively, so as to improve the overall processing performance for the neural network.

For better understanding of the solution of the embodiment of the present application, a brief description is given below to possible application scenarios of the embodiment of the present application with reference to fig. 2 to 3.

Fig. 2 shows a processing system of a neural network model, which comprises a user device and a data processing device. The user device may be a mobile phone, a personal computer, or an intelligent terminal such as an information processing center. The user device is the initiator of data processing: as the originator of requests for application requirements such as voice recognition and image recognition, the user usually initiates the request through the user device.

The data processing device may be a device or server with a data processing function, such as a cloud server, a network server, an application server or a management server. The data processing device receives data such as voice, text or images from the intelligent terminal through an interactive interface, and then performs data processing by means of machine learning, deep learning and the like, using a memory for storing data and a processor for data processing. The memory in the data processing device may be a general term covering local storage as well as databases storing historical data; the databases may reside on the data processing device or on other network servers.

In the processing system of the neural network model shown in fig. 2, the user device may receive an instruction from a user. For example, a camera in the user device may capture a video clip and then initiate a request to the data processing device, so that the data processing device performs image recognition, target detection or the like on the image frames of the video file through a neural network model algorithm, thereby obtaining a target detection result (for example, face recognition).

In fig. 2, a data processing apparatus may perform the processing method of the neural network model of the present application.

Fig. 3 shows another processing system of a neural network model. In fig. 3, the user device itself serves as the data processing device: it can directly receive the input from the user and process it with its own hardware. The specific process is similar to that of fig. 2; reference may be made to the above description, and details are not repeated here.

In fig. 3, the user equipment itself can execute the neural network model processing method of the present application.

The processors in fig. 2 and 3 may perform data training/machine learning/deep learning through a neural network model or other models (e.g., models based on a support vector machine), and perform target detection (e.g., image recognition, voice recognition, etc.) on data such as voice, image, text, etc. by using the model finally trained or learned by the data, so as to obtain corresponding processing results.

Alternatively, the processor in fig. 2 and 3 may be one or more of the CPU, GPU, ASIC or FPGA in fig. 1. In some embodiments, the processor in fig. 2 and 3 may include a CPU and a TPU.

In addition, it should be noted here that, besides the CPU and the TPU, the processors in fig. 2 and fig. 3 may also include other neural network dedicated processors, such as an NPU or a DPU; the embodiments of the present application do not specifically limit the type of the neural network dedicated processor. To distinguish the neural network dedicated processor from other types of processors, the other types are written herein as non-neural network dedicated processors, including but not limited to general purpose processors or other types of dedicated processors, such as CPUs and GPUs; the embodiments herein do not specifically limit the type of the non-neural network dedicated processor either.

Next, a processing method of the neural network model of the present application is described in detail with reference to fig. 4 to 14.

Fig. 4 shows a schematic flow diagram of a processing method 100 of a neural network model. The method may be performed by a processor, such as the processor described above in fig. 2 and 3, or by a processing device comprising a processor, such as the data processing device described above in fig. 2 and 3. The processor can be a Central Processing Unit (CPU), a Microprocessor/Microcontroller (MPU/MCU), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or another general purpose processor module.

As shown in fig. 4, the processing method 100 of the neural network model may include the following steps.

S110: a graph model of a neural network model is obtained, the graph model including a plurality of nodes, each of the plurality of nodes including an operator.

In some embodiments, the neural network model in this step may be an optimized neural network model that has been trained in a software framework: its network model parameters have been optimized through a large amount of data and a training algorithm, and the trained, optimized model can be used directly for data inference. In other words, data can be input directly into the optimized neural network model to obtain an inference result. For example, if the neural network model is used for target detection, the inference result is the target detection result.

In other embodiments, the neural network model in this step may also be an initial neural network model, i.e., an untrained neural network model, in which the network model parameters are custom initial values or random values.

Specifically, in a software framework, for example the TensorFlow software framework, a neural network model is encapsulated as a graph model. A graph is a data structure composed of a set of objects (nodes) and their relationships (edges). The nodes, which may also be referred to as neurons or Operators (OPs), represent the operations performed on data; the edges, which may also be referred to as tensors, represent the flow of data between operators.

Fig. 5 shows a schematic structural diagram of a graph model of a neural network model. As shown in fig. 5, each node represents an operation, and each node has a number of input and output tensors corresponding to its operator. Alternatively, the input and output tensors of a node may be defined in the constant, variable or placeholder data format. In some embodiments, the input and output tensors of the nodes are defined in the placeholder data format: when the neural network model is trained or used for inference, the original data is fed into the placeholders, and the trained or inferred data is output through the operations on, and transfers of, data between the nodes.
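As an illustration, the minimal sketch below builds such a graph model, assuming the TensorFlow 1.x graph API is available (any framework in fig. 1 would serve equally): each operation added to the graph is a node, the tensors between them are the edges, and the input is declared as a placeholder.

```python
# A minimal sketch of a graph model, assuming the TensorFlow 1.x graph API.
import tensorflow.compat.v1 as tf
tf.disable_eager_execution()

graph = tf.Graph()
with graph.as_default():
    # Input tensor in the placeholder data format; raw data is fed in later.
    x = tf.placeholder(tf.float32, shape=(None, 224, 224, 3), name="input")
    w = tf.Variable(tf.random_normal([3, 3, 3, 16]), name="weights")
    conv = tf.nn.conv2d(x, w, strides=[1, 1, 1, 1], padding="SAME")  # operator node
    out = tf.nn.relu(conv, name="output")                            # operator node

# Every node is an operator with input and output tensors (the edges).
for op in graph.get_operations():
    print(op.type, "->", [t.name for t in op.outputs])
```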

S120: the plurality of nodes are divided into at least two classes according to the characteristics of the plurality of nodes.

Optionally, in some embodiments, the plurality of nodes in the graph model of the neural network model may be divided into two types according to the types of the nodes in the graph model: one type comprises operators suitable for running on a neural network dedicated processor, such as a TPU or an NPU, and the other type comprises operators unsuitable for running on the TPU or the NPU.

Optionally, in other embodiments, the nodes in the graph model may instead be divided into two classes based on the Single Instruction Multiple Data (SIMD) technique: nodes that perform the same operation on multiple data at the same time are suitable for running on the TPU or the NPU, and nodes that do not meet this characteristic are unsuitable for running on the TPU or the NPU.

It should be understood that, in addition to the plurality of nodes being divided into two classes, the plurality of nodes may be divided into multiple classes, each of the multiple classes of nodes running on a different processor. For example, after the division, one type of node is a node suitable for running on the TPU, one type of node is a node suitable for running on the GPU, and the other type of node is a node suitable for running on the CPU. The embodiment of the present application does not specifically limit the specific classification types.

It should also be understood that, besides the above manner, other determination criteria in the prior art may also be used to divide the nodes in the graph model of the neural network model, and the embodiment of the present application does not specifically limit the specific division rules and methods.

In addition, in this embodiment of the present application, when the plurality of nodes are divided into at least two types, different marks may be set on the plurality of nodes to distinguish different categories, where the marks may be letters, numbers, symbols, or a combination thereof, or other different marks in any form, and this is not particularly limited in this embodiment of the present application.
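As a hedged sketch of this division step, the Python fragment below tags each node "support" or "unsupported" against an illustrative operator whitelist; the whitelist and the Node structure are assumptions for illustration, not an actual TPU/NPU support list.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    name: str
    op_type: str
    inputs: list = field(default_factory=list)  # names of parent nodes
    tag: str = ""

# Illustrative whitelist of operators assumed to run well on an NPU/TPU.
NPU_FRIENDLY_OPS = {"Conv2D", "MatMul", "Relu", "Add", "MaxPool"}

def divide_nodes(nodes):
    """Set the first mark ('support') or second mark ('unsupported') on each node."""
    for n in nodes:
        n.tag = "support" if n.op_type in NPU_FRIENDLY_OPS else "unsupported"
    return nodes

nodes = divide_nodes([Node("A", "Conv2D"), Node("B", "Where", ["A"])])
print([(n.name, n.tag) for n in nodes])  # [('A', 'support'), ('B', 'unsupported')]
```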

S130: and combining the nodes according to the division result to form at least two types of sub-models.

Optionally, after the nodes are divided, adjacent nodes of the same type are merged according to the division result to form merged new nodes; the merging of nodes can also be understood as the merging of operators. To distinguish a merged new node from the nodes before merging, the merged new node is also called a submodel; a submodel comprises at least one node and likewise has input and output tensors.

In some embodiments, after the plurality of nodes are divided into a type suitable for running on the neural network dedicated processor and a type unsuitable for running on the neural network dedicated processor, the nodes suitable for running on the neural network dedicated processor are combined to form one type of submodel, and the nodes unsuitable for running on the neural network dedicated processor are combined to form another type of submodel.

Of course, in addition to the above embodiments, after the plurality of nodes are divided into multiple classes of nodes suitable for running on multiple classes of processors, each class of nodes may be merged to form multiple classes of submodels.

Optionally, if different labels are set on nodes of different categories, merging the nodes according to the labels on the nodes to form at least two types of sub-models.

In the embodiment of the application, when the at least two types of submodels are formed by merging, the topological relation among the submodels and the information about the tensors exchanged among the submodels are also formed.

S140: compiling the at least two types of sub models to obtain at least two types of executable sub programs, wherein the at least two types of executable sub programs are respectively operated on the at least two types of processors.

Specifically, in this step, at least two compilers are used to compile the at least two types of submodels to obtain the at least two types of executable subprograms, where each compiler corresponds to one type of processor and is used to compile one type of submodel into one type of executable subprogram that can run on that processor. For example, after a CPU compiler compiles a submodel, the obtained executable subprogram can run on the CPU; after a TPU compiler compiles a submodel, the obtained executable subprogram can run on the TPU.

It should be understood that in this step, the compilation process of the compiler on the sub-model is the same as that in the prior art, and those skilled in the art can implement this step by referring to the compilation process in the prior art, which is not described herein again.

In the solution of the embodiment of the present application, the graph model is split into at least two types of submodels according to the characteristics of the nodes in the graph model of the neural network model, and after the at least two types of submodels are compiled separately, they can run on a CPU, a neural network dedicated processor or other processors respectively. Compared with processing the neural network with a single type of processor, the method of the embodiment of the present application processes the neural network with multiple types of processors, so the characteristics and resources of the multiple types of processors can be fully utilized and the overall processing performance for the neural network can be comprehensively improved.

Hereinafter, the processing of a graph model to form two types of submodels is described as an example; the method for forming more types of submodels is similar and may refer to the description below, which is not repeated here.

Fig. 6 shows a schematic flow diagram of another neural network model processing method 100.

As shown in fig. 6, the processing method 100 of the neural network model may include:

s110: and acquiring a graph model of the neural network model.

S121: the plurality of nodes are divided into two classes according to whether the plurality of nodes are suitable to run on the neural network special processor or not.

This step may be an embodiment of step S120 described above.

Specifically, the nodes in the plurality of nodes that are suitable for running on the neural network dedicated processor are classified into a first type of nodes, and the nodes in the plurality of nodes that are not suitable for running on the neural network dedicated processor are classified into a second type of nodes.

Alternatively, in the process of dividing the plurality of nodes, each node may be labeled so as to form two types of labels (tags): a first label, such as "support", for nodes suitable for running on the neural network dedicated processor, and a second label, such as "unsupported", for nodes not suitable for running on the neural network dedicated processor. Of course, the contents of the first label and the second label may also be any other characters, numbers or letters that distinguish the two types of nodes, which is not specifically limited in the embodiments of the present application.

S131: combining nodes in the plurality of nodes which are suitable for running on a neural network special processor to form at least one first initial submodel; and combining the nodes which are not suitable for running on the special neural network processor in the plurality of nodes to form at least one initial second submodel.

The steps S131 to S133 may be an embodiment of the step S130.

Optionally, combining adjacent nodes with the first label in the plurality of nodes to form at least one first initial sub-model; and merging the adjacent nodes with the second marks in the plurality of nodes to form at least one second initial sub-model.

Optionally, after the nodes are combined to form the initial submodels, the topological relation between the at least one first initial submodel and the at least one second initial submodel, and the information about the tensors exchanged between the initial submodels, need to be recorded.

For example, fig. 7 and 8 show schematic structural diagrams of a graph model of a neural network model after node division and node combination.

As shown in fig. 7, the graph model includes seven nodes A to G. After the node division determination, all six nodes other than node D are suitable for running on the neural network dedicated processor; node D is labeled as an unsupported node and the other six nodes are labeled as supported nodes. Combining the neighboring nodes with the same label forms the 3 initial submodels shown in fig. 8, in which nodes A, B and C form initial submodel 1, node D forms initial submodel 2, and nodes E, F and G form initial submodel 3. Initial submodels 1 and 3 are both first initial submodels, while initial submodel 2 is a second initial submodel.

Optionally, after the graph model is divided into the three initial submodels, the topological relation between them and the information about the tensors exchanged between them are recorded; for example, the output tensors of node B and node C both serve as inputs of node D, and the output of node D simultaneously serves as an input of node E and of node F.
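The following sketch reproduces this A to G example: adjacent nodes with the same label are merged with a simple union-find, and the tensors that cross submodel boundaries are recorded. The edge list is inferred from the figure description and is therefore an assumption; the sketch also ignores the "ring structure" constraint discussed next.

```python
# Edges assumed from Figs. 7 and 8 (B and C feed D; D feeds E and F).
edges = [("A", "B"), ("A", "C"), ("B", "D"), ("C", "D"),
         ("D", "E"), ("D", "F"), ("E", "G"), ("F", "G")]
tags = {n: "support" for n in "ABCDEFG"}
tags["D"] = "unsupported"  # node D is not suited to the dedicated processor

parent = {n: n for n in tags}
def find(x):                      # union-find root lookup with path halving
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x

for u, v in edges:                # merge adjacent nodes sharing a label
    if tags[u] == tags[v]:
        parent[find(u)] = find(v)

groups = {}
for n in tags:
    groups.setdefault(find(n), []).append(n)
print(sorted(groups.values()))    # [['A', 'B', 'C'], ['D'], ['E', 'F', 'G']]

# Tensors crossing submodel boundaries: the recorded interaction information.
print([(u, v) for u, v in edges if find(u) != find(v)])
# [('B', 'D'), ('C', 'D'), ('D', 'E'), ('D', 'F')]
```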

In addition, in the process of merging nodes with the same label, care must be taken that no "ring structure" is formed in the merged first and second initial submodels. The process of node merging in the embodiment of the present application is therefore described below with reference to figs. 9 to 11.

Specifically, in the graph model, two adjacent nodes may be referred to as a parent node and a child node, where the output of the parent node serves as the input of the child node. Adjacent parent and child nodes with the same label may be merged; for example, a first parent node and a first child node with the first label may be merged, as may a second parent node and a second child node with the second label.

Figs. 9 to 11 show several combinations of parent nodes and child nodes. In figs. 9 to 11, the output of node A is the input of node B; node A may therefore be referred to as the parent node of node B, node B is the child node of node A, and node A and node B carry the same label.

As shown in fig. 9, a "ring structure" is directly formed between the node a and the node B, and before combining other nodes, the nodes forming the ring structure are combined, that is, the node a and the node B are combined to form an initial sub-model.

As shown in the several cases of fig. 10, the data output by node A may reach node B directly without passing through other nodes, and no "ring structure" is formed between node A and node B, so node A and node B may be merged directly to form an initial submodel.

As shown in the several cases of fig. 11, besides reaching node B directly, the data output by node A may also pass through other nodes, so that node A, node B and the other nodes together form a "ring structure". For example, in the upper diagram of fig. 11, node A is connected to node B directly and is also connected to node B through node C. In this case, node B and node C are merged first, and the merged node BC is then merged with node A to form an initial submodel.
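A hedged sketch of this check: before merging parent A with child B, test whether A still reaches B through some other path; if it does, merging them directly would contract a "ring structure", so the intermediate nodes must be merged first. The graph representation is an assumption for illustration.

```python
from collections import defaultdict

def would_form_ring(edges, a, b):
    """True if a reaches b by a path other than the direct edge a -> b."""
    adj = defaultdict(list)
    for u, v in edges:
        if (u, v) != (a, b):          # ignore the direct edge
            adj[u].append(v)
    stack, seen = [a], set()
    while stack:
        n = stack.pop()
        if n == b:
            return True
        if n not in seen:
            seen.add(n)
            stack.extend(adj[n])
    return False

# Upper diagram of Fig. 11: A -> B directly, and A -> C -> B indirectly.
print(would_form_ring([("A", "B"), ("A", "C"), ("C", "B")], "A", "B"))  # True
print(would_form_ring([("A", "B"), ("A", "C")], "A", "B"))              # False
```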

S132: and carrying out merging judgment on the at least one first initial submodel and the at least one second initial submodel to form at least one first submodel and at least one second submodel.

Specifically, in some embodiments, if the output node of a first specific initial submodel in the at least one first initial submodel is a specific node, the first specific initial submodel is merged with a second specific initial submodel, where the second specific initial submodel is the initial submodel in which a child node of that output node is located;

and if the second specific initial submodel is a second initial submodel, the merged first specific initial submodel and second specific initial submodel are taken as one second submodel.

Similarly, in other embodiments, if the output node of a second specific initial submodel in the at least one second initial submodel is a specific node, the second specific initial submodel is merged with a first specific initial submodel, where the first specific initial submodel is the initial submodel in which a child node of that output node is located;

and if the first specific initial submodel is a first initial submodel, the merged second specific initial submodel and first specific initial submodel are likewise taken as one second submodel.

Fig. 12 and 13 show schematic diagrams of the merging of initial submodels into submodels in both cases.

As shown in figs. 12 and 13, node A and node B are merged to form initial submodel 1, and node C forms initial submodel 2.

Suppose the output node of initial submodel 1, i.e. node B, is a specific node, that is, a node unsuitable to serve as a model output (for example, a node containing certain type-specific operators, logical operators and the like). In that case, the initial submodel in which the child node of node B is located needs to be merged with initial submodel 1 to form a new submodel, so as to prevent node B from being an output node of a submodel. In the embodiment of the present application, the child node of node B is node C, the submodel in which node C is located is initial submodel 2, and initial submodel 2 and initial submodel 1 are merged to form a new submodel.

As shown in fig. 12, initial submodel 1 is a first initial submodel, i.e. both node A and node B are suitable for running on the neural network dedicated processor, while initial submodel 2 is a second initial submodel, node C being unsuitable for running on the neural network dedicated processor. Initial submodel 1 and initial submodel 2 are merged to form one second submodel, which runs on a non-neural network dedicated processor.

As shown in fig. 13, initial submodel 1 is a second initial submodel, i.e. both node A and node B are unsuitable for running on the neural network dedicated processor, while initial submodel 2 is a first initial submodel, node C being suitable for running on the neural network dedicated processor. After initial submodel 1 and initial submodel 2 are merged, one second submodel is likewise formed, which runs on the non-neural network dedicated processor.

Figs. 12 and 13 above illustrate the case where a specific node is included in an initial submodel. It should be understood that if the output node of a first initial submodel is a non-specific node, i.e. a node suitable to serve as an output, that first initial submodel is taken as one first submodel; similarly, if the output node of a second initial submodel is a non-specific node, that second initial submodel is taken as one second submodel.

Optionally, in this step, the topological relation between the at least one first submodel and the at least one second submodel, and the information about the tensors exchanged between the submodels, need to be updated and recorded accordingly.

S133: and dividing the at least one first submodel and the at least one second submodel into two types of submodels.

As one possible embodiment, S1331: the at least one first submodel is taken as the first type of submodel, and the at least one second submodel is taken as the second type of submodel.

As another possible embodiment, S1332: the calculation amount of each first submodel in the at least one first submodel is calculated; the first target submodel with the largest calculation amount is divided into the first type of submodel of the two types, and the first submodels other than the first target submodel, together with the at least one second submodel, are divided into the second type of submodel of the two types.

As a third possible embodiment, S1333: the calculation amount of each first submodel in the at least one first submodel is calculated; the first target submodels whose calculation amount is larger than a preset threshold are divided into the first type of submodel of the two types, and the other first submodels, together with the at least one second submodel, are divided into the second type of submodel of the two types.

As a fourth possible embodiment, S1334: the calculation amount of each first submodel in the at least one first submodel is calculated; the first target submodel with the largest calculation amount, together with the second target submodels whose calculation amount is larger than a preset threshold, are divided into the first type of submodel of the two types, and the other first submodels, together with the at least one second submodel, are divided into the second type of submodel of the two types.

In the above process of calculating the calculation amount of each first submodel, the number of nodes containing computation-type operators in the first submodel may be counted. For example, convolution, addition and subtraction operators all belong to computation-type operators; a node containing an operator of this type may be referred to as a target node. The larger the number of target nodes in a first submodel, the larger its calculation amount.

The preset thresholds in the above steps S1333 and S1334 may represent a preset number of target nodes in a first submodel.
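A minimal sketch of this estimate, assuming an illustrative set of computation-type operators: the calculation amount of a first submodel is the count of its target nodes, and the target submodels of steps S1332 and S1333 follow directly from the counts.

```python
COMPUTE_OPS = {"Conv2D", "MatMul", "Add", "Sub", "Mul"}  # illustrative set

def compute_amount(op_types):
    """Count target nodes: nodes whose operator is a computation-type operator."""
    return sum(1 for op in op_types if op in COMPUTE_OPS)

costs = {"submodel_1": compute_amount(["Conv2D", "MatMul", "Relu", "Add"]),
         "submodel_3": compute_amount(["Relu", "MaxPool"])}

first_target = max(costs, key=costs.get)                 # S1332: largest amount
over_threshold = [s for s, c in costs.items() if c > 2]  # S1333: above threshold
print(first_target, over_threshold)  # submodel_1 ['submodel_1']
```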

Optionally, in step S121, when the plurality of nodes in the graph model are divided and marked, whether a node is a target node may additionally be marked on the first type of nodes suitable for running on the neural network dedicated processor. For example, marking "computing" on a target node indicates that it contains a computation-type operator; such a target node also carries the first label, such as "support".

In this step, the calculation amount of each of the at least one first submodel can then be calculated rapidly and directly from these marks.

In the above steps S1332, S1333 and S1334, part of the first submodels are divided into the second type of submodel and run on the non-neural network dedicated processor. Only the first submodel with the largest calculation amount, or the first submodels whose calculation amount exceeds the preset threshold, run on the neural network dedicated processor. This reduces the time spent on data transfer between processors, makes maximum use of the computing resources of both the neural network dedicated processor and the non-neural network dedicated processor, and comprehensively improves the processing capability for the neural network.

In addition, in the above case, the calculation amount of these first submodels may be further balanced against the time consumed by data transfer: if such a submodel runs on the non-neural network dedicated processor, the running speed is slower but no data transfer time is incurred; if it runs on the neural network dedicated processor, the running speed is faster but extra data transfer time is added. Whether these first submodels run on the non-neural network dedicated processor or on the neural network dedicated processor is then determined by comprehensively weighing the calculation amount against the data transfer time.
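A toy sketch of this trade-off; all timing parameters below are assumptions chosen only to show the shape of the decision, not measured values.

```python
def place_submodel(amount, npu_speedup=5.0, op_cost=1.0, transfer_cost=20.0):
    """Pick a processor by weighing compute time against data transfer time."""
    cpu_time = amount * op_cost                                # no transfer needed
    npu_time = amount * op_cost / npu_speedup + transfer_cost  # adds transfer time
    return "dedicated" if npu_time < cpu_time else "non-dedicated"

print(place_submodel(10))   # non-dedicated: the transfer overhead dominates
print(place_submodel(200))  # dedicated: the speedup outweighs the transfer
```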

Alternatively, step S132 may also be omitted from the above embodiments, with the at least one first initial submodel taken as the first type of submodel and the at least one second initial submodel taken as the second type of submodel.

S141: the two types of submodels are compiled to obtain two types of executable subprograms, which run respectively on the two types of processors.

This step may be an embodiment of step S140 described above.

Specifically, for the first type of sub-model, a first compiler is adopted for compiling to obtain a first type of executable sub-program which can run on the neural network dedicated processor, and for the second type of sub-model, a second compiler is adopted for compiling to obtain a second type of executable sub-program which can run on the non-neural network dedicated processor.

For example, the first type of submodel is suitable for running on the TPU and the second type of submodel is not. The first type of submodel is compiled by a TPU compiler, resulting in a first type of executable subprogram that can run on the TPU. The second type of submodel, which is not suitable for running on the TPU, may run on the CPU or another processor chip; for example, it may be compiled by a CPU compiler, resulting in a second type of executable subprogram that can run on the CPU.

It should be understood that the second type sub-model may be compiled on a compiler of other processor chips such as a GPU, an ASIC, or an FPGA, besides the CPU compiler, and the specific compiler type is not specifically limited in the embodiment of the present application.
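A hedged sketch of this dispatch, with the compiler objects as hypothetical stand-ins for real toolchains (for example a TPU compiler and a CPU compiler):

```python
# Hypothetical stand-ins for per-processor compilers.
COMPILERS = {
    "npu": lambda submodel: ("npu_binary", submodel),
    "cpu": lambda submodel: ("cpu_binary", submodel),
}

def compile_submodels(classified):
    """classified: {submodel_name: (target_processor, submodel)}."""
    return {name: COMPILERS[target](sub)
            for name, (target, sub) in classified.items()}

programs = compile_submodels({"submodel_1": ("npu", ["A", "B", "C"]),
                              "submodel_2": ("cpu", ["D"])})
print(programs)
```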

Fig. 14 shows a schematic flow diagram of another neural network model processing method 100.

As shown in fig. 14, the processing method 100 of the neural network model may further include the following steps.

S150: and connecting at least two types of executable subprograms in series to form the executable program of the neural network model.

Specifically, in step S130, when the at least two types of submodels are merged, the topological relation between the submodels and the information about the tensors exchanged between them are formed. In this step, the inputs and outputs of the executable subprograms corresponding to the submodels are connected in series according to this topological relation and tensor information, so as to form the executable program of the neural network model.

In one embodiment, for example, in step S132 described above, the topological relation between the at least one first submodel and the at least one second submodel, and the information about the tensors exchanged between the submodels, are recorded. In this step, the connection relation between the first type of executable subprograms and the second type of executable subprograms is obtained from that topological relation and information, so that the executable subprograms of both types are connected in series to form the executable program of the neural network model.
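A minimal sketch of this serial connection: the subprograms are plain callables here (hypothetical stand-ins for the compiled binaries), invoked in the recorded topological order, with each one fed the output tensors of its predecessors.

```python
def run_pipeline(subprograms, topo_order, deps, feed):
    """
    subprograms: {name: callable taking and returning a dict of tensors}
    topo_order:  submodel names in the recorded topological order
    deps:        {name: names of the submodels whose outputs feed it}
    feed:        the user's input tensors
    """
    outputs = {"<input>": feed}
    for name in topo_order:
        inputs = {}
        for d in deps.get(name, ["<input>"]):
            inputs.update(outputs[d])
        outputs[name] = subprograms[name](inputs)
    return outputs[topo_order[-1]]

result = run_pipeline(
    {"sub1": lambda t: {"x": t["x"] + 1}, "sub2": lambda t: {"y": t["x"] * 2}},
    ["sub1", "sub2"], {"sub2": ["sub1"]}, {"x": 3})
print(result)  # {'y': 8}
```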

S160: an interface between the executable program and a user is established.

In particular, input and output interfaces between the executable program of the neural network model and the user are established, i.e., the interfaces are built on a specific programming language such as C, C++ or Python. The user inputs data and commands through that programming language and the executable program can be run directly; there is no need to input the data and commands into the software framework and then pass through the compilers in turn to form an executable program, so the data processing process is more efficient.
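A hedged sketch of such an interface in Python; the class and method names are hypothetical, and the wrapped run function stands in for the serialized executable program of step S150.

```python
import numpy as np

class NeuralNetworkExecutable:
    """Hypothetical user-facing wrapper around the packaged executable program."""

    def __init__(self, run_fn):
        self._run_fn = run_fn  # the chained subprograms of step S150

    def infer(self, data: np.ndarray):
        """Input/output interface: raw user data in, inference result out."""
        return self._run_fn({"input": data})

# model = NeuralNetworkExecutable(run_fn=compiled_pipeline)  # hypothetical
# result = model.infer(np.zeros((1, 224, 224, 3), dtype=np.float32))
```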

S170: and acquiring data through the interface, and executing the executable program to reason the data to obtain an inference result.

Specifically, in this step, the user inputs data through the interface, and the at least two types of executable subprograms in the executable program run respectively on the at least two types of processors corresponding to them, so as to perform operational inference on the data and thereby implement the inference function of the neural network on the data.

In some embodiments, the executable program includes a first type of executable sub program and a second type of executable sub program, wherein the first type of executable sub program is executed by the neural network dedicated processor, the second type of executable sub program is executed by the non-neural network dedicated processor, and the data input by the user is directly processed by the neural network dedicated processor and the non-neural network dedicated processor.

Alternatively, the data input by the user may be any type of data, such as text, voice or images. In this embodiment, the inference process of the neural network model on the data may be a target detection process on the text, voice or images, for example detecting target text, target voice or a target image and obtaining the target detection result.

By adopting the solution of the embodiment of the present application, a user can input commands and data directly into the executable program of the neural network model based on a specific user language, and the at least two types of executable subprograms in the executable program run on different processors respectively. This improves the processing capability for the neural network model while allowing the data inference result to be obtained rapidly, which facilitates the user's operation and improves the user experience.

It will be appreciated that the executable program of the neural network model described above may be used not only to perform inference on data input by the user but also to train the neural network model. In that case, the relevant parameters in the executable program may be preset initial parameters, the data input by the user is a training data set, and running the executable program yields training results that continuously optimize the relevant parameters in the executable program.

A processing method of a neural network model of the present application is described in detail above with reference to fig. 4 to 14.

The processing method of the neural network comprises dividing the graph model of the neural network model, compiling the divided submodels into a plurality of executable subprograms respectively, and encapsulating these executable subprograms into one executable program.

In addition to the processing method, the present application also provides an inference method of a neural network model, in which only a final executable program is executed without including the above-described processes of division, compilation, and encapsulation. In other words, the above dividing, compiling and packaging processes are the pre-processes of the inference method, and are not embodied in the inference method.

Fig. 15 illustrates an inference method 200 of a neural network model. The inference method 200 may be executed by a processing apparatus comprising at least two types of processors; the at least two types of processors may be located in the same electronic device or in different electronic devices, and data transmission between them may be wired or wireless.

As an example, the processing apparatus may comprise the data processing device of figs. 2 and 3 described above. The processing apparatus may include general processor modules such as a CPU, an MPU/MCU, an ASIC or an FPGA, and neural network dedicated processor modules such as an NPU, a TPU or a DPU.

As shown in fig. 15, the inference method 200 of the neural network model may include the following steps.

S210: data input by a user is received.

S220: and executing at least two types of executable subprograms in the executable program to reason the data to obtain an inference result.

The at least two types of executable subprograms run on the at least two types of processors; they are obtained by classifying and merging a plurality of nodes according to the characteristics of the plurality of nodes in a graph model of a neural network model, and the graph model of the neural network model is used to process the data input by the user.

Alternatively, the steps S210 and S220 may be the same as the step S170 described above.

Optionally, the executable program is obtained by connecting each executable sub program of at least two types of executable sub programs in series according to the topological structure of the graph model of the neural network model.

Specifically, an interface is established between the executable program and the user. The user inputs data and commands through a specific programming language, and the at least two types of executable subprograms in the executable program can be run directly; there is no need to input the data and commands into the software framework and then pass through the compilers in turn to form an executable program, so the data processing process is more efficient.

The data input by the user is received through the interface, and the at least two types of executable subprograms in the executable program run respectively on the at least two types of processors corresponding to them, so as to perform operational inference on the data and implement the inference function of the neural network on the data.

Alternatively, the data input by the user may be any type of data such as text, voice, or image, and in this embodiment, the inference process on the data may be a process of performing target detection on the text, voice, or image, for example, detecting a target text, a target voice, or a target image, and obtaining a target detection result.

Specifically, the at least two types of executable subprograms in the embodiment of the present application are obtained by classifying and merging a plurality of nodes according to the characteristics of the plurality of nodes in the graph model of the neural network model, so the executable subprograms converted from the neural network model can perform data inference well and obtain accurate inference results. Moreover, since the at least two types of executable subprograms run respectively on the at least two types of processors, the resources in the processors can be fully utilized and the data processing performance improved.

Through the solution of the embodiment of the application, a user can input commands and data directly into the executable program of the neural network model based on a specific user language, and the at least two types of executable subprograms in the executable program run on different processors respectively. This improves the processing capability of the neural network model while allowing the data inference result to be obtained rapidly, which facilitates the user's operation and improves the user experience.

In some embodiments, fig. 16 illustrates another neural network model inference method 200.

As shown in fig. 16, the step S220 may include:

s221: the special neural network processor executes a first executable subprogram of the at least two executable subprograms, and the special non-neural network processor executes a second executable subprogram of the at least two executable subprograms so as to reason the data and obtain an inference result.

Specifically, in the embodiment of the present application, the first type of executable sub program and the second type of executable sub program may refer to the related description in the processing method 100.

In some embodiments, the first type of executable sub-program is compiled from a first type of sub-model in a graph model of the neural network model, and the second type of executable sub-program is compiled from a second type of sub-model in the graph model of the neural network model;

the first-type submodel and the second-type submodel are obtained by respectively merging two types of nodes in a plurality of nodes of the neural network model, wherein the first type of nodes in the two types of nodes are nodes suitable for running on a special processor of the neural network, and the second type of nodes are nodes not suitable for running on the special processor of the neural network.

Similarly, see the description of the processing method 100 above for two types of nodes and two types of submodels.

In a possible embodiment, the first-type submodel is obtained by combining the first-type nodes, and the second-type submodel is obtained by combining the second-type nodes.

In another possible implementation, the first-type submodel is a first target submodel with the largest calculated amount in at least one first submodel formed after the first-type nodes are combined; the second type submodel comprises other first submodels except the first target submodel with the largest calculated amount in at least one first submodel and at least one second submodel formed after the second type nodes are combined.

In a third possible implementation manner, the first-type submodel includes a first target submodel, the calculated amount of which is greater than a preset threshold value, in at least one first submodel formed after the first-type nodes are combined; the second type submodel comprises other first submodels except for a first target submodel with the calculated amount larger than a preset threshold value in at least one first submodel and at least one second submodel formed after the second type nodes are combined.

In a fourth possible implementation manner, the first-type submodel includes a first target submodel with the largest calculation amount and a second target submodel with the calculation amount larger than a preset threshold value, which are formed in at least one first submodel after the first-type nodes are combined; the second type submodel comprises at least one first submodel except the first target submodel and the second target submodel and at least one second submodel formed after the second type nodes are combined.

Optionally, in the above embodiment, the computation amount of each of the at least one first submodel is proportional to the number of target nodes in each of the at least one first submodel, where the target nodes include operators of computation types.

The method embodiments of the neural network model processing method and the inference method of the present application are described in detail above with reference to fig. 4 to 16, and the device embodiments of the neural network model processing device and the inference device of the present application are described in detail below with reference to fig. 17 to 18, it being understood that the device embodiments correspond to the method embodiments, and similar descriptions may refer to the method embodiments.

Fig. 17 shows a schematic block diagram of a processing device 10 of a neural network model. The processing device 10 may be used to perform the neural network model processing method 100 described above.

As shown in fig. 17, the processing device 10 of the neural network model includes: a first acquisition unit 11 and a first processing unit 12.

Specifically, the first obtaining unit 11 is configured to obtain a graph model of a neural network model, where the graph model includes a plurality of nodes, and each node in the plurality of nodes includes an operator;

the first processing unit 12 is configured to divide the plurality of nodes into at least two classes according to characteristics of the plurality of nodes; combining the nodes according to the division result to form at least two types of sub-models; compiling the at least two types of sub models to obtain at least two types of executable sub programs, wherein the at least two types of executable sub programs are used for running on the at least two types of processors.

In a possible embodiment, the first processing unit 12 is configured to: according to the characteristics of the nodes, the nodes are divided into two types, wherein one type is the node suitable for running on the special neural network processor, and the other type is the node not suitable for running on the special neural network processor.

In a possible embodiment, the first processing unit 12 is configured to: combining nodes of the plurality of nodes that are adapted to run on the neural network dedicated processor to form at least one first submodel; combining nodes of the plurality of nodes that are not suitable for running on the neural network dedicated processor to form at least one second submodel; and dividing the at least one first submodel and the at least one second submodel into two types of submodels.

In a possible embodiment, the first processing unit 12 is configured to: and taking at least one first sub-model as a first sub-model of the two types of sub-models, and taking at least one second sub-model as a second sub-model of the two types of sub-models.

In a possible embodiment, the first processing unit 12 is configured to: the calculation amount of each first submodel in the at least one first submodel is calculated, the first target submodel with the largest calculation amount is divided into a first type submodel in the two types of submodels, the first submodels except the first target submodel in the at least one first submodel and the at least one second submodel are divided into a second type submodel in the two types of submodels.

In a possible embodiment, the first processing unit 12 is configured to: calculating the calculated amount of each first submodel in at least one first submodel, dividing a first target submodel of which the calculated amount is larger than a preset threshold value into a first submodel of two types, and dividing other first submodels except the first target submodel and at least one second submodel into a second submodel of the two types.

In a possible embodiment, the first processing unit 12 is configured to: the method comprises the steps of calculating the calculated amount of each first submodel in at least one first submodel, dividing a first target submodel with the largest calculated amount and a second target submodel with the calculated amount larger than a preset threshold value into first submodels in two types, dividing other first submodels except the first target submodel and the second target submodel in at least one first submodel and dividing at least one second submodel into second submodels in the two types.

In a possible embodiment, the first processing unit 12 is configured to: and calculating the calculated amount of each first submodel according to the number of target nodes in each first submodel, wherein the target nodes comprise operators of calculation types.

In a possible embodiment, the first processing unit 12 is configured to: compiling the first type sub-model to form a first type executable sub-program, wherein the first type executable sub-program is used for running on a special processor of the neural network; and compiling the second type of sub-model to form a second type of executable sub-program, wherein the second type of executable sub-program is used for running on the non-neural network special processor.

In a possible embodiment, the first processing unit 12 is configured to: setting different marks on a plurality of nodes to distinguish different categories; and combining the nodes according to the marks on the nodes to form at least two types of sub-models.

In a possible embodiment, the first processing unit 12 is configured to: setting a first flag on a node of the plurality of nodes that is adapted to run on the neural network dedicated processor, and setting a second flag on a node of the plurality of nodes that is not adapted to run on the neural network dedicated processor; combining adjacent nodes with first marks in the plurality of nodes to form at least one first initial sub-model; merging adjacent nodes with second marks in the plurality of nodes to form at least one second initial submodel; carrying out merging judgment on at least one first initial submodel and at least one second initial submodel to form at least one first submodel and at least one second submodel; and dividing the at least one first submodel and the at least one second submodel into two types of submodels.

In a possible implementation manner, the adjacent nodes with the first label include a first parent node and a first child node, and the first processing unit 12 is configured to: if the first parent node and the first child node directly form a ring structure, merge the first parent node and the first child node; if the first parent node and the first child node do not form a ring structure, also merge the first parent node and the first child node directly; and if the first parent node, the first child node and other nodes together form a ring structure, merge the first parent node, the first child node and the other nodes in sequence according to a preset rule.

In one possible embodiment, no ring structure is formed in each of the at least one first initial submodel; a ring structure is not formed in each of the at least one second initial submodel.

In a possible embodiment, the first processing unit 12 is configured to: if the output node of a first specific initial submodel in the at least one first initial submodel is a specific node, merge the first specific initial submodel with a second specific initial submodel, where the second specific initial submodel is the second initial submodel in which a child node of that output node is located, and take the merged first specific initial submodel and second specific initial submodel as one second submodel;

and if the output node of a second specific initial submodel in the at least one second initial submodel is a specific node, merge the second specific initial submodel with a first specific initial submodel, where the first specific initial submodel is the first initial submodel in which a child node of that output node is located, and take the merged second specific initial submodel and first specific initial submodel as one second submodel.

In a possible embodiment, the first processing unit 12 is further configured to: if the output node of one first initial submodel in at least one first initial submodel is a non-specific node, taking one first initial submodel as one first submodel; and if the output node of one second initial submodel in the at least one second initial submodel is a non-specific node, taking the second initial submodel as a second submodel.

Optionally, as shown in fig. 17, the processing apparatus 10 further includes: a second acquisition unit 13 and a second processing unit 14;

the first processing unit 12 is further configured to: connecting at least two types of executable subprograms in series to form an executable program of the neural network model; and establishing an interface between the executable program and the user;

the second obtaining unit 13 is configured to: acquiring data through an interface;

the first processing unit 12 is further configured to execute one of the at least two types of executable sub-programs, and the second processing unit 14 is configured to execute another of the at least two types of executable sub-programs to perform inference on the data to obtain an inference result.

In one possible embodiment, the second processing unit 14 is a processing unit on a neural network dedicated processor and the first processing unit 12 is a processing unit on a non-neural network dedicated processor.

In a possible embodiment, the data is image, voice or text data, and the first processing unit 12 and the second processing unit 14 are configured to perform target detection on the image, voice or text data to obtain a detection result.

Fig. 18 shows a schematic block diagram of an inference apparatus 20 of a neural network model. The inference engine 20 may be used to perform the inference method 200 of the neural network model described above.

As shown in fig. 18, the inference device 20 of the neural network model includes: an acquisition unit 21 and at least two processing units, such as a first processing unit 22 and a second processing unit 23 in fig. 18.

Specifically, the acquiring unit 21 is configured to acquire data input by a user;

the system comprises at least two processing units, a data processing unit and a data processing unit, wherein the at least two processing units are used for respectively executing at least two types of executable subprograms in the executable programs so as to carry out reasoning on data and obtain a reasoning result;

where the at least two processing units are processing units in at least two types of processors, the at least two types of executable subprograms are obtained by classifying and merging a plurality of nodes according to the characteristics of the plurality of nodes in a graph model of the neural network model, and the graph model of the neural network model is used for processing the data.

As an example, as shown in fig. 18, both the first processing unit 22 and the second processing unit 23 can receive the data transmitted by the obtaining unit 21. Alternatively, only one of the first processing unit 22 and the second processing unit 23 receives the data transmitted by the obtaining unit 21, and that processing unit then transmits the data it has received and processed to the other processing unit. In some embodiments, the processing unit that receives the data from the obtaining unit 21 may be a general-purpose processor, such as a CPU, which processes the data and transmits the processed data to the other type of processor.
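The two data paths just described can be contrasted, again purely as a sketch; both function names and the unit callables are illustrative assumptions.

```python
# Two illustrative data paths between the obtaining unit and the
# processing units; neither function is from the original embodiments.
def fan_out(data, units):
    # first path: every processing unit receives the data from the obtaining unit
    return [unit(data) for unit in units]

def relay(data, general_unit, dedicated_unit):
    # second path: only the general-purpose unit (e.g. a CPU) receives the
    # data, processes it, and forwards the result to the dedicated processor
    return dedicated_unit(general_unit(data))
```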

In one possible embodiment, the first processing unit 22 of the at least two processing units is a processing unit of a neural network dedicated processor, the first processing unit 22 being configured to execute a first type of executable sub-program of the at least two types of executable sub-programs; the second processing unit 23 of the at least two processing units is a processing unit of a non-neural network dedicated processor, the second processing unit 23 being configured to execute a second type of executable sub-program of the at least two types of executable sub-programs.

In a possible implementation manner, the first type of executable sub-program is compiled from a first-type submodel in the graph model, and the second type of executable sub-program is compiled from a second-type submodel in the graph model; the first-type submodel and the second-type submodel are obtained by respectively merging two types of nodes among the plurality of nodes, where the first-type nodes are nodes suitable for running on the neural network dedicated processor, and the second-type nodes are nodes not suitable for running on the neural network dedicated processor.

In a possible implementation manner, the first-type submodel is obtained by merging the first-type nodes, and the second-type submodel is obtained by merging the second-type nodes.

In a possible implementation manner, the first-type submodel is the first submodel with the largest computation amount among the at least one first submodel formed after the first-type nodes are merged; the second-type submodel includes the other first submodels in the at least one first submodel, apart from the first submodel with the largest computation amount, together with the at least one second submodel formed after the second-type nodes are merged.

In a possible implementation manner, the first-type submodel includes each first submodel whose computation amount is greater than a preset threshold among the at least one first submodel formed after the first-type nodes are merged; the second-type submodel includes the other first submodels in the at least one first submodel, whose computation amount does not exceed the preset threshold, together with the at least one second submodel formed after the second-type nodes are merged.

In a possible implementation manner, the first-type submodel includes a first target submodel with the largest computation amount among the at least one first submodel formed after the first-type nodes are merged, and a second target submodel whose computation amount is greater than a preset threshold; the second-type submodel includes the first submodels in the at least one first submodel other than the first target submodel and the second target submodel, together with the at least one second submodel formed after the second-type nodes are merged.

In one possible embodiment, the computation amount of each first submodel in the at least one first submodel is proportional to the number of target nodes in that first submodel, where a target node is a node whose operator is of a computation type.
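The selection rules in the preceding embodiments can be sketched as follows. The operator taxonomy (which operator types count as computation-type target nodes), the threshold value, and the submodel representation are illustrative assumptions only.

```python
# Minimal sketch: estimate each first submodel's computation amount by
# counting computation-type operators, then apply the threshold rule.
COMPUTE_OPS = {"conv2d", "matmul", "pool"}   # hypothetical taxonomy

def computation_amount(submodel):
    # proportional to the number of target nodes (computation-type operators)
    return sum(1 for op in submodel["ops"] if op in COMPUTE_OPS)

def split_by_threshold(first_submodels, threshold):
    # first submodels above the threshold stay in the first type (dedicated-
    # processor side); the rest are demoted to the second type, alongside
    # the second submodels formed from the second-type nodes
    first_type = [s for s in first_submodels if computation_amount(s) > threshold]
    second_type = [s for s in first_submodels if computation_amount(s) <= threshold]
    return first_type, second_type

subs = [{"ops": ["conv2d", "conv2d", "relu"]}, {"ops": ["reshape", "matmul"]}]
print(split_by_threshold(subs, threshold=1))  # only the first submodel stays
# the largest-amount rule is simply: max(subs, key=computation_amount)
```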

In one possible embodiment, the executable program is obtained by connecting the executable sub-programs of the at least two types in series according to the topological structure of the graph model.
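The serial order implied by the topological structure can be obtained with a standard topological sort. The following sketch uses Python's standard-library graphlib; the submodel names and their dependencies are hypothetical.

```python
# Minimal sketch: derive the serial order of the executable sub-programs
# from the dependency structure of the graph model.
from graphlib import TopologicalSorter

# hypothetical dependencies: each key maps to the subprograms it depends on
deps = {"npu_core": {"cpu_pre"}, "cpu_post": {"npu_core"}}
order = list(TopologicalSorter(deps).static_order())
print(order)   # ['cpu_pre', 'npu_core', 'cpu_post']
```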

In a possible embodiment, the data is image, voice or text data, and the at least two processing units are configured to perform target detection on the image, voice or text data to obtain a detection result.

The present application also provides an electronic device, which includes a processor and a memory, where the memory is configured to store program code, and the processor is configured to call the program code and execute the method of any one of the method embodiments of the present application.

The present application also provides a computer-readable storage medium for storing program code which, when executed by a processor, implements the method described in any one of the method embodiments of the present application. The program code may be a high-level language program or an executable object program.

The computer-readable storage medium is, for example, a memory. The memory may be volatile memory or nonvolatile memory, or the memory may include both volatile and nonvolatile memory.

It should be noted that, without conflict, the embodiments and/or technical features in the embodiments described in the present application may be arbitrarily combined with each other, and the technical solutions obtained after the combination also fall within the protection scope of the present application.

It should be understood that the specific examples in the embodiments of the present application are for the purpose of promoting a better understanding of the embodiments of the present application and are not intended to limit the scope of the embodiments of the present application.

It should also be understood that, in the various embodiments of the present application, the sequence numbers of the processes do not mean the execution sequence, and the execution sequence of the processes should be determined by the functions and the inherent logic of the processes, and should not constitute any limitation to the implementation process of the embodiments of the present application.

It is also to be understood that the terminology used in the embodiments of the present application and the appended claims is for the purpose of describing particular embodiments only, and is not intended to be limiting of the embodiments of the present application. For example, as used in the examples of this application and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise.

Unless otherwise defined, all technical and scientific terms used in the examples of this application have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs. The terminology used in the present application is for the purpose of describing particular embodiments only and is not intended to limit the scope of the present application.

In the above embodiments, the implementation may be realized in whole or in part by software, hardware, firmware, or any combination thereof. When implemented in software, the implementation may take the form, in whole or in part, of a computer program product. The computer program product includes one or more computer instructions. When the computer instructions are loaded and executed on a computer, the processes or functions described in accordance with the embodiments of the present application are produced in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable device. The computer instructions may be stored on a computer-readable storage medium or transmitted from one computer-readable storage medium to another, for example, from one website, computer, server, or data center to another website, computer, server, or data center by wire (e.g., coaxial cable, optical fiber, digital subscriber line (DSL)) or wirelessly (e.g., infrared, radio, microwave). The computer-readable storage medium may be any available medium accessible to a computer, or a data storage device such as a server or data center integrating one or more available media. The available medium may be a magnetic medium (e.g., floppy disk, hard disk, magnetic tape), an optical medium (e.g., digital video disc (DVD)), or a semiconductor medium (e.g., solid state disk (SSD)), among others.

Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.

In the several embodiments provided in the present application, it should be understood that the disclosed system, apparatus and method may be implemented in other ways. For example, the above-described apparatus embodiments are merely illustrative, and for example, the division of the units is only one logical division, and other divisions may be realized in practice, for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form.

The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.

In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit.

The above description is only for the specific embodiments of the present application, but the scope of the present application is not limited thereto, and any person skilled in the art can easily conceive of the changes or substitutions within the technical scope of the present application, and shall be covered by the scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.
