Training method and device of neural network model

Document No. 1363420, published 2020-08-11

Note: this technique, "Training method and device of neural network model", was designed and created by 希滕, 张刚, and 温圣召 on 2020-04-09. Its main content is as follows: The present disclosure relates to the field of artificial intelligence. Embodiments of the disclosure disclose a training method and apparatus for a neural network model. The method includes iteratively performing a plurality of training operations, each of which comprises: pruning a first neural network model to obtain a second neural network model; performing feature extraction on media data with the first neural network model to obtain first features, and with the second neural network model to obtain second features; acquiring a processing result of the first neural network model on the media data; and determining an error of the first neural network model based on a pre-constructed supervision function and a task loss function, iteratively adjusting the parameters of the first neural network model by back-propagating the error. The supervision function characterizes the difference between the first features and the second features, and the task loss function characterizes the error of the first neural network model's processing result on the media data. The method can train a neural network model that retains good performance after pruning.

1. A training method of a neural network model, comprising iteratively executing a plurality of training operations, wherein the training operation comprises:

pruning the first neural network model to obtain a second neural network model;

performing feature extraction on media data by using the first neural network model to obtain first features, and performing feature extraction on the media data by using the second neural network model to obtain second features;

acquiring a processing result of the first neural network model on the media data based on the first features;

determining an error of the first neural network model based on a pre-constructed supervision function and a task loss function, and iteratively adjusting parameters of the first neural network model by back-propagating the error;

wherein the supervisory function characterizes a difference between the first feature and the second feature, and the task loss function characterizes an error of a result of processing of the media data by the first neural network model.

2. The method of claim 1, wherein the first neural network model comprises a first feature extraction layer and a first classifier, the first features comprising features output by a last network layer of the first feature extraction layer connected to the first classifier;

the second neural network model comprises a second feature extraction layer and a second classifier, the second features comprising features output by a last network layer of the second feature extraction layer connected to the second classifier.

3. The method of claim 2, wherein the first features further comprise features output by a first intermediate layer in the first feature extraction layer;

the second features further include features output by a second intermediate layer in the second feature extraction layer;

the difference between the first features and the second features comprises: a difference between the features output by the first intermediate layer and the features output by the second intermediate layer corresponding to the first intermediate layer in the second neural network model, and a difference between the features output by the last network layer connected to the first classifier and the features output by the last network layer connected to the second classifier.

4. The method of any of claims 1-3, wherein the training operation further comprises:

determining that the first neural network model completes training in response to determining that the training operation reaches a preset convergence condition; and

the method further comprises the following steps:

pruning the trained first neural network model to obtain a pruned neural network model.

5. The method of claim 4, wherein the method further comprises:

processing media data to be processed by using the pruned neural network model.

6. A training apparatus of a neural network model, comprising a training unit, wherein the training unit is configured to iteratively execute a plurality of training operations;

the training unit includes:

a first pruning unit configured to prune the first neural network model in each training operation to obtain a second neural network model;

an extraction unit configured to, in each training operation, perform feature extraction on media data by using the first neural network model to obtain first features and perform feature extraction on the media data by using the second neural network model to obtain second features;

an acquisition unit configured to acquire, in each training operation, a processing result of the first neural network model on the media data based on the first features;

an updating unit configured to, in each training operation, determine an error of the first neural network model based on a pre-constructed supervision function and a task loss function, and iteratively adjust parameters of the first neural network model by back-propagating the error;

wherein the supervisory function characterizes a difference between the first feature and the second feature, and the task loss function characterizes an error of a result of processing of the media data by the first neural network model.

7. The apparatus of claim 6, wherein the first neural network model comprises a first feature extraction layer and a first classifier, the first features comprising features output by a last network layer of the first feature extraction layer connected to the first classifier;

the second neural network model includes a second feature extraction layer and a second classifier, the second features including features output by a last network layer of the second feature extraction layer connected to the second classifier.

8. The apparatus of claim 7, wherein the first features further comprise features output by a first intermediate layer in the first feature extraction layer;

the second features further include features output by a second intermediate layer in the second feature extraction layer;

the difference between the first features and the second features comprises: a difference between the features output by the first intermediate layer and the features output by the second intermediate layer corresponding to the first intermediate layer in the second neural network model, and a difference between the features output by the last network layer connected to the first classifier and the features output by the last network layer connected to the second classifier.

9. The apparatus of any of claims 6-8, wherein the training unit further comprises:

a determining unit configured to, in each training operation, determine that the first neural network model completes training in response to determining that the training operation reaches a preset convergence condition; and

the apparatus further comprises:

a second pruning unit configured to prune the trained first neural network model to obtain a pruned neural network model.

10. The apparatus of claim 9, wherein the apparatus further comprises:

a processing unit configured to process media data to be processed by using the pruned neural network model.

11. An electronic device, comprising:

one or more processors;

a storage device having one or more programs stored thereon,

wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method of any one of claims 1-5.

12. A computer-readable medium, on which a computer program is stored, wherein the program, when executed by a processor, implements the method of any one of claims 1-5.

Technical Field

The embodiment of the disclosure relates to the technical field of computers, in particular to the technical field of artificial intelligence, and particularly relates to a training method and device of a neural network model.

Background

Pruning of a neural network is a technique that deletes redundant parameters to increase the operation speed of the neural network. The current pruning approach clips channels of the model according to certain rules after the neural network has finished training. However, the relationships among parameters directly affect model performance, and clipping even parameters with very small weights may greatly reduce model accuracy. Consequently, ensuring the performance of the neural network after pruning makes the pruning process very inefficient.

Disclosure of Invention

Embodiments of the present disclosure provide a training method and apparatus of a neural network model, an electronic device, and a computer-readable medium.

In a first aspect, an embodiment of the present disclosure provides a training method for a neural network model, including iteratively performing a plurality of training operations; the training operation comprises: pruning the first neural network model to obtain a second neural network model; performing feature extraction on the media data by using the first neural network model to obtain first features, and performing feature extraction on the media data by using the second neural network model to obtain second features; acquiring a processing result of the first neural network model on the media data based on the first features; and determining the error of the first neural network model based on a pre-constructed supervision function and a task loss function, and iteratively adjusting the parameters of the first neural network model by back-propagating the error; wherein the supervision function characterizes the difference between the first features and the second features, and the task loss function characterizes the error of the result of the processing of the media data by the first neural network model.

In some embodiments, the first neural network model comprises a first feature extraction layer and a first classifier, the first features comprising features output by a last network layer of the first feature extraction layer connected to the first classifier; the second neural network model comprises a second feature extraction layer and a second classifier, and the second features comprise features output by a last network layer of the second feature extraction layer connected to the second classifier.

In some embodiments, the first features further comprise features output by a first intermediate layer in the first feature extraction layer; the second features further include features output by a second intermediate layer in the second feature extraction layer; and the difference between the first features and the second features comprises: the difference between the features output by the first intermediate layer and the features output by the second intermediate layer corresponding to the first intermediate layer in the second neural network model, and the difference between the features output by the last network layer connected to the first classifier and the features output by the last network layer connected to the second classifier.

In some embodiments, the training operation further comprises: determining that the first neural network model completes training in response to determining that the training operation reaches a preset convergence condition; and the method further comprises: pruning the trained first neural network model to obtain a pruned neural network model.

In some embodiments, the method further comprises: processing media data to be processed by using the pruned neural network model.

In a second aspect, an embodiment of the present disclosure provides an apparatus for training a neural network model, including a training unit configured to iteratively perform a plurality of training operations. The training unit comprises: a first pruning unit configured to prune the first neural network model in each training operation to obtain a second neural network model; an extraction unit configured to, in each training operation, perform feature extraction on the media data by using the first neural network model to obtain first features and perform feature extraction on the media data by using the second neural network model to obtain second features; an acquisition unit configured to acquire, in each training operation, a processing result of the first neural network model on the media data based on the first features; and an updating unit configured to, in each training operation, determine an error of the first neural network model based on a pre-constructed supervision function and a task loss function, and iteratively adjust the parameters of the first neural network model by back-propagating the error; wherein the supervision function characterizes the difference between the first features and the second features, and the task loss function characterizes the error of the result of the processing of the media data by the first neural network model.

In some embodiments, the first neural network model comprises a first feature extraction layer and a first classifier, the first features comprising features output by a last network layer of the first feature extraction layer connected to the first classifier; the second neural network model includes a second feature extraction layer and a second classifier, and the second features include features output by a last network layer of the second feature extraction layer connected to the second classifier.

In some embodiments, the first features further comprise features output by a first intermediate layer in the first feature extraction layer; the second features further include features output by a second intermediate layer in the second feature extraction layer; and the difference between the first features and the second features comprises: the difference between the features output by the first intermediate layer and the features output by the second intermediate layer corresponding to the first intermediate layer in the second neural network model, and the difference between the features output by the last network layer connected to the first classifier and the features output by the last network layer connected to the second classifier.

In some embodiments, the training unit further comprises: a determining unit configured to determine that the first neural network model completes training in response to determining that the training operation reaches a preset convergence condition in each training operation; and the above apparatus further comprises: and the second pruning unit is configured to prune the trained first neural network model to obtain a pruned neural network model.

In some embodiments, the above apparatus further comprises: and the processing unit is configured to process the media data to be processed by adopting the pruned neural network model.

In a third aspect, an embodiment of the present disclosure provides an electronic device, including: one or more processors; a storage device for storing one or more programs which, when executed by one or more processors, cause the one or more processors to implement the method of training a neural network model as provided in the first aspect.

In a fourth aspect, an embodiment of the present disclosure provides a computer readable medium on which a computer program is stored, wherein the program, when executed by a processor, implements the training method of the neural network model provided in the first aspect.

The training method and apparatus of the neural network model in the above embodiments of the present disclosure include iteratively performing a plurality of training operations; the training operation comprises: pruning the first neural network model to obtain a second neural network model; performing feature extraction on the media data by using the first neural network model to obtain first features, and performing feature extraction on the media data by using the second neural network model to obtain second features; acquiring a processing result of the first neural network model on the media data based on the first features; and determining the error of the first neural network model based on a pre-constructed supervision function and a task loss function, and iteratively adjusting the parameters of the first neural network model by back-propagating the error; wherein the supervision function characterizes the difference between the first features and the second features, and the task loss function characterizes the error of the result of the processing of the media data by the first neural network model. In this way, the performance of the pruned neural network model supervises the training of the neural network model, the dependence of the parameters retained during pruning on the clipped parameters is minimized, and a neural network model that can be pruned quickly and performs well after pruning can be trained.

Drawings

Other features, objects and advantages of the disclosure will become more apparent upon reading of the following detailed description of non-limiting embodiments thereof, made with reference to the accompanying drawings in which:

FIG. 1 is an exemplary system architecture diagram in which embodiments of the present disclosure may be applied;

FIG. 2 is a flow diagram of one embodiment of a method of training a neural network model according to the present disclosure;

FIG. 3 is a flow diagram of another embodiment of a method of training a neural network model according to the present disclosure;

FIG. 4 is a schematic structural diagram of an embodiment of a training apparatus for a neural network model of the present disclosure;

FIG. 5 is a schematic block diagram of a computer system suitable for use in implementing an electronic device of an embodiment of the present disclosure.

Detailed Description

The present disclosure is described in further detail below with reference to the accompanying drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the relevant invention and not restrictive of the invention. It should be noted that, for convenience of description, only the portions related to the related invention are shown in the drawings.

It should be noted that, in the present disclosure, the embodiments and features of the embodiments may be combined with each other without conflict. The present disclosure will be described in detail below with reference to the accompanying drawings in conjunction with embodiments.

Fig. 1 illustrates an exemplary system architecture 100 to which the neural network model training method or the neural network model training apparatus of the present disclosure may be applied.

As shown in fig. 1, the system architecture 100 may include terminal devices 101, 102, 103, a network 104, and a server 105. The network 104 serves as a medium for providing communication links between the terminal devices 101, 102, 103 and the server 105. Network 104 may include various connection types, such as wired, wireless communication links, or fiber optic cables, to name a few.

The terminal devices 101, 102, 103 interact with the server 105 via the network 104 to receive or send messages and the like. The terminal devices 101, 102, 103 may be devices on the user 110 side, on which various client applications may be installed, such as image processing applications, information analysis applications, voice assistant applications, shopping applications, and financial applications.

The terminal apparatuses 101, 102, and 103 may be hardware or software. When the terminal devices 101, 102, 103 are hardware, they may be various electronic devices including, but not limited to, smart phones, tablet computers, e-book readers, laptop portable computers, desktop computers, and the like. When the terminal apparatuses 101, 102, 103 are software, they can be installed in the electronic apparatuses listed above. It may be implemented as multiple pieces of software or software modules (e.g., multiple pieces of software or software modules to provide distributed services) or as a single piece of software or software module. And is not particularly limited herein.

The server 105 may be a server running various services, such as a server running services of object detection and recognition based on data such as image, video, voice, text, digital signals, text or voice recognition, signal conversion, and the like. The server 105 may acquire deep learning task data from the terminal devices 101, 102, 103 or from a database to construct training samples to train a neural network model for performing a deep learning task. The server 105 may also prune the trained neural network model to reduce the complexity of the neural network model, so that the pruned neural network model may be deployed on the terminal devices 101, 102, 103 to provide services based on the neural network model to the user 110 in real time.

The server 105 may be hardware or software. When the server 105 is hardware, it may be implemented as a distributed server cluster composed of a plurality of servers, or may be implemented as a single server. When the server 105 is software, it may be implemented as multiple pieces of software or software modules (e.g., multiple pieces of software or software modules used to provide distributed services), or as a single piece of software or software module. And is not particularly limited herein.

The trained neural network model may be deployed and run on the terminal devices 101, 102, 103. Generally, the terminal devices 101, 102, 103 require the model structure to be simple and the amount of computation to be small, so as to meet the real-time requirements of interaction with the user. In the scenario of the embodiment of the present disclosure, the server 105 may prune the neural network model during training according to the hardware or software constraints of the terminal devices 101, 102, and 103 (such as latency, processor power consumption, and computational efficiency in the application's running environment), and supervise the training of the neural network model by using the pruning result.

Alternatively, in some scenarios, the terminal devices 101, 102, 103 may also perform training operations of the neural network model, and supervise the training of the neural network model based on pruning results of the neural network model.

The training method of the neural network model provided by the embodiment of the present disclosure may be executed by the terminal device 101, 102, 103 or the server 105, and accordingly, the training apparatus of the neural network model may be disposed in the terminal device 101, 102, 103 or the server 105.

In some scenarios, the terminal device 101, 102, 103 or the server 105 may locally read or obtain source data required for model training from a database or the like, for example, locally read the neural network model to be trained and media data for training. At this point, the exemplary system architecture 100 may not include the network 104 and the server 105, or the terminal devices 101, 102, 103 and the network 104.

It should be understood that the number of terminal devices, networks, and servers in fig. 1 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation.

With continued reference to fig. 2, a flow 200 of one embodiment of a method of training a neural network model in accordance with the present disclosure is shown. The training method of the neural network model comprises the step of iteratively executing a plurality of training operations. Wherein the training operation comprises the following steps 201 to 204:

step 201, pruning is carried out on the first neural network model to obtain a second neural network model.

In this embodiment, the executing subject of the training method of the neural network model may first obtain the first neural network model. The first neural network model is the model to be trained, whose parameters may be randomly initialized. Alternatively, in some alternative implementations, the first neural network model may be a pre-trained model, and the parameters thereof are parameters obtained after pre-training.

The first neural network model may be pruned by clipping some of its channels to obtain a simplified network model as the second neural network model. Specifically, channels of lower importance in the first neural network model may be identified through back propagation and clipped; for example, neuron structures or weight parameters that have little influence on model performance may be cut. In one specific implementation, an optimal combination may be selected from all weight parameters of the neural network model; the parameters in the optimal combination are retained, and the remaining parameters are pruned such that the cost-function loss of the resulting pruned model is minimal.
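The channel-importance selection described above can be illustrated with a minimal sketch. This is not the patent's concrete algorithm: the NumPy implementation, the L1-norm importance score, and the name `prune_channels` are assumptions for illustration only.

```python
import numpy as np

def prune_channels(weight: np.ndarray, keep_ratio: float) -> np.ndarray:
    """Keep the output channels (rows) whose weights have the largest L1 norms."""
    n_keep = max(1, int(round(weight.shape[0] * keep_ratio)))
    importance = np.abs(weight).sum(axis=1)            # L1 norm per output channel
    keep = np.sort(np.argsort(importance)[-n_keep:])   # indices of retained channels
    return weight[keep]

# Four output channels; pruning at keep_ratio=0.5 retains the two
# channels with the largest L1 norms (rows 1 and 3).
w = np.array([[0.1, -0.2], [2.0, 1.5], [0.05, 0.0], [1.0, -1.0]])
pruned = prune_channels(w, keep_ratio=0.5)
```

In the patent's method such a selection would be repeated inside every training operation, since the importance ranking changes as the first model's parameters are updated.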

In each training operation, the first neural network model is first updated, and then a pruning operation is performed to obtain the corresponding second neural network model. Since the parameters of the first neural network model are updated in each iteration, the second neural network model obtained after pruning is also updated.

Step 202, performing feature extraction on the media data by adopting a first neural network model to obtain a first feature, and performing feature extraction on the media data by adopting a second neural network model to obtain a second feature.

In this embodiment, a first neural network model is used to process media data. The media data may be images, video, audio, text, etc. data used to convey content. The deep learning task performed by the first neural network model may be a classification task or a regression task. When the deep learning task is executed, the first neural network model may first perform feature extraction on the media data to obtain a first feature of the media data. The first neural network model may then complete a classification or regression task based on the extracted first features of the media data.

The first neural network model may be a convolutional neural network, a recurrent neural network, or the like. As an example, a convolutional neural network comprises a plurality of convolutional layers, and some convolutional neural networks comprise a plurality of residual modules, each of which may contain several repeating units consisting of convolutional layers and batch normalization layers. Each convolutional layer or residual module may extract features of a different scale from the media data. In this embodiment, the features of the media data extracted by each convolutional layer may be used as the first features, or the features output by the last convolutional layer or the last residual module containing a convolutional layer may be used as the first features.

Accordingly, a second neural network model may be employed for feature extraction of the media data. As an example, when the first neural network model is a convolutional neural network model, the pruned second neural network model is also a convolutional neural network. The features output by the corresponding convolutional layer or residual module in the second neural network model and in the first neural network model may be taken as the second features of the media data.
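Collecting per-layer outputs, as described above, can be sketched with a toy stack of linear-plus-ReLU layers standing in for convolutional layers; the helper name `forward_with_features` is a hypothetical illustration, not part of the patent.

```python
import numpy as np

def forward_with_features(layers, x):
    """Run a stack of weight matrices and record every layer's output, so that
    intermediate and final features can be compared across two models."""
    feats = []
    for W in layers:
        x = np.maximum(W @ x, 0.0)   # linear layer + ReLU as a stand-in for a conv layer
        feats.append(x)
    return feats

layers = [np.eye(3), 2.0 * np.eye(3)]
feats = forward_with_features(layers, np.array([1.0, -1.0, 2.0]))
# feats[-1] plays the role of the "first features": the output of the last layer.
```

Running the same routine over the pruned model's layers would yield the corresponding "second features".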

Here, the media data is training data. A set of media data for training the first neural network model may be pre-constructed, and the media data in the set of media data may contain annotation information. For example, image data and video data contain annotation information of object types or object positions in images, voice data contain corresponding text annotation information, and one piece of text data contains annotation information of corresponding translated texts in another language.

Step 203, obtaining a processing result of the first neural network model to the media data based on the first characteristic.

The result of the processing of the media data by the first neural network model may be a classification result or a regression result for the media data. In this embodiment, after the first neural network model performs feature extraction on the media data, it may complete the classification or regression task on the media data according to the extracted first features. The executing subject can then obtain the processing result of the media data output by the first neural network model.

And 204, determining the error of the first neural network model based on the pre-constructed supervision function and the task loss function, and iteratively adjusting the parameters of the first neural network model through back propagation of the error.

Wherein the supervisory function characterizes a difference between the first characteristic and the second characteristic.

A supervision function may be constructed based on the difference between the first feature and the second feature; for example, the two-norm of the difference between the first feature and the second feature may be used as the supervision function. The supervision function is used to supervise the parameter iterations of the first neural network model. Since the first and second features are functions of the parameters of the first neural network model, the value of the supervision function also varies with those parameters; that is, the supervision function is a function of the parameters of the first neural network model.
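As a sketch of the two-norm formulation mentioned above (the function name and the toy feature vectors are illustrative):

```python
import numpy as np

def supervision_loss(first_feature: np.ndarray, second_feature: np.ndarray) -> float:
    """Two-norm of the difference between the features extracted by the
    first (unpruned) and second (pruned) models."""
    return float(np.linalg.norm(first_feature - second_feature))

f1 = np.array([1.0, 2.0, 3.0])   # features from the first neural network model
f2 = np.array([1.0, 2.0, 1.0])   # features from the pruned second model
loss = supervision_loss(f1, f2)  # ||[0, 0, 2]|| = 2.0
```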

The task loss function characterizes the error of the result of processing of the media data by the first neural network model. Here, this error may be the difference between the processing result of the media data produced by the first neural network model and the annotation information of the media data. The task loss function is likewise used to supervise the parameter iterations of the first neural network model, and is also a function of the parameters of the first neural network model.

In this embodiment, the parameter iterations of the first neural network model may be jointly supervised by the above supervision function and task loss function. Specifically, a joint loss function may be constructed from the supervision function and the task loss function, for example as a weighted sum of the two. Then, based on the joint loss function, the gradient of the joint loss function with respect to the parameters of the first neural network model is calculated by back propagation, and the parameters of the first neural network model are updated accordingly.
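The weighted-sum construction of the joint loss can be written directly; the weight `alpha` and its value are illustrative assumptions (the patent does not fix a particular weighting):

```python
def joint_loss(supervision: float, task_loss: float, alpha: float = 0.5) -> float:
    """Weighted sum of the supervision function and the task loss function."""
    return alpha * supervision + (1.0 - alpha) * task_loss

# With alpha = 0.25: 0.25 * 2.0 + 0.75 * 0.4
total = joint_loss(supervision=2.0, task_loss=0.4, alpha=0.25)
```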

After the parameters of the first neural network model are updated, the next training operation may be performed, returning to step 201. In this way, after performing a plurality of training operations, the parameters of the first neural network model are updated for a plurality of iterations under the supervision of the supervision function and the task loss function.
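The full cycle of steps 201-204 can be condensed into a toy numerical experiment. Everything here is an illustrative assumption rather than the patent's concrete method: a one-layer linear "model", mask-based pruning by L1 importance, squared-error losses, and hand-derived gradients.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(4, 3))              # "first model": a 4-channel linear feature extractor
x = np.array([0.5, -1.0, 0.25])          # one media sample (e.g. a flattened signal)
target = np.array([1.0, 0.0, 0.0, 0.0])  # annotation used by the task loss
lr, alpha = 0.1, 0.5
losses = []

for step in range(200):
    # step 201: "prune" by zeroing the two least important channels
    importance = np.abs(W).sum(axis=1)
    keep = importance >= np.sort(importance)[2]   # boolean mask: top-2 channels kept
    # step 202: first/second features from the full and pruned models
    f1 = W @ x
    f2 = (W * keep[:, None]) @ x
    # steps 203-204: joint error = supervision term + task term
    sup = float(((f1 - f2) ** 2).sum())           # squared two-norm of the feature gap
    task = float(((f1 - target) ** 2).sum())      # squared error of the "processing result"
    losses.append(alpha * sup + (1.0 - alpha) * task)
    grad_sup = 2.0 * np.outer(f1 - f2, x)         # nonzero only on pruned channels
    grad_task = 2.0 * np.outer(f1 - target, x)
    W -= lr * (alpha * grad_sup + (1.0 - alpha) * grad_task)
```

Over the iterations the supervision term pushes the responses of the channels destined for clipping toward zero, so the final model loses little when those channels are actually removed; the joint loss decreases as training proceeds.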

In the training method of this embodiment, the difference between the features extracted by the neural network model before and after pruning supervises the training process. This drives the model toward a state in which the clipped parameters have little influence on model performance and the parameters retained in the pruned model depend only weakly on the clipped parameters. Training can therefore yield a neural network model that maintains good performance after pruning, and the pruning of the model can be completed quickly once training is finished.

Optionally, the first neural network model includes a first feature extraction layer and a first classifier, and the first feature includes a feature output by a last network layer connected to the first classifier in the first feature extraction layer. For example, the first neural network model is a convolutional neural network model, wherein the first feature extraction layer includes a plurality of convolutional layers or a plurality of residual modules, and the first classifier may include a fully-connected layer and a non-linear layer. The feature output from the last convolutional layer or the last residual module connected to the first classifier can be used as the extracted first feature.

The second neural network model comprises a second feature extraction layer and a second classifier, and the second feature includes the feature output by the last network layer in the second feature extraction layer that is connected to the second classifier. For example, if the second neural network model is a convolutional neural network, the second feature is the feature output by the last convolutional layer or the last residual module connected to the second classifier.

By taking the features output by the last layer of the feature extraction layers in the first and second neural network models as the first and second features respectively, the constructed supervision function can more accurately characterize the performance difference between the two models. When the parameters of the first neural network model are iteratively adjusted, the influence of the parameters cut off in the pruning operation on the retained parameters is thereby weakened, the sensitivity of the trained first neural network model to the pruning operation is reduced, and a first neural network model better suited to pruning is obtained.
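A minimal sketch of this structure, with a toy fully-connected layer standing in for the convolutional layers or residual modules of the feature extraction layer (all shapes and names are illustrative assumptions, not the patent's architecture):

```python
import numpy as np

def forward(x, w_feat, w_cls):
    # Toy model: one ReLU layer plays the role of the feature extraction
    # layer; its output -- the activation of the last layer feeding the
    # classifier -- is taken as the extracted feature.
    feat = np.maximum(0.0, x @ w_feat)
    # Softmax classifier on top of the extracted feature.
    logits = feat @ w_cls
    exp = np.exp(logits - np.max(logits))
    probs = exp / exp.sum()
    return feat, probs
```

Running the unpruned and pruned models through such a forward pass yields the first and second features whose difference the supervision function measures.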

Optionally, the first feature may further include the feature output by a first intermediate layer in the first feature extraction layer, and the second feature may further include the feature output by a second intermediate layer in the second feature extraction layer. In this case, the difference between the first feature and the second feature includes: the difference between the feature output by the first intermediate layer and the feature output by the corresponding second intermediate layer in the second neural network model, and the difference between the features output by the last network layers connected to the first classifier and the second classifier respectively.

The first feature extraction layer in the first neural network model and the second feature extraction layer in the second neural network model each include a plurality of intermediate layers, for example, the convolutional neural network includes a plurality of convolutional layers, and each convolutional layer extracts features of different scales. The features extracted from the corresponding layers in the first neural network model and the second neural network model may be compared, and then the differences between the features extracted from the corresponding layers are summed, or weighted and summed, to obtain the total difference between the first feature and the second feature.
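The summation over corresponding layers can be sketched as follows; the mean-squared-error choice of per-layer difference and the uniform default weights are assumptions for illustration:

```python
import numpy as np

def multi_scale_difference(feats_full, feats_pruned, weights=None):
    # feats_full / feats_pruned: features output by corresponding
    # intermediate layers of the first (unpruned) and second (pruned)
    # models, one entry per compared layer.
    if weights is None:
        weights = [1.0] * len(feats_full)
    total = 0.0
    for w, f1, f2 in zip(weights, feats_full, feats_pruned):
        # Per-layer difference (here mean squared error); the weighted
        # per-layer terms are summed into the total difference.
        total += w * float(np.mean((np.asarray(f1) - np.asarray(f2)) ** 2))
    return total
```

Passing explicit `weights` gives the weighted-sum variant; omitting them gives the plain sum.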

In this way, the supervision function can be constructed from the differences between the first and second features at multiple scales, so that the performance difference across multiple intermediate layers of the first and second neural network models supervises the parameter updates of the first neural network model, yielding higher accuracy of the trained first neural network model after pruning.

With continued reference to FIG. 3, a flow diagram of another embodiment of a method of the present disclosure for training a neural network model is shown. As shown in fig. 3, a flow 300 of the method for training a neural network model of the present embodiment includes the following steps:

in step 301, a plurality of training operations are performed iteratively.

The training operation includes the following steps 3011 to 3015.

In step 3011, the first neural network model is pruned to obtain a second neural network model.

In step 3012, a first neural network model is used to perform feature extraction on the media data to obtain a first feature, and a second neural network model is used to perform feature extraction on the media data to obtain a second feature.

In step 3013, obtaining a processing result of the first neural network model on the media data based on the first feature;

in step 3014, an error of the first neural network model is determined based on the pre-constructed supervision function and the task loss function, and parameters of the first neural network model are iteratively adjusted by back-propagating the error.

Wherein the supervision function characterizes a difference between the first feature and the second feature, and the task loss function characterizes an error of a result of the processing of the media data by the first neural network model.

Steps 3011 to 3014 correspond one-to-one to steps 201 to 204 in the foregoing embodiment. For the specific implementations of steps 3011 to 3014, reference may be made to the corresponding descriptions of steps 201 to 204 above, which are not repeated here.

In this embodiment, the training operation further includes:

step 3015, in response to determining that the training operation reaches the preset convergence condition, determining that the first neural network model completes training.

The preset convergence condition may be a preset condition for stopping training, and may include, but is not limited to, at least one of the following: the number of training operations reaches a preset count threshold; the error of the first neural network model in the current training operation is smaller than a preset error threshold; the parameter update rate of the first neural network model over the most recent training operations is smaller than a preset update-rate threshold; or the value of the joint cost function constructed from the supervision function and the task loss function is smaller than a preset loss value.

In each training operation, after the parameters of the first neural network model are updated, it can be determined whether the training operation satisfies the preset convergence condition. If so, the training operation stops, and the current first neural network model is taken as the trained first neural network model.
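The control flow of steps 301 and 3015 can be sketched as a loop over training operations; the function names, the step budget, and the loss-threshold convergence test are illustrative assumptions:

```python
def train_until_converged(run_training_op, max_ops=100, loss_threshold=1e-3):
    # Repeats the training operation until a preset convergence
    # condition holds: either the number of operations reaches a preset
    # budget, or the joint cost falls below a preset loss value.
    loss = float("inf")
    ops = 0
    while ops < max_ops and loss >= loss_threshold:
        loss = run_training_op()  # one training operation, returns its loss
        ops += 1
    return ops, loss
```

Any of the other convergence conditions listed above (parameter update rate, per-operation error) could be substituted for the loss-threshold check.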

In this embodiment, the method for training a neural network model further includes:

and step 302, pruning the trained first neural network model to obtain a pruned neural network model.

In this embodiment, the trained first neural network model may be pruned according to the hardware or software constraints of the device that will run the pruned model. When the pruning operation is executed, a pruning cost function may be constructed from those hardware or software constraints, or from the performance loss of the model after pruning. An optimal pruning strategy is then found by minimizing the pruning cost function, and the trained first neural network model is pruned according to that strategy to obtain the pruned neural network model.
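As one simple example of a pruning strategy (global magnitude pruning; the cost-function search described above may of course select a different strategy depending on the device constraints), pruning can zero out the smallest-magnitude fraction of the weights:

```python
import numpy as np

def magnitude_prune(weights, sparsity):
    # Zero out the `sparsity` fraction of weights with the smallest
    # absolute value; the surviving weights are kept unchanged.
    # Note: ties at the threshold may prune slightly more than the
    # requested fraction -- acceptable for a sketch.
    flat = np.abs(weights).ravel()
    k = int(round(sparsity * flat.size))
    if k == 0:
        return weights.copy()
    # The k-th smallest magnitude serves as the pruning threshold.
    threshold = np.partition(flat, k - 1)[k - 1]
    mask = np.abs(weights) > threshold
    return weights * mask
```

Because the first model was trained with the supervision function, the weights retained by such a mask should already carry most of the model's performance, which is why retraining after pruning can be reduced or skipped.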

In existing neural network model pruning methods, the pruned model must be retrained after pruning in order to guarantee its performance. The pruned neural network model obtained by the method of this embodiment already performs well, which effectively reduces the computing resources consumed by retraining the pruned model and improves pruning efficiency, so that compression of the neural network model can be completed efficiently at low cost.

Optionally, the method flow 300 may further include:

and step 303, processing the media data to be processed by using the pruned neural network model.

The pruned neural network model may be deployed in the execution main body or in a terminal device communicatively connected to the execution main body. When the media data to be processed is obtained, the media data to be processed can be input to the pruned neural network model for processing, and a processing result is obtained. The media data to be processed is data of unknown processing results, such as images to be classified, audio to be recognized, text to be translated, and the like.

Because the computation required by the pruned neural network model is small, processing the media data to be processed consumes fewer computing resources, so processing results can be provided quickly; the method is therefore applicable to scenarios with high real-time requirements.

Referring to fig. 4, as an implementation of the method for training the neural network model, the present disclosure provides an embodiment of a device for training a neural network model, where the device embodiment corresponds to the method embodiments shown in fig. 2 and fig. 3, and the device may be applied to various electronic devices.

As shown in fig. 4, the training apparatus 400 of the neural network model of the present embodiment includes a training unit 401. The training unit 401 is configured to iteratively perform a plurality of training operations. The training unit 401 includes: the first pruning unit 4011 is configured to prune the first neural network model in each training operation to obtain a second neural network model; the extraction unit 4012 is configured to, in each training operation, perform feature extraction on the media data by using the first neural network model to obtain a first feature, and perform feature extraction on the media data by using the second neural network model to obtain a second feature; an obtaining unit 4013 configured to obtain, in each training operation, a processing result of the first neural network model on the media data based on the first feature; and an updating unit 4014 configured to determine an error of the first neural network model based on a pre-constructed supervision function and a task loss function in each training operation, iteratively adjust parameters of the first neural network model by back-propagating the error; wherein the supervision function characterizes a difference between the first feature and the second feature, and the task loss function characterizes an error of a result of the processing of the media data by the first neural network model.

In some embodiments, the first neural network model comprises a first feature extraction layer and a first classifier, the first feature comprising the feature output by the last network layer in the first feature extraction layer connected to the first classifier; the second neural network model comprises a second feature extraction layer and a second classifier, the second feature comprising the feature output by the last network layer in the second feature extraction layer connected to the second classifier.

In some embodiments, the first feature further comprises the feature output by a first intermediate layer in the first feature extraction layer, and the second feature further comprises the feature output by a second intermediate layer in the second feature extraction layer. The difference between the first feature and the second feature then includes: the difference between the feature output by the first intermediate layer and the feature output by the corresponding second intermediate layer in the second neural network model, and the difference between the features output by the last network layers connected to the first classifier and the second classifier respectively.

In some embodiments, the training unit further comprises: a determining unit configured to determine that the first neural network model completes training in response to determining that the training operation reaches a preset convergence condition in each training operation; and the above apparatus further comprises: and the second pruning unit is configured to prune the trained first neural network model to obtain a pruned neural network model.

In some embodiments, the above apparatus further comprises: and the processing unit is configured to process the media data to be processed by adopting the pruned neural network model.

The units in the apparatus 400 described above correspond to the steps in the method described with reference to fig. 2 and 3. Thus, the operations, features and technical effects described above for the training method of the neural network model are also applicable to the apparatus 400 and the units included therein, and are not described herein again.

Referring now to FIG. 5, a schematic diagram of an electronic device (e.g., the server shown in FIG. 1) 500 suitable for use in implementing embodiments of the present disclosure is shown. The electronic device shown in fig. 5 is only an example, and should not bring any limitation to the functions and the scope of use of the embodiments of the present disclosure.

As shown in fig. 5, electronic device 500 may include a processing means (e.g., central processing unit, graphics processor, etc.) 501 that may perform various appropriate actions and processes in accordance with a program stored in a read-only memory (ROM) 502 or a program loaded from a storage means 508 into a random access memory (RAM) 503. The RAM 503 also stores various programs and data necessary for the operation of the electronic device 500. The processing device 501, the ROM 502, and the RAM 503 are connected to each other through a bus 504. An input/output (I/O) interface 505 is also connected to the bus 504.

Generally, the following devices may be connected to the I/O interface 505: input devices 506 including, for example, a touch screen, touch pad, keyboard, mouse, camera, microphone, accelerometer, gyroscope, etc.; output devices 507 including, for example, a Liquid Crystal Display (LCD), speakers, vibrators, and the like; a storage device 508 including, for example, a hard disk; and a communication device 509. The communication means 509 may allow the electronic device 500 to communicate with other devices wirelessly or by wire to exchange data. While fig. 5 illustrates an electronic device 500 having various means, it is to be understood that not all illustrated means are required to be implemented or provided. More or fewer devices may alternatively be implemented or provided. Each block shown in fig. 5 may represent one device or may represent multiple devices as desired.

In particular, according to an embodiment of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method illustrated in the flow chart. In such an embodiment, the computer program may be downloaded and installed from a network via the communication means 509, or installed from the storage means 508, or installed from the ROM 502. The computer program, when executed by the processing device 501, performs the above-described functions defined in the methods of embodiments of the present disclosure. It should be noted that the computer readable medium described in the embodiments of the present disclosure may be a computer readable signal medium or a computer readable storage medium or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In embodiments of the disclosure, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. 
In embodiments of the present disclosure, however, a computer readable signal medium may comprise a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: electrical wires, optical cables, RF (radio frequency), etc., or any suitable combination of the foregoing.

The computer readable medium may be embodied in the electronic device; or may exist separately without being assembled into the electronic device. The computer readable medium carries one or more programs which, when executed by the electronic device, cause the electronic device to: iteratively executing a plurality of training operations; the training operation comprises: pruning the first neural network model to obtain a second neural network model; adopting a first neural network model to perform feature extraction on the media data to obtain first features, and adopting a second neural network model to perform feature extraction on the media data to obtain second features; acquiring a processing result of the first neural network model on the media data based on the first characteristic; determining the error of the first neural network model based on a pre-constructed supervision function and a task loss function, and iteratively adjusting the parameters of the first neural network model through back propagation errors; wherein the supervision function characterizes a difference between the first feature and the second feature, and the task loss function characterizes an error of a result of the processing of the media data by the first neural network model.

Computer program code for carrying out operations for embodiments of the present disclosure may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, Smalltalk, C++, and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any type of network, including a local area network (LAN) or a wide area network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider).

The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.

The units described in the embodiments of the present disclosure may be implemented by software or hardware. The described units may also be provided in a processor, which may be described as: a processor includes a training unit. The names of these units do not in some cases constitute a limitation of the unit itself; for example, the training unit may also be described as a "unit that iteratively performs multiple training operations".

The foregoing description is only a preferred embodiment of the present disclosure and an explanation of the technical principles employed. It will be appreciated by those skilled in the art that the scope of the invention covered by the present disclosure is not limited to technical solutions formed by the specific combination of the above features, and also covers other technical solutions formed by any combination of the above features or their equivalents without departing from the inventive concept, for example, solutions in which the above features are replaced with (but not limited to) features having similar functions disclosed in the present disclosure.
