Method and apparatus for training a super-network

Document No.: 1363419    Publication date: 2020-08-11

Reading note: this technique, "Method and apparatus for training a super-network" (用于训练超网络的方法和装置), was designed and created by 希滕, 张刚 and 温圣召 on 2020-04-09. Its main content is as follows: The disclosure relates to the field of artificial intelligence, and specifically discloses a method and an apparatus for training a super-network. The method includes: initializing a super-network to be trained and copying the initialized super-network to obtain a first super-network and a second super-network; sequentially performing a plurality of iterative operations, each of which includes: sampling the first super-network to obtain a first sub-network sequence, and shuffling the first sub-network sequence to obtain a second sub-network sequence; training the first sub-network sequence based on the first super-network, and tentatively updating the first super-network and the second super-network based on the training results of the first sub-network sequence and the second sub-network sequence, respectively; and, in response to determining that the difference between the performance of the tentatively updated first super-network and that of the tentatively updated second super-network does not exceed a preset range, taking the tentatively updated first super-network as the new first super-network and the tentatively updated second super-network as the new second super-network, and performing the next iterative operation. The method improves the accuracy of the super-network.

1. A method for training a super-network, comprising:

initializing a super-network to be trained and copying the initialized super-network to obtain a first super-network and a second super-network that are identical;

training the super-network to be trained by sequentially performing a plurality of iterative operations; wherein each iterative operation comprises:

sampling the first super-network to obtain a first sub-network sequence, and shuffling the first sub-network sequence to obtain a second sub-network sequence;

training the first sub-network sequence based on the first super-network, and tentatively updating the first super-network based on a training result of the first sub-network sequence;

training the second sub-network sequence based on the second super-network, and tentatively updating the second super-network based on a training result of the second sub-network sequence;

and in response to determining that the difference between the performance of the tentatively updated first super-network and the performance of the tentatively updated second super-network does not exceed a preset range, taking the tentatively updated first super-network as a new first super-network, taking the tentatively updated second super-network as a new second super-network, and performing the next iterative operation.

2. The method of claim 1, wherein each iterative operation further comprises:

determining a first distribution of the performance of the tentatively updated first super-network with respect to the parameters of the tentatively updated first super-network;

determining a second distribution of the performance of the tentatively updated second super-network with respect to the parameters of the tentatively updated second super-network;

in response to determining that the distance between the first distribution and the second distribution is less than a preset distance threshold, determining that the difference between the performance of the tentatively updated first super-network and the performance of the tentatively updated second super-network does not exceed the preset range.

3. The method of claim 1 or 2, wherein each iterative operation further comprises:

in response to determining that the difference between the performance of the tentatively updated first super-network and the performance of the tentatively updated second super-network exceeds the preset range, performing the next iterative operation and resampling the first super-network in the next iterative operation to obtain a new first sub-network sequence.

4. The method of claim 1, wherein each iterative operation further comprises:

in response to determining that the current first super-network meets a preset convergence condition, stopping the iterative operations and determining that the current first super-network is the trained super-network.

5. The method of claim 1, wherein the method further comprises:

performing, according to the type of media data to be processed, a model structure search based on the trained super-network to find a sub-network for processing data of the corresponding type, and processing the media data to be processed by using the found sub-network.

6. An apparatus for training a super-network, comprising:

an initialization unit configured to initialize a super-network to be trained and copy the initialized super-network to obtain a first super-network and a second super-network that are identical;

a training unit configured to train the super-network to be trained by sequentially performing a plurality of iterative operations;

wherein each iterative operation comprises:

sampling the first super-network to obtain a first sub-network sequence, and shuffling the first sub-network sequence to obtain a second sub-network sequence;

training the first sub-network sequence based on the first super-network, and tentatively updating the first super-network based on a training result of the first sub-network sequence;

training the second sub-network sequence based on the second super-network, and tentatively updating the second super-network based on a training result of the second sub-network sequence;

and in response to determining that the difference between the performance of the tentatively updated first super-network and the performance of the tentatively updated second super-network does not exceed a preset range, taking the tentatively updated first super-network as a new first super-network, taking the tentatively updated second super-network as a new second super-network, and performing the next iterative operation.

7. The apparatus of claim 6, wherein each iterative operation performed by the training unit further comprises:

determining a first distribution of the performance of the tentatively updated first super-network with respect to the parameters of the tentatively updated first super-network;

determining a second distribution of the performance of the tentatively updated second super-network with respect to the parameters of the tentatively updated second super-network;

in response to determining that the distance between the first distribution and the second distribution is less than a preset distance threshold, determining that the difference between the performance of the tentatively updated first super-network and the performance of the tentatively updated second super-network does not exceed the preset range.

8. The apparatus of claim 6 or 7, wherein each iterative operation performed by the training unit further comprises:

in response to determining that the difference between the performance of the tentatively updated first super-network and the performance of the tentatively updated second super-network exceeds the preset range, performing the next iterative operation and resampling the first super-network in the next iterative operation to obtain a new first sub-network sequence.

9. The apparatus of claim 6, wherein each iterative operation performed by the training unit further comprises:

in response to determining that the current first super-network meets a preset convergence condition, stopping the iterative operations and determining that the current first super-network is the trained super-network.

10. The apparatus of claim 6, wherein the apparatus further comprises:

a searching unit configured to perform, according to the type of media data to be processed, a model structure search based on the trained super-network to find a sub-network for processing data of the corresponding type, and to process the media data to be processed by using the found sub-network.

11. An electronic device, comprising:

one or more processors;

a storage device for storing one or more programs,

wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method of any one of claims 1-5.

12. A computer-readable medium, on which a computer program is stored, wherein the program, when executed by a processor, implements the method of any one of claims 1-5.

Technical Field

Embodiments of the present disclosure relate to the field of computer technology, in particular to the field of artificial intelligence, and specifically to a method and an apparatus for training a super-network.

Background

Deep neural networks have achieved significant success in many areas. The structure of a deep neural network model has a direct impact on its performance. Traditionally, the structure of a neural network model is designed manually by experts based on experience, which requires rich expert knowledge and makes the design of a network structure costly.

Neural Architecture Search (NAS) replaces tedious manual design with an algorithm that automatically searches for an optimal neural network architecture. In one current approach, a super-network containing all candidate model structures is constructed in advance and then trained. For an actual deep learning task, a suitable sub-network is then searched out of the super-network through NAS and used as the neural network model that performs the deep learning task.

However, because all candidate network structures coexist and share parameters in the super-network, there is a mutual-interference problem in training the super-network: as the training process tries to make every candidate structure perform well, the performance of a structure sampled from the super-network can differ considerably from the performance of the same structure trained independently.

Disclosure of Invention

Embodiments of the present disclosure provide a method and an apparatus, an electronic device, and a computer-readable medium for training a super-network.

In a first aspect, an embodiment of the present disclosure provides a method for training a super-network, including: initializing a super-network to be trained and copying the initialized super-network to obtain a first super-network and a second super-network that are identical; and training the super-network to be trained by sequentially performing a plurality of iterative operations, wherein each iterative operation includes: sampling the first super-network to obtain a first sub-network sequence, and shuffling the first sub-network sequence to obtain a second sub-network sequence; training the first sub-network sequence based on the first super-network, and tentatively updating the first super-network based on the training result of the first sub-network sequence; training the second sub-network sequence based on the second super-network, and tentatively updating the second super-network based on the training result of the second sub-network sequence; and in response to determining that the difference between the performance of the tentatively updated first super-network and the performance of the tentatively updated second super-network does not exceed a preset range, taking the tentatively updated first super-network as a new first super-network, taking the tentatively updated second super-network as a new second super-network, and performing the next iterative operation.

In some embodiments, the iterative operation further includes: determining a first distribution of the performance of the tentatively updated first super-network with respect to the parameters of the tentatively updated first super-network; determining a second distribution of the performance of the tentatively updated second super-network with respect to the parameters of the tentatively updated second super-network; and in response to determining that the distance between the first distribution and the second distribution is less than a preset distance threshold, determining that the difference between the performance of the tentatively updated first super-network and the performance of the tentatively updated second super-network does not exceed the preset range.

In some embodiments, the iterative operation further includes: in response to determining that the difference between the performance of the tentatively updated first super-network and the performance of the tentatively updated second super-network exceeds the preset range, performing the next iterative operation and resampling the first super-network in the next iterative operation to obtain a new first sub-network sequence.

In some embodiments, the iterative operation further includes: in response to determining that the current first super-network meets a preset convergence condition, stopping the iterative operations and determining that the current first super-network is the trained super-network.

In some embodiments, the method further includes: performing, according to the type of media data to be processed, a model structure search based on the trained super-network to find a sub-network for processing data of the corresponding type, and processing the media data to be processed by using the found sub-network.

In a second aspect, an embodiment of the present disclosure provides an apparatus for training a super-network, including: an initialization unit configured to initialize a super-network to be trained and copy the initialized super-network to obtain a first super-network and a second super-network that are identical; and a training unit configured to train the super-network to be trained by sequentially performing a plurality of iterative operations, wherein each iterative operation includes: sampling the first super-network to obtain a first sub-network sequence, and shuffling the first sub-network sequence to obtain a second sub-network sequence; training the first sub-network sequence based on the first super-network, and tentatively updating the first super-network based on the training result of the first sub-network sequence; training the second sub-network sequence based on the second super-network, and tentatively updating the second super-network based on the training result of the second sub-network sequence; and in response to determining that the difference between the performance of the tentatively updated first super-network and the performance of the tentatively updated second super-network does not exceed a preset range, taking the tentatively updated first super-network as a new first super-network, taking the tentatively updated second super-network as a new second super-network, and performing the next iterative operation.

In some embodiments, the iterative operation performed by the training unit further includes: determining a first distribution of the performance of the tentatively updated first super-network with respect to the parameters of the tentatively updated first super-network; determining a second distribution of the performance of the tentatively updated second super-network with respect to the parameters of the tentatively updated second super-network; and in response to determining that the distance between the first distribution and the second distribution is less than a preset distance threshold, determining that the difference between the performance of the tentatively updated first super-network and the performance of the tentatively updated second super-network does not exceed the preset range.

In some embodiments, the iterative operation performed by the training unit further includes: in response to determining that the difference between the performance of the tentatively updated first super-network and the performance of the tentatively updated second super-network exceeds the preset range, performing the next iterative operation and resampling the first super-network in the next iterative operation to obtain a new first sub-network sequence.

In some embodiments, the iterative operation performed by the training unit further includes: in response to determining that the current first super-network meets a preset convergence condition, stopping the iterative operations and determining that the current first super-network is the trained super-network.

In some embodiments, the apparatus further includes: a searching unit configured to perform, according to the type of media data to be processed, a model structure search based on the trained super-network to find a sub-network for processing data of the corresponding type, and to process the media data to be processed by using the found sub-network.

In a third aspect, an embodiment of the present disclosure provides an electronic device, including: one or more processors; and a storage device storing one or more programs which, when executed by the one or more processors, cause the one or more processors to implement the method for training a super-network as provided in the first aspect.

In a fourth aspect, an embodiment of the present disclosure provides a computer-readable medium having a computer program stored thereon, wherein the program, when executed by a processor, implements the method for training a super-network provided in the first aspect.

The method and apparatus for training a super-network according to embodiments of the present disclosure first initialize a super-network to be trained and copy the initialized super-network to obtain a first super-network and a second super-network that are identical, and then train the super-network to be trained by sequentially performing a plurality of iterative operations. Each iterative operation includes: sampling the first super-network to obtain a first sub-network sequence, and shuffling the first sub-network sequence to obtain a second sub-network sequence; training the first sub-network sequence based on the first super-network, and tentatively updating the first super-network based on the training result of the first sub-network sequence; training the second sub-network sequence based on the second super-network, and tentatively updating the second super-network based on the training result of the second sub-network sequence; and in response to determining that the difference between the performance of the tentatively updated first super-network and the performance of the tentatively updated second super-network does not exceed a preset range, taking the tentatively updated first super-network as a new first super-network, taking the tentatively updated second super-network as a new second super-network, and performing the next iterative operation. The super-network trained in this way has higher accuracy, and the performance of a sub-network sampled from the trained super-network is consistent with that of a network of the same structure trained independently, so that a well-performing sub-network can be quickly found when model structures are automatically searched based on the super-network.

Drawings

Other features, objects and advantages of the disclosure will become more apparent upon reading of the following detailed description of non-limiting embodiments thereof, made with reference to the accompanying drawings in which:

FIG. 1 is an exemplary system architecture diagram in which embodiments of the present disclosure may be applied;

FIG. 2 is a flow diagram of one embodiment of a method for training a super-network according to the present disclosure;

FIG. 3 is a flow diagram of another embodiment of a method for training a super-network according to the present disclosure;

FIG. 4 is a schematic block diagram illustrating one embodiment of an apparatus for training a super-network according to the present disclosure;

FIG. 5 is a schematic block diagram of a computer system suitable for use in implementing an electronic device of an embodiment of the present disclosure.

Detailed Description

The present disclosure is described in further detail below with reference to the accompanying drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the relevant invention and not restrictive of it. It should also be noted that, for convenience of description, only the portions related to the invention are shown in the drawings.

It should be noted that, in the present disclosure, the embodiments and features of the embodiments may be combined with each other without conflict. The present disclosure will be described in detail below with reference to the accompanying drawings in conjunction with embodiments.

Fig. 1 illustrates an example system architecture 100 to which the method for training a super-network or the apparatus for training a super-network of the present disclosure may be applied.

As shown in fig. 1, the system architecture 100 may include terminal devices 101, 102, 103, a network 104, and a server 105. The network 104 serves as a medium for providing communication links between the terminal devices 101, 102, 103 and the server 105. Network 104 may include various connection types, such as wired, wireless communication links, or fiber optic cables, to name a few.

The terminal devices 101, 102, 103 interact with the server 105 via the network 104 to receive or send messages and the like. The terminal devices 101, 102, 103 may be user-end devices on which various client applications may be installed, such as image processing applications, information analysis applications, voice assistant applications, shopping applications, financial applications, and the like.

The terminal devices 101, 102, and 103 may be hardware or software. When the terminal devices 101, 102, 103 are hardware, they may be various electronic devices, including but not limited to smart phones, tablet computers, e-book readers, laptop portable computers, desktop computers, and the like. When the terminal devices 101, 102, 103 are software, they can be installed in the electronic devices listed above and may be implemented as multiple pieces of software or software modules (e.g., multiple pieces of software or software modules to provide distributed services) or as a single piece of software or software module. No specific limitation is imposed here.

The server 105 may be a server running various services, such as a server running target tracking or voice processing services based on image or voice data. The server 105 may obtain deep learning task data from the terminal devices 101, 102, 103 or from a database to construct training samples, and automatically search for and optimize the model structure of a neural network for performing a deep learning task.

In an application scenario of an embodiment of the present disclosure, the server 105 may implement automatic search of neural network model structures through a super-network. The server 105 may train the super-network based on the acquired deep learning task data, such as media data of images, texts, voices, and the like; after the super-network training is completed, the server 105 may sample a sub-network structure from the super-network through automatic model structure search to execute a corresponding task.

The server 105 may also be a backend server providing backend support for applications installed on the terminal devices 101, 102, 103. For example, the server 105 may receive data to be processed sent by the terminal devices 101, 102, 103, process the data using the neural network model, and return the processing result to the terminal devices 101, 102, 103.

In a practical scenario, the terminal devices 101, 102, 103 may send deep learning task requests related to tasks such as voice interaction, text classification, image recognition, and key point detection to the server 105. A neural network model that has been trained for the corresponding deep learning task may run on the server 105 and be used to process the information.

It should be noted that the method for training the super network provided by the embodiment of the present disclosure is generally performed by the server 105, and accordingly, the apparatus for training the super network is generally disposed in the server 105.

In some scenarios, the server 105 may obtain the source data required for super-network training (e.g., training samples and the super-network to be trained) from a database, memory, or other devices; in this case, the example system architecture 100 may not include the terminal devices 101, 102, 103 and the network 104.

The server 105 may be hardware or software. When the server 105 is hardware, it may be implemented as a distributed server cluster composed of a plurality of servers, or may be implemented as a single server. When the server 105 is software, it may be implemented as multiple pieces of software or software modules (e.g., multiple pieces of software or software modules used to provide distributed services), or as a single piece of software or software module. And is not particularly limited herein.

It should be understood that the number of terminal devices, networks, and servers in fig. 1 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation.

With continued reference to FIG. 2, a flow 200 of one embodiment of a method for training a super-network in accordance with the present disclosure is shown. The method for training a super-network comprises the following steps:

Step 201: initializing a super-network to be trained and copying the initialized super-network to obtain a first super-network and a second super-network that are identical.

In this embodiment, the execution body of the method for training a super-network may first obtain the super-network to be trained. The super-network to be trained may be constructed in advance. Each layer of the super-network may contain a plurality of network structure units from a network structure search space. Here, a network structure unit may consist of a single network layer, such as a single convolutional layer or a single recurrent unit of a recurrent neural network, or it may be formed by combining several network layers, such as a convolution block formed by connecting convolutional layers, batch normalization layers, and nonlinear layers. In the super-network, each network structure unit may be connected to all network structure units in the layers directly above and below it. The parameters of the super-network, including weight parameters, bias parameters, convolution kernels, and the like, are optimized through multiple rounds of iterative operations during training.

The parameters of the super-network to be trained may be initialized randomly or to preset values. Optionally, the super-network to be trained may be a super-network pre-trained on sample data, in which case the parameters of the pre-trained super-network may be used as the parameter initialization result of the super-network to be trained.

The parameter-initialized super-network can then be copied to obtain a first super-network and a second super-network with the same structure and parameters, as in the sketch below. Thereafter, the first and second super-networks can be iteratively trained in step 202.
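
The following is an illustrative sketch only, not the disclosure's implementation: the super-network is represented as a plain dict holding one weight vector per candidate operation in every layer (an assumed toy representation), randomly initialized and then deep-copied so that the first and second super-networks start out identical.

```python
import copy
import random

def init_supernet(num_layers=4, num_candidates=3, width=8):
    """One randomly initialized weight vector per candidate operation in every layer."""
    return {
        (layer, cand): [random.gauss(0.0, 0.1) for _ in range(width)]
        for layer in range(num_layers)
        for cand in range(num_candidates)
    }

supernet = init_supernet()
first_supernet = copy.deepcopy(supernet)   # same structure, same parameters
second_supernet = copy.deepcopy(supernet)  # identical copy used for the consistency check
```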

Step 202: training the super-network to be trained by sequentially performing a plurality of iterative operations.

The super-network to be trained is trained by training the first super-network and the second super-network.

Specifically, each iterative operation includes the following steps 2021, 2022, 2023, and 2024.

First, in step 2021, the first super-network is sampled to obtain a first sub-network sequence, and the first sub-network sequence is shuffled to obtain a second sub-network sequence.

A plurality of sub-networks may be sampled from the first super-network to form the first sub-network sequence. Specifically, random sampling may be used to extract a number of network layers and the connection relationships between them from the first super-network, and the parameters of each sampled sub-network are obtained from the first super-network.

Alternatively, a trained recurrent neural network can be used to sample the first sub-network sequence from the first super-network. The recurrent neural network can be trained in advance based on the super-network and deep learning training data. It takes the encoding of the first super-network as input and outputs the encoding of a sampled first sub-network or of the sampled first sub-network sequence. In each iteration, the recurrent neural network may sample a plurality of first sub-networks, which are arranged in order to form the first sub-network sequence.

The sub-networks in the first sub-network sequence can then be shuffled to obtain the second sub-network sequence. That is, the first sub-network sequence and the second sub-network sequence contain the same sub-networks, but the sub-networks are arranged in different orders.

It should be noted that the first sub-network sequence sampled in different iterative operations may differ; at the start of each iterative operation, the first super-network is re-sampled to obtain a new first sub-network sequence, as in the sketch below.
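
A sketch under the same assumed toy representation as above: each sub-network is a tuple choosing one candidate operation per layer, and the second sequence is simply a shuffled copy of the first, so both sequences contain the same sub-networks in different orders.

```python
import random

def sample_subnetwork_sequence(num_layers=4, num_candidates=3, seq_len=5):
    return [
        tuple(random.randrange(num_candidates) for _ in range(num_layers))
        for _ in range(seq_len)
    ]

first_sequence = sample_subnetwork_sequence()
second_sequence = list(first_sequence)
random.shuffle(second_sequence)  # same sub-networks, different order
```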

Then, in step 2022, the first sub-network sequence is trained based on the first super-network, and the first super-network is tentatively updated based on the training result of the first sub-network sequence.

In this embodiment, the first sub-network sequence can be trained based on the first super-network. Specifically, the parameters of the sub-networks in the first sub-network sequence may be initialized from the current parameters of the first super-network, and each sub-network may be trained in a supervised or unsupervised manner on the training data of its corresponding deep learning task. Each sub-network is continuously optimized by iteratively adjusting its parameters during training. After the training of the first sub-network sequence is completed, the performance information of the trained first sub-network sequence is obtained using test data. Back-propagation is then performed based on the performance information of the first sub-network sequence to determine the parameters of the tentatively updated first super-network. The performance information of the first sub-network sequence may be obtained as a composite statistic of the performance of the individual sub-networks in it.

To obtain the tentative update of the first super-network from the performance information of the first sub-network sequence, a loss function may be constructed from performance indicators such as the error, hardware latency, and memory occupancy of the first sub-network sequence; the gradient of the loss function with respect to each parameter of the first super-network is computed, and the tentatively updated parameters of the first super-network are calculated according to a preset gradient descent rate (learning rate), as in the sketch below.
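
Continuing the toy representation from the earlier sketches (a hypothetical quadratic loss stands in for the real error, latency, and memory terms): gradients collected from the sub-network sequence are turned into a candidate parameter set, while the live super-network is left untouched until the later consistency check accepts it.

```python
import copy

def tentative_update(params, subnet_sequence, lr=0.01):
    candidate = copy.deepcopy(params)  # the live super-network is not modified here
    for subnet in subnet_sequence:
        for layer, cand in enumerate(subnet):
            key = (layer, cand)
            # For the toy loss sum(w^2) over the sub-network's weights, d(loss)/dw = 2w.
            grads = [2.0 * w for w in candidate[key]]
            candidate[key] = [w - lr * g for w, g in zip(candidate[key], grads)]
    return candidate

first_candidate = tentative_update(first_supernet, first_sequence)
second_candidate = tentative_update(second_supernet, second_sequence)
```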

Next, in step 2023, the second sub-network sequence is trained based on the second super-network, and the second super-network is tentatively updated based on the training result of the second sub-network sequence.

Similarly to the first sub-network sequence, the second sub-network sequence may be trained based on the second super-network. Specifically, the parameters of the sub-networks in the second sub-network sequence may be initialized from the current parameters of the second super-network, and each initialized sub-network may be trained in a supervised or unsupervised manner on the training data of its corresponding deep learning task. Each sub-network is continuously optimized by iteratively adjusting its parameters during training. After the training of the second sub-network sequence is completed, the performance information of the trained second sub-network sequence is obtained using test data. Back-propagation is then performed based on the performance information of the second sub-network sequence to determine the parameters of the tentatively updated second super-network. The performance information of the second sub-network sequence may be obtained as a composite statistic of the performance of the individual sub-networks in it.

Specifically, a loss function may be constructed from performance indicators such as the error, hardware latency, and memory occupancy of the second sub-network sequence. The gradient of the loss function with respect to each parameter of the second super-network can then be computed, and the tentatively updated parameters of the second super-network are calculated according to a preset gradient descent rate (learning rate).

It should be noted that in steps 2022 and 2023 the tentative updates of the first super-network and the second super-network are only computed from the training results of the first sub-network sequence and the second sub-network sequence; the parameters of the first super-network and the second super-network have not actually been updated yet. Whether the tentatively updated first and second super-networks are adopted as the updated first and second super-networks in the current iterative operation is decided in the subsequent step.

In step 2024, in response to determining that the difference between the performance of the tentatively updated first super-network and the performance of the tentatively updated second super-network does not exceed a preset range, the tentatively updated first super-network is used as the new first super-network, the tentatively updated second super-network is used as the new second super-network, and the next iterative operation is performed.

In this embodiment, the performance of the tentatively updated first super-network and of the tentatively updated second super-network may be tested. Specifically, test sub-network sequences may be sampled from the tentatively updated first and second super-networks, respectively, and the performance of the sampled sub-network sequences may be taken as the performance of the corresponding super-network.

Then the performance of the first super-network and the performance of the second super-network may be compared. If the difference between the two does not exceed the preset range, for example, if the difference between the proportion of sub-networks whose accuracy exceeds a threshold in the first super-network and the corresponding proportion in the second super-network is smaller than a preset percentage, or if the distance between the accuracy distributions of such sub-networks in the two super-networks does not exceed a preset distribution distance, it may be determined that the difference between the performance of the first super-network and the performance of the second super-network does not exceed the preset range. In this case the first and second super-networks are updated: the tentatively updated first super-network is used as the updated first super-network of the current iterative operation and the tentatively updated second super-network as the updated second super-network; the process then returns to step 2021 based on the updated first and second super-networks and the next iterative operation is performed. A minimal sketch of this acceptance step is given below.
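
Continuing the toy sketch from the earlier steps: a stand-in scalar performance measure is compared for the two tentatively updated super-networks, and the tentative parameters are committed only when the difference stays within the preset range (the performance proxy and EPSILON are assumed values, not the disclosure's concrete metric).

```python
def performance(params):
    # Toy proxy for super-network performance: negative mean squared weight.
    weights = [w for vec in params.values() for w in vec]
    return -sum(w * w for w in weights) / len(weights)

EPSILON = 1e-3  # assumed preset range on the performance difference

if abs(performance(first_candidate) - performance(second_candidate)) <= EPSILON:
    # Accept: the tentatively updated super-networks become the new super-networks.
    first_supernet, second_supernet = first_candidate, second_candidate
else:
    # Reject: keep the previous parameters and resample a new first sub-network
    # sequence in the next iterative operation.
    pass
```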

In the training method of the above embodiment, the tentative parameters of the first and second super-networks are determined from the training results of the sub-networks in each iterative operation, and whether these tentative parameters are actually adopted is decided from the difference between the performance of the two super-networks: the update is applied only when that difference does not exceed the preset range. In this way the parameters of the first and second super-networks are gradually optimized while the trained first super-network remains insensitive to the order of the sub-networks in the sub-network sequence. This reduces the dependency among the parameters of the different sub-networks contained in the first super-network during training and improves the consistency between the performance of a sub-network extracted from the first super-network and the performance of the same network trained independently, thereby improving the accuracy of the super-network.

In some embodiments, the iterative operation may further include: determining a first distribution of the performance of the tentatively updated first super-network with respect to the parameters of the tentatively updated first super-network; determining a second distribution of the performance of the tentatively updated second super-network with respect to the parameters of the tentatively updated second super-network; and in response to determining that the distance between the first distribution and the second distribution is less than a preset distance threshold, determining that the difference between the performance of the tentatively updated first super-network and the performance of the tentatively updated second super-network does not exceed the preset range.

That is, before the decision in step 2024, the distributions of the performance of the tentatively updated first super-network and of the tentatively updated second super-network over their parameters may be compared, and the difference between the performance of the two may be judged from the similarity of these distributions.

Specifically, the first distribution of the performance of the tentatively updated first super-network with respect to its parameters may be the probability distribution of performance indicators, such as accuracy, latency in a specific operating environment, and memory occupancy, of the tentatively updated first super-network with respect to its parameters; accordingly, the second distribution of the performance of the tentatively updated second super-network with respect to its parameters may be the probability distribution of such performance indicators of the tentatively updated second super-network with respect to its parameters.

The distance between the first distribution and the second distribution may be calculated using the KL divergence (Kullback-Leibler divergence), as in the sketch below. If the distance between the first distribution and the second distribution is less than the preset distance threshold, it may be determined that the difference between the performance of the tentatively updated first super-network and the performance of the tentatively updated second super-network does not exceed the preset range. Conversely, if the distance is not less than the preset distance threshold, it may be determined that the difference between the performance of the tentatively updated first super-network and the performance of the tentatively updated second super-network exceeds the preset range.
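
A sketch of the distribution-distance check using the KL divergence; the two histograms below are assumed stand-ins for the performance distributions of the tentatively updated first and second super-networks, and the threshold is an assumed value.

```python
import math

def kl_divergence(p, q, eps=1e-12):
    """D_KL(P || Q) for two discrete distributions given as equal-length lists."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

first_distribution = [0.10, 0.20, 0.40, 0.30]    # e.g. histogram of sub-network accuracies
second_distribution = [0.12, 0.18, 0.42, 0.28]

DISTANCE_THRESHOLD = 0.05  # assumed preset distance threshold
within_preset_range = kl_divergence(first_distribution, second_distribution) < DISTANCE_THRESHOLD
```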

Because the performance of a neural network depends closely on the distribution of its parameters, a change in the parameter distribution may cause an abrupt change in performance, while neural networks with the same structure and similar parameter distributions perform similarly. By comparing the distributions of the tentatively updated first super-network and of the tentatively updated second super-network, the performance difference between the two can be determined more accurately, so that deciding whether to update the parameters of the first and second super-networks in an iterative operation based on the difference of these distributions yields a more accurate super-network training result.

Further, the iterative operation in step 202 may also include: in response to determining that the difference between the performance of the tentatively updated first super-network and the performance of the tentatively updated second super-network exceeds the preset range, performing the next iterative operation and resampling the first super-network in the next iterative operation to obtain a new first sub-network sequence.

If the difference between the performance of the tentatively updated first super-network and the performance of the tentatively updated second super-network exceeds the preset range, it can be concluded that the tentative parameters of the first super-network are sensitive to the order of the sub-networks in the sub-network sequence, so that a sub-network sampled from the first super-network after such an update would differ considerably in performance from an independently trained neural network of the same structure. In this case the parameters of the first super-network are not updated in the current iterative operation; the parameters of the first and second super-networks obtained after the previous iterative operation are kept unchanged, the next iterative operation is performed, and a new first sub-network sequence is re-sampled in that iteration. This ensures that the performance of the trained first super-network does not change greatly with the order of the sub-networks in a sub-network sequence, and that the parameters of the trained first super-network achieve good performance for different sub-networks, so that the performance of a sub-network sampled from the trained first super-network is consistent with the performance achieved by independently training a neural network with the same structure as the sub-network.

In some embodiments, the iterative operation in step 202 further includes: in response to determining that the current first super-network meets a preset convergence condition, stopping the iterative operations and determining that the current first super-network is the trained super-network.

The preset convergence condition may include, but is not limited to, at least one of the following: the rate at which the parameters of the first super-network were updated over the last several iterations is lower than a preset update-rate threshold; the number of iterations of the first super-network reaches a preset threshold; or the accuracy of the first super-network reaches a preset accuracy threshold. When the first super-network converges after a number of iterative operations, the iteration may be stopped and the first super-network obtained in the last iterative operation taken as the trained super-network; a minimal sketch of such a check is given below. Otherwise, if the first super-network does not meet the preset condition after the current iterative operation, the next iterative operation is performed.
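
A minimal sketch of the convergence test described above; all threshold values are assumptions for illustration, not values prescribed by the disclosure.

```python
MAX_ITERATIONS = 1000          # preset threshold on the number of iterations
UPDATE_RATE_THRESHOLD = 1e-4   # preset threshold on the recent parameter update rate
TARGET_ACCURACY = 0.95         # preset accuracy threshold

def converged(iteration, recent_update_rate, accuracy):
    """Return True if any of the preset convergence conditions is met."""
    return (
        iteration >= MAX_ITERATIONS
        or recent_update_rate < UPDATE_RATE_THRESHOLD
        or accuracy >= TARGET_ACCURACY
    )
```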

With continued reference to FIG. 3, a flow diagram of another embodiment of the method for training a super-network in accordance with the present disclosure is shown. As shown in FIG. 3, the flow 300 of the method for training a super-network of the present embodiment includes the following steps:

Step 301: initializing a super-network to be trained and copying the initialized super-network to obtain a first super-network and a second super-network that are identical.

Step 302: training the super-network to be trained by sequentially performing a plurality of iterative operations.

Each iterative operation includes the following steps 3021, 3022, 3023, and 3024:

step 3021, sampling the first super network to obtain a first sub-network sequence, and performing out-of-order processing on the first sub-network sequence to obtain a second sub-network sequence.

Step 3022, training the first subnet sequence based on the first hyper-network, and performing a pseudo-update on the first hyper-network based on a training result of the first subnet sequence.

And step 3023, training the second subnet sequence based on the second hyper-network, and performing pseudo-update on the second hyper-network based on the training result of the second subnet sequence.

Step 3024, in response to determining that the difference between the performance of the first hyper-network to be updated and the performance of the second hyper-network to be updated does not exceed the preset range, taking the first hyper-network to be updated as a new first hyper-network, and taking the second hyper-network to be updated as a new second hyper-network, and executing the next iteration operation.

Step 301, step 302, step 3021, step 3022, step 3023, and step 3024 in this embodiment are respectively the same as step 201, step 202, step 2021, step 2022, step 2023, and step 2024 in the foregoing embodiment, and specific implementations of step 301, step 302, step 3021, step 3022, step 3023, and step 3024 may refer to descriptions of corresponding steps in the foregoing embodiment and are not repeated herein.

In this embodiment, the flow 300 of the method for training a super-network further includes:

Step 303: according to the type of the media data to be processed, performing a model structure search based on the trained super-network to find a sub-network for processing data of the corresponding type, and processing the media data to be processed by using the found sub-network.

Media data to be processed can be obtained; the media data can be data in image, text, audio, video, and other formats. The type of the media data may be determined according to its data format. Optionally, the type of the media data may be configured in advance based on the task parameters of the corresponding deep learning task, where the task parameters may include one or more of the task type, the goal of the task, the amount of data to be processed by the task, and the like. For example, the media data type corresponding to an image recognition task is the image recognition class, and the media data type corresponding to a text translation task is the text translation class.

In this embodiment, the structure of a sub-network for processing the given type of media data can be searched out of the trained super-network according to the type of the media data to be processed. Specifically, the type of the media data may be input into a pre-trained network structure sampler, which samples structures from the trained super-network. The sampler may be implemented as a recurrent neural network, a convolutional neural network, or the like.

After a sub-network structure is searched out of the trained super-network, the media data to be processed can be processed with the found sub-network to obtain a processing result. The found sub-network inherits the corresponding parameters of the super-network and needs no further training, and it can achieve performance consistent with that of the same network architecture trained independently. The trained super-network is therefore suitable for quickly searching out network structures for processing various types of media data and can achieve good performance for different types of deep learning tasks; a minimal sketch of such a search is given below.
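
An illustrative sketch only: the disclosure uses a pre-trained network structure sampler (e.g. a recurrent neural network); here a brute-force search over the toy search space with a hypothetical stand-in scoring function plays that role, which is an assumption, not the patented search procedure.

```python
import itertools

NUM_LAYERS, NUM_CANDIDATES = 4, 3

def score_subnetwork(subnet, media_type):
    # Stand-in for evaluating the sub-network, with weights inherited from the trained
    # super-network, on validation data of the given media type (assumed toy score).
    preferred = {"image": 0, "text": 1, "audio": 2}.get(media_type, 0)
    return -sum((choice - preferred) ** 2 for choice in subnet)

def search_subnetwork(media_type):
    candidates = itertools.product(range(NUM_CANDIDATES), repeat=NUM_LAYERS)
    return max(candidates, key=lambda subnet: score_subnetwork(subnet, media_type))

best_for_images = search_subnetwork("image")   # -> (0, 0, 0, 0) under this toy score
```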

In prior-art super-network training methods, sub-networks have to be sampled after super-network training is completed, and the consistency between their performance and the performance of independently trained networks with the same structure has to be evaluated. The time cost of this evaluation is high, so well-performing sub-networks cannot be found quickly when automatic network structure search is performed based on the super-network, which hinders efficient structure search. In the present embodiment, because the influence of the order of the sub-networks on the performance of the super-network is taken into account during super-network training, a sub-network sampled from the trained super-network achieves performance consistent with that of the independently trained network structure. As a result, well-performing sub-networks can be found efficiently when network structures are automatically searched based on the super-network, the efficiency of automatic structure search is improved, a suitable network structure can be found flexibly and quickly when a deep learning task is executed, and the flexibility and real-time capability of executing deep learning tasks based on the super-network are improved.

Referring to fig. 4, as an implementation of the method for training a super network, the present disclosure provides an embodiment of an apparatus for training a super network, where the embodiment of the apparatus corresponds to the embodiment of the method shown in fig. 2 and 3, and the apparatus may be applied to various electronic devices.

As shown in fig. 4, the apparatus 400 for training a super-network of the present embodiment includes an initialization unit 401 and a training unit 402. The initialization unit 401 is configured to initialize a super-network to be trained and copy the initialized super-network to obtain a first super-network and a second super-network that are identical; the training unit 402 is configured to train the super-network to be trained by sequentially performing a plurality of iterative operations, wherein each iterative operation includes: sampling the first super-network to obtain a first sub-network sequence, and shuffling the first sub-network sequence to obtain a second sub-network sequence; training the first sub-network sequence based on the first super-network, and tentatively updating the first super-network based on the training result of the first sub-network sequence; training the second sub-network sequence based on the second super-network, and tentatively updating the second super-network based on the training result of the second sub-network sequence; and in response to determining that the difference between the performance of the tentatively updated first super-network and the performance of the tentatively updated second super-network does not exceed a preset range, taking the tentatively updated first super-network as a new first super-network, taking the tentatively updated second super-network as a new second super-network, and performing the next iterative operation.

In some embodiments, the iterative operation performed by the training unit 402 further includes: determining a first distribution of the performance of the tentatively updated first super-network with respect to the parameters of the tentatively updated first super-network; determining a second distribution of the performance of the tentatively updated second super-network with respect to the parameters of the tentatively updated second super-network; and in response to determining that the distance between the first distribution and the second distribution is less than a preset distance threshold, determining that the difference between the performance of the tentatively updated first super-network and the performance of the tentatively updated second super-network does not exceed the preset range.

In some embodiments, the iterative operation performed by the training unit 402 further includes: in response to determining that the difference between the performance of the tentatively updated first super-network and the performance of the tentatively updated second super-network exceeds the preset range, performing the next iterative operation and resampling the first super-network in the next iterative operation to obtain a new first sub-network sequence.

In some embodiments, the iterative operation performed by the training unit 402 further includes: in response to determining that the current first super-network meets a preset convergence condition, stopping the iterative operations and determining that the current first super-network is the trained super-network.

In some embodiments, the apparatus further includes: a searching unit configured to perform, according to the type of media data to be processed, a model structure search based on the trained super-network to find a sub-network for processing data of the corresponding type, and to process the media data to be processed by using the found sub-network.

The units in the apparatus 400 described above correspond to the steps in the method described with reference to fig. 2 and 3. Thus, the operations, features and technical effects described above for the method for training a super network are also applicable to the apparatus 400 and the units included therein, and are not described herein again.

Referring now to FIG. 5, a schematic diagram of an electronic device (e.g., the server shown in FIG. 1) 500 suitable for use in implementing embodiments of the present disclosure is shown. The electronic device shown in fig. 5 is only an example, and should not bring any limitation to the functions and the scope of use of the embodiments of the present disclosure.

As shown in fig. 5, the electronic device 500 may include a processing means (e.g., a central processing unit, a graphics processor, etc.) 501 that may perform various appropriate actions and processes in accordance with a program stored in a read-only memory (ROM) 502 or a program loaded from a storage means 508 into a random access memory (RAM) 503. In the RAM 503, various programs and data necessary for the operation of the electronic device 500 are also stored. The processing device 501, the ROM 502, and the RAM 503 are connected to each other through a bus 504. An input/output (I/O) interface 505 is also connected to the bus 504.

Generally, the following devices may be connected to the I/O interface 505: input devices 506 including, for example, a touch screen, touch pad, keyboard, mouse, camera, microphone, accelerometer, gyroscope, etc.; output devices 507 including, for example, a Liquid Crystal Display (LCD), speakers, vibrators, and the like; a storage device 508 including, for example, a hard disk; and a communication device 509. The communication means 509 may allow the electronic device 500 to communicate with other devices wirelessly or by wire to exchange data. While fig. 5 illustrates an electronic device 500 having various means, it is to be understood that not all illustrated means are required to be implemented or provided. More or fewer devices may alternatively be implemented or provided. Each block shown in fig. 5 may represent one device or may represent multiple devices as desired.

In particular, according to an embodiment of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method illustrated in the flow chart. In such an embodiment, the computer program may be downloaded and installed from a network via the communication means 509, or installed from the storage means 508, or installed from the ROM 502. The computer program, when executed by the processing device 501, performs the above-described functions defined in the methods of embodiments of the present disclosure.

It should be noted that the computer readable medium described in the embodiments of the present disclosure may be a computer readable signal medium or a computer readable storage medium or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In embodiments of the disclosure, a computer readable storage medium may be any tangible medium that can contain or store a program for use by or in connection with an instruction execution system, apparatus, or device. In embodiments of the present disclosure, however, a computer readable signal medium may comprise a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: electrical wires, optical cables, RF (radio frequency), etc., or any suitable combination of the foregoing.

The computer readable medium may be included in the electronic device, or it may exist separately without being assembled into the electronic device. The computer readable medium carries one or more programs which, when executed by the electronic device, cause the electronic device to: initialize a super-network to be trained and copy the initialized super-network to obtain a first super-network and a second super-network which are identical; and train the super-network to be trained by sequentially executing a plurality of iterative operations, wherein the iterative operation comprises: sampling the first super-network to obtain a first sub-network sequence, and shuffling the first sub-network sequence to obtain a second sub-network sequence; training the first sub-network sequence based on the first super-network, and quasi-updating the first super-network based on the training result of the first sub-network sequence; training the second sub-network sequence based on the second super-network, and quasi-updating the second super-network based on the training result of the second sub-network sequence; and in response to determining that the difference between the performance of the quasi-updated first super-network and the performance of the quasi-updated second super-network does not exceed a preset range, taking the quasi-updated first super-network as a new first super-network and the quasi-updated second super-network as a new second super-network, and executing the next iterative operation.
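For illustration only, the following Python sketch mirrors the iterative operation described above. It is a minimal sketch under stated assumptions: the super-network is reduced to a small parameter vector, and the functions sample_subnetworks, quasi_update, and performance are hypothetical placeholders standing in for the actual sub-network sampling, sub-network training, and performance evaluation of the disclosure; how an iteration whose performance gap exceeds the preset range is handled (here, simply discarding the quasi-updates) is likewise an assumption.

```python
import copy
import random

import numpy as np


def sample_subnetworks(supernet, num_samples, rng):
    """Placeholder: sample a sequence of sub-network configurations from the super-network."""
    return [rng.choice(supernet["choices"], size=supernet["depth"]).tolist()
            for _ in range(num_samples)]


def quasi_update(supernet, subnet_sequence, rng):
    """Placeholder for training a sub-network sequence and quasi-updating the
    super-network: the update is applied to a copy, not to the caller's object."""
    updated = copy.deepcopy(supernet)
    for _ in subnet_sequence:
        updated["params"] = updated["params"] + 0.01 * rng.standard_normal(updated["params"].shape)
    return updated


def performance(supernet):
    """Placeholder for evaluating the super-network's performance on validation data."""
    return float(np.linalg.norm(supernet["params"]))


def train_supernet(num_iterations=10, num_samples=4, tolerance=0.5, seed=0):
    rng = np.random.default_rng(seed)
    random.seed(seed)

    # Initialize the super-network to be trained and copy it to obtain
    # identical first and second super-networks.
    base = {"choices": list(range(3)), "depth": 5,
            "params": rng.standard_normal(8)}
    first, second = copy.deepcopy(base), copy.deepcopy(base)

    for _ in range(num_iterations):
        # Sample the first sub-network sequence from the first super-network,
        # then shuffle it to obtain the second sub-network sequence.
        first_seq = sample_subnetworks(first, num_samples, rng)
        second_seq = first_seq[:]
        random.shuffle(second_seq)

        # Quasi-update each super-network based on its own sub-network sequence.
        first_new = quasi_update(first, first_seq, rng)
        second_new = quasi_update(second, second_seq, rng)

        # Commit the quasi-updates only when the performance gap stays within
        # the preset range; otherwise discard them (an assumption made here).
        if abs(performance(first_new) - performance(second_new)) <= tolerance:
            first, second = first_new, second_new

    return first


if __name__ == "__main__":
    trained = train_supernet()
    print("trained super-network parameters:", trained["params"])
```

The point the sketch illustrates is that the quasi-updates are computed on copies of the two super-networks and are only committed when the performance difference between them stays within the preset range.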

Computer program code for carrying out operations of embodiments of the present disclosure may be written in any combination of one or more programming languages, including object oriented programming languages such as Java, Smalltalk, and C++, as well as conventional procedural programming languages such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server. In the case involving a remote computer, the remote computer may be connected to the user's computer through any type of network, including a local area network (LAN) or a wide area network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider).

The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.

The units described in the embodiments of the present disclosure may be implemented by software or by hardware. The described units may also be provided in a processor, which may, for example, be described as: a processor comprising an initialization unit and a training unit. In some cases, the names of these units do not constitute a limitation on the units themselves; for example, the initialization unit may also be described as "a unit that initializes the super-network to be trained and copies the initialized super-network to obtain a first super-network and a second super-network which are identical". One possible arrangement of these units is sketched below.
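Purely as an illustration of this unit structure, the following Python sketch shows one way the initialization unit and the training unit could be organized; the class names, method signatures, and the iterative_operation callable are assumptions introduced here and are not part of the disclosure.

```python
import copy


class InitializationUnit:
    """Hypothetical unit: initializes the super-network to be trained and copies
    it to obtain identical first and second super-networks."""

    def run(self, supernet_to_train):
        initialized = copy.deepcopy(supernet_to_train)  # stand-in for real initialization
        return copy.deepcopy(initialized), copy.deepcopy(initialized)


class TrainingUnit:
    """Hypothetical unit: trains the super-network by executing the iterative
    operations in sequence."""

    def __init__(self, iterative_operation):
        # Callable implementing one iterative operation on the two super-networks.
        self.iterative_operation = iterative_operation

    def run(self, first_supernet, second_supernet, num_iterations):
        for _ in range(num_iterations):
            first_supernet, second_supernet = self.iterative_operation(
                first_supernet, second_supernet)
        return first_supernet, second_supernet
```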

The foregoing description is merely a description of the preferred embodiments of the present disclosure and of the principles of the technology employed. It will be appreciated by those skilled in the art that the scope of the invention in the present disclosure is not limited to the specific combinations of the above-mentioned technical features, and also covers other technical solutions formed by any combination of the above-mentioned technical features or their equivalents without departing from the inventive concept described above, for example, technical solutions formed by replacing the above features with (but not limited to) technical features having similar functions disclosed in the present disclosure.
