Spiking neural network training method based on a membrane potential self-increment mechanism

Document No.: 1964406    Publication date: 2021-12-14

Note: This technology, "Spiking neural network training method based on a membrane potential self-increment mechanism" (一种基于膜电位自增机制的脉冲神经网络训练方法), was designed and created by 梁东晨, 曹江, 平洋, 吴冠霖, 栾绍童, 闫妍 and 马宁 on 2021-09-10. Its main content is as follows: The invention discloses a spiking neural network training method based on a membrane potential self-increment mechanism. It relates to spiking neural network training methods, in particular to a time-coding training method based on a membrane potential self-increment mechanism, and belongs to the field of artificial intelligence. During time-coded training of a spiking neural network, signal sparsity means that only a few neurons are activated, so the error at the output layer cannot propagate effectively to the hidden layers of the network and most parameters cannot be updated. To address this, when the spiking neural network is trained with a time-coding method, the invention adds a time-varying self-increment term to the membrane potential dynamics of the spiking neuron. This term enables all spiking neurons to be activated within a finite time, so back-propagation can update the parameters of every neuron and the training of the spiking neural network improves. The method can be used in artificial intelligence, neuromorphic engineering, robotics and related fields to achieve high-precision control.

1. A spiking neural network training method based on a membrane potential self-increment mechanism, characterized by comprising the following steps:

step one, based on a robot virtual simulation environment, building a reinforcement learning setup with a deep reinforcement learning method, and replacing the artificial neural network in that setup with a spiking neural network;

step two, taking the environment state information provided by the robot's virtual simulation environment as the input signal of the spiking neural network;

step three, constructing the spiking neural network with a neuron model that introduces the membrane potential self-increment mechanism;

step four, under the reinforcement learning framework, training the spiking neural network by back-propagation so that it accurately predicts the future reward associated with the current environment state, and using the network output to control the robot;

and step five, taking the outputs of the spiking neural network as the expected future reward of each action available to the robot in the current environment state, and selecting the action with the highest expected future reward to control the robot.

2. The method of claim 1, wherein step three is implemented as follows:

in the training process, non-leaky integrate-and-fire neurons are adopted, a self-increment term β exp(t) is added to the right-hand side of the membrane potential dynamics, and β is a parameter that adjusts the size of the self-increment term; the new membrane potential dynamics can be expressed as

dV_mem(t)/dt = ∑_i w_i ∑_r κ(t - t_i^r) + β exp(t)    (1)

wherein V_mem(t) is the cell membrane potential, a function of time t; the first term on the right is the input synaptic current, w_i is the synaptic connection weight, t_i^r is the time at which the i-th input neuron sends its r-th pulse, and κ is the synaptic current kernel

κ(t) = Θ(t) exp(-t/τ_syn)    (2)

wherein Θ(t) is the Heaviside step function and τ_syn is the time constant, set to 1 to simplify the formulas;

integrating equation (1) yields

V_mem(t_out) = ∑_{i∈C} w_i (1 - exp(-t_out + t_i)) + β exp(t_out) - β    (3)

wherein t_out is the time at which the neuron fires after being activated; C = {i : t_i < t_out} is the set of all input pulses occurring before t_out, and only these pulses influence t_out; V_mem(t_out) is the threshold the cell membrane potential must reach for the neuron to fire, set to 1 in the following formulas for simplicity;

solving equation (3) for exp(t_out) gives

exp(t_out) = [1 + β - ∑_{i∈C} w_i + sqrt((∑_{i∈C} w_i - 1 - β)^2 + 4β ∑_{i∈C} w_i exp(t_i))] / (2β)    (4)

equation (4) has a real solution when equation (5) holds, and since t_out is a time it must be greater than 0, so the right-hand side of equation (4) must be greater than 1, which requires equation (6);

(∑_{i∈C} w_i - 1 - β)^2 > -4β ∑_{i∈C} w_i exp(t_i)    (5)

since β ∑_{i∈C} w_i exp(t_i) > 0, equation (5) always holds;

(∑_{i∈C} w_i - 1 - β)^2 + 4β ∑_{i∈C} w_i exp(t_i) > (∑_{i∈C} w_i - 1 + β)^2    (6)

∑_{i∈C} w_i (exp(t_i) - 1) > -1    (7)

expanding both sides of equation (6) shows that it is equivalent to equation (7), and since exp(t_i) > 1, equation (7) always holds; therefore equation (4) is always satisfied, t_out always exists, and every neuron in the spiking neural network fires within a finite time.

3. The method of claim 1, wherein step four is implemented as follows:

when training with the back-propagation algorithm, the spiking neural network must first be converted:

let z_out = exp(t_out) and z_i = exp(t_i); substituting into equation (4) gives

z_out = [1 + β - ∑_{i∈C} w_i + sqrt((∑_{i∈C} w_i - 1 - β)^2 + 4β ∑_{i∈C} w_i z_i)] / (2β)    (8)

equation (8) can be written as

z_out = f(∑_{i∈C} w_i z_i)    (9)

if z_i is regarded as the activation value of a neuron in the previous layer, f as an activation function, and z_out as the output of the current neuron, then equation (9) has the same form as the activation function of an artificial neural network; therefore, the back-propagation algorithm can be applied to training a spiking neural network with the self-increment term, namely an equivalent artificial neural network is constructed and trained with back-propagation, and the training result is used to update the parameters of the spiking neural network, consistent with the principle of the time-coding method.

Technical Field

The invention relates to methods for training spiking neural networks, in particular to a spiking neural network training method based on a membrane potential self-increment mechanism, and belongs to the field of artificial intelligence.

Background

Compared with traditional computing methods, spiking neural networks offer advantages such as low power consumption and low-latency processing; combined with brain-inspired sensors, they can be used for low-power, low-latency control of robots.

There are many methods for training spiking neural networks, mainly the following: first, training with synaptic plasticity according to biological principles; second, training an artificial neural network and converting the trained network into a spiking neural network; third, applying the back-propagation techniques of artificial neural networks to the spiking neural network, where the challenge is the non-differentiability of the spiking neuron model and the main available solutions are rate-coding methods, time-coding methods, and surrogate-derivative methods.

The time-coding method encodes information in the pulse emission times and, through algebraic conversion, obtains a computational model consistent with an artificial neural network, so that back-propagation can be applied to spiking neural network training; compared with other methods it is better suited to development on mainstream deep learning platforms. However, with this method most neurons are never activated during training because the input signals of the spiking neural network are sparse. The derivatives of such neurons are set to 0 during back-propagation, so they cannot take part in training, which degrades the performance of the spiking neural network under reinforcement learning and makes it unusable for accurate robot control.

Disclosure of Invention

The invention aims to solve the problem that, when a spiking neural network is trained with a time-coding method, signal sparsity leaves only a few neurons activated, so the error at the output layer cannot propagate effectively to the hidden layers of the network and most parameters cannot be updated. After reinforcement learning, the spiking neural network can be used in artificial intelligence, neuromorphic engineering, robotics and related fields to achieve high-precision control.

The purpose of the invention is achieved by the following technical scheme:

when the invention trains the impulse neural network by adopting a time coding method, a time-varying self-increment term is added to the membrane potential kinetic model of the impulse neuron cell. The self-increment item enables all the impulse neurons to be activated within a limited time, so that parameters can be updated for all the neurons in a back propagation process, and the training effect of the impulse neural network is improved.

The method comprises the following specific implementation steps:

Step one, based on a robot virtual simulation environment, build a reinforcement learning setup with a deep reinforcement learning method, and replace the artificial neural network in that setup with a spiking neural network.

Step two, take the environment state information provided by the robot's virtual simulation environment as the input signal of the spiking neural network.

Step three, construct the spiking neural network with a neuron model that introduces the membrane potential self-increment mechanism.

In the training process, non-leaky integrate-and-fire neurons are adopted, and a self-increment term β exp(t) is added to the right-hand side of the membrane potential dynamics, where β is a parameter that adjusts the size of the self-increment term. The new membrane potential dynamics can be expressed as

dV_mem(t)/dt = ∑_i w_i ∑_r κ(t - t_i^r) + β exp(t)    (1)

where V_mem(t) is the cell membrane potential, a function of time t. The first term on the right is the input synaptic current, w_i is the synaptic connection weight, t_i^r is the time at which the i-th input neuron sends its r-th pulse, and κ is the synaptic current kernel

κ(t) = Θ(t) exp(-t/τ_syn)    (2)

where Θ(t) is the Heaviside step function and τ_syn is the time constant, set to 1 to simplify the formulas.

Integrating equation (1) yields

V_mem(t_out) = ∑_{i∈C} w_i (1 - exp(-t_out + t_i)) + β exp(t_out) - β    (3)

where t_out is the time at which the neuron fires after being activated. C = {i : t_i < t_out} is the set of all input pulses occurring before t_out; only these pulses influence t_out. V_mem(t_out) is the threshold the cell membrane potential must reach for the neuron to fire, set to 1 in the following formulas for simplicity.

Solving equation (3) for exp(t_out) gives

exp(t_out) = [1 + β - ∑_{i∈C} w_i + sqrt((∑_{i∈C} w_i - 1 - β)^2 + 4β ∑_{i∈C} w_i exp(t_i))] / (2β)    (4)

Equation (4) has a real solution when equation (5) holds, and since t_out is a time it must be greater than 0, so the right-hand side of equation (4) must be greater than 1, which requires equation (6).

(∑_{i∈C} w_i - 1 - β)^2 > -4β ∑_{i∈C} w_i exp(t_i)    (5)

Since β ∑_{i∈C} w_i exp(t_i) > 0, equation (5) always holds.

(∑_{i∈C} w_i - 1 - β)^2 + 4β ∑_{i∈C} w_i exp(t_i) > (∑_{i∈C} w_i - 1 + β)^2    (6)

∑_{i∈C} w_i (exp(t_i) - 1) > -1    (7)

Expanding both sides of equation (6) shows that it is equivalent to equation (7), and since exp(t_i) > 1, equation (7) always holds. Therefore equation (4) is always satisfied, t_out always exists, and every neuron in the spiking neural network fires within a finite time.
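To make the closed-form firing time concrete, the following is a minimal NumPy sketch of equation (4) for a single non-leaky integrate-and-fire neuron with the self-increment term. The function name and argument layout are illustrative, and the code assumes the causal input set C and a positive weighted input, as in the derivation above.

```python
import numpy as np

def firing_time(weights, spike_times, beta=0.001, threshold=1.0):
    """Closed-form output spike time t_out from equation (4).

    weights     : synaptic weights w_i of the causal inputs in C
    spike_times : corresponding input spike times t_i
    Assumes the discriminant is positive (equation (5)) and that the
    solution exceeds 1, as argued via equations (6) and (7).
    """
    weights = np.asarray(weights, dtype=float)
    spike_times = np.asarray(spike_times, dtype=float)
    w_sum = weights.sum()
    wz_sum = (weights * np.exp(spike_times)).sum()
    disc = (w_sum - threshold - beta) ** 2 + 4.0 * beta * wz_sum
    z_out = (threshold + beta - w_sum + np.sqrt(disc)) / (2.0 * beta)
    return np.log(z_out)   # t_out = ln(z_out)

# Example: two input spikes; with beta > 0 a firing time always exists,
# even when the weighted input alone would never reach the threshold.
print(firing_time(weights=[0.2, 0.1], spike_times=[0.5, 1.0]))
```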

Step four, under the reinforcement learning framework, train the spiking neural network by back-propagation so that it accurately predicts the future reward associated with the current environment state, and use the network output to control the robot.

When training with the back-propagation algorithm, the spiking neural network must first be converted:

Let z_out = exp(t_out) and z_i = exp(t_i). Substituting into equation (4) gives

z_out = [1 + β - ∑_{i∈C} w_i + sqrt((∑_{i∈C} w_i - 1 - β)^2 + 4β ∑_{i∈C} w_i z_i)] / (2β)    (8)

Equation (8) can be written as

z_out = f(∑_{i∈C} w_i z_i)    (9)

If z_i is regarded as the activation value of a neuron in the previous layer, f as an activation function, and z_out as the output of the current neuron, then equation (9) has the same form as the activation function of an artificial neural network. The back-propagation algorithm can therefore be applied to training a spiking neural network with the self-increment term: an equivalent artificial neural network is constructed and trained with back-propagation, the training result is used to update the parameters of the spiking neural network, and the principle is consistent with the time-coding method.
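Because equation (9) has the same form as an ANN activation, one layer of the equivalent network can be implemented directly in an automatic-differentiation framework. The sketch below is one possible PyTorch forward pass for equation (8); for simplicity it sums over all inputs rather than restricting to the causal set C, which is an assumption made here and not part of the method as stated.

```python
import torch

def equivalent_layer(z_in, W, beta=0.001, threshold=1.0):
    """Forward pass of one spiking layer in its equivalent-ANN form (eq. (8)).

    z_in : (batch, n_in) tensor holding z_i = exp(t_i) of the previous layer
    W    : (n_in, n_out) weight tensor
    Returns z_out = exp(t_out); every operation is differentiable, so
    torch autograd supplies the back-propagation described in the text.
    """
    w_sum = W.sum(dim=0)        # sum_i w_i for each output neuron
    wz_sum = z_in @ W           # sum_i w_i * z_i for each output neuron
    disc = (w_sum - threshold - beta) ** 2 + 4.0 * beta * wz_sum
    return (threshold + beta - w_sum + torch.sqrt(disc)) / (2.0 * beta)
```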

Step five, take the outputs of the spiking neural network as the expected future reward of each action available to the robot in the current environment state, and select the action with the highest expected future reward to control the robot.

Advantageous effects

1. When the spiking neural network is trained with a time-coding method, the invention introduces a self-increment term into the neuron model so that all spiking neurons can be activated within a finite time. This solves the problem that the output-layer error cannot propagate effectively to the hidden layers and cannot contribute to parameter updates, and after reinforcement learning training the spiking neural network can be used for accurate robot control.

Drawings

FIG. 1 is a flow chart of the training steps;

FIG. 2 is a graph comparing the training curves before and after introducing the membrane potential self-increment mechanism.

Detailed Description

The present invention is described in detail below with reference to the accompanying drawings and an example. The technical problems solved and the advantages obtained by the technical solution are also described; it should be noted that the described embodiment is only intended to aid understanding of the invention and is not limiting.

The CartPole-v0 robot simulation environment from the OpenAI Gym reinforcement learning toolkit is used as the experimental environment. In the CartPole-v0 task a pole is mounted upright on a cart, and the positions of the cart and pole are randomized each time the task starts. The cart must move left and right to keep the pole vertical, and two conditions must hold for the task not to fail: first, the tilt angle of the pole may not exceed 15 degrees; second, the cart must stay within a fixed range of positions, set to 4.8 unit lengths.
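As an illustration of the experimental setup only, the environment can be created as below; this snippet assumes the classic Gym step/reset API and is not part of the patented method.

```python
import gym

# CartPole-v0 returns a 4-dimensional observation each step:
# cart position, cart velocity, pole angle, pole angular velocity.
env = gym.make("CartPole-v0")
state = env.reset()
next_state, reward, done, info = env.step(env.action_space.sample())
env.close()
```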

1) Build a reinforcement learning environment using the DDQN deep reinforcement learning method.

2) The artificial neural network in the DDQN deep reinforcement learning method is replaced by the spiking neural network, which has a 3-layer structure: an input layer, a hidden layer, and an output layer. The input layer consists of 80 pulse-signal input channels that receive the pulses produced by the encoding method, the hidden layer has 128 neurons, and the output layer has 2 neurons corresponding to the cart's two actions: move left and move right. The layers are fully connected.
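A minimal sketch of this 80-128-2 fully connected structure, reusing the equivalent_layer function from the sketch given earlier; the class name and the uniform positive weight initialization are illustrative assumptions, not details stated in the text.

```python
import torch
import torch.nn as nn

class SpikingQNet(nn.Module):
    """80 input channels -> 128 hidden neurons -> 2 output neurons
    (move left / move right), fully connected, in equivalent-ANN form."""
    def __init__(self, n_in=80, n_hidden=128, n_out=2, beta=0.001):
        super().__init__()
        # Uniform positive initialization is an assumption for illustration.
        self.W1 = nn.Parameter(torch.rand(n_in, n_hidden))
        self.W2 = nn.Parameter(torch.rand(n_hidden, n_out))
        self.beta = beta

    def forward(self, z_in):
        h = equivalent_layer(z_in, self.W1, self.beta)
        return equivalent_layer(h, self.W2, self.beta)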

3) The environment state information provided by the robot's virtual simulation environment is used as the input to the pulse encoder. There are 4 signals in total: the position of the cart on the track, the velocity of the cart, the angle between the pole and the vertical, and the rate of change of that angle.

For each input signal, 20 pulse-generation channels are created by spatial expansion, and the pulse times of the 20 channels follow a normal distribution. The pulse generation time of each channel is denoted s_{i,k}, where i indexes the input signal, k indexes the channel belonging to that signal, and the value of the input signal is denoted x_i. In the CartPole-v0 environment, a_i = 9.5 and b_i = 10.9 are the minimum and maximum values the input signal provided by the environment may reach throughout the experiment, σ is set to 1, and C is 6. The pulse time of each channel is computed by equation (10), which encodes the original input signal as continuous pulse times distributed over the 80 channels.

The pulse signals produced by the 80 encoding channels are used as the input signals of the spiking neural network.
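Since equation (10) itself is not reproduced above, the following NumPy sketch is only a plausible stand-in for the spatial-expansion encoding: 20 Gaussian receptive-field channels per signal, with centers spanning [a_i, b_i], where the spike time grows with the distance of x_i from each center and is scaled by C = 6 and σ = 1. The exact functional form is an assumption; only the names and constants (a_i, b_i, σ, C, 20 channels) come from the text.

```python
import numpy as np

def encode_signal(x, a, b, n_channels=20, sigma=1.0, C=6.0):
    """Hypothetical population coding for one input signal x.

    Channels whose centers lie near x fire early; distant channels fire late.
    The true equation (10) may differ; this is an illustrative stand-in.
    """
    centers = np.linspace(a, b, n_channels)           # receptive-field centers
    distance = (x - centers) ** 2 / (2.0 * sigma ** 2)
    return C * (1.0 - np.exp(-distance))               # one spike time per channel

# Encoding one reading of each of the 4 CartPole signals gives 4 x 20 = 80
# spike times for the input layer.
obs = [0.0, 0.0, 0.0, 0.0]                             # placeholder observation
spike_times = np.concatenate([encode_signal(x, 9.5, 10.9) for x in obs])
assert spike_times.shape == (80,)
```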

4) Training based on the time-coding method is carried out with a spiking neural network built from non-leaky integrate-and-fire neurons. The membrane potential dynamics of a non-leaky integrate-and-fire neuron are

dV_mem(t)/dt = ∑_i w_i ∑_r κ(t - t_i^r)    (11)

where V_mem(t) is the cell membrane potential, a function of time t, the right-hand side is the input synaptic current, w_i is the synaptic connection weight, t_i^r is the time at which the i-th input neuron sends its r-th pulse, and κ is the synaptic current kernel

κ(t) = Θ(t) exp(-t/τ_syn)    (12)

where Θ(t) is the Heaviside step function and τ_syn is the time constant, set to 1.

Adding the self-increment term β exp(t) to the right-hand side of equation (11) gives the updated spiking neuron model

dV_mem(t)/dt = ∑_i w_i ∑_r κ(t - t_i^r) + β exp(t)    (13)

where β is the parameter that adjusts the size of the self-increment term and is set to 0.001.

Integrating equation (13) yields:

V_mem(t_out) = ∑_{i∈C} w_i (1 - exp(-t_out + t_i)) + 0.001 exp(t_out) - 0.001    (14)

where t_out is the time at which the neuron fires after being activated. C = {i : t_i < t_out} is the set of all input pulses occurring before t_out; only these pulses can influence t_out. V_mem(t_out) is the threshold the cell membrane potential must reach for the neuron to fire, set to 1.

Solving equation (14) for exp(t_out) gives

exp(t_out) = [1 + 0.001 - ∑_{i∈C} w_i + sqrt((∑_{i∈C} w_i - 1 - 0.001)^2 + 4 × 0.001 ∑_{i∈C} w_i exp(t_i))] / (2 × 0.001)    (15)

Equation (15) has a real solution when equation (16) holds, and since t_out is a time it must be greater than 0, so the right-hand side of equation (15) must be greater than 1, which requires equation (17).

(∑_{i∈C} w_i - 1 - 0.001)^2 > -4 × 0.001 ∑_{i∈C} w_i exp(t_i)    (16)

Since 0.001 ∑_{i∈C} w_i exp(t_i) > 0, equation (16) always holds.

(∑_{i∈C} w_i - 1 - 0.001)^2 + 4 × 0.001 ∑_{i∈C} w_i exp(t_i) > (∑_{i∈C} w_i - 1 + 0.001)^2    (17)

∑_{i∈C} w_i (exp(t_i) - 1) > -1    (18)

Expanding both sides of equation (17) shows that it is equivalent to equation (18), and since exp(t_i) > 1, equation (18) always holds. Therefore equation (15) always has a valid solution, t_out always exists, and every neuron in the spiking neural network fires within a finite time.

5) Let z_out = exp(t_out) and z_i = exp(t_i). Substituting into equation (15) gives

z_out = [1 + 0.001 - ∑_{i∈C} w_i + sqrt((∑_{i∈C} w_i - 1 - 0.001)^2 + 4 × 0.001 ∑_{i∈C} w_i z_i)] / (2 × 0.001)    (19)

Equation (19) can be written as

z_out = f(∑_{i∈C} w_i z_i)    (20)

If z_i is regarded as the activation value of a neuron in the previous layer, f as an activation function, and z_out as the output of the current neuron, then equation (20) has the same form as the activation function of an artificial neural network. The back-propagation algorithm can therefore be applied to training a spiking neural network with the self-increment term: an equivalent artificial neural network is constructed and trained with back-propagation, the training result is used to update the parameters of the spiking neural network, and the principle is consistent with the time-coding method.

6) The firing time of each output-layer neuron of the spiking neural network is used as the expected future reward of the corresponding action in the current environment state: the earlier the pulse, the larger the reward value. The action corresponding to the neuron with the larger reward value is selected to move the robot left or right.
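A one-line illustration of this selection rule, assuming t_out holds the two output firing times: the earlier spike corresponds to the larger expected reward, so the greedy action is the index of the smallest firing time.

```python
import numpy as np

def select_action(t_out):
    """Greedy action from output firing times: earlier spike = larger reward."""
    return int(np.argmin(t_out))   # 0 = move left, 1 = move right
```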

7) Under the reinforcement learning framework, the spiking neural network is trained with the back-propagation method. The capacity of the experience replay pool is set to 1000, and the number of samples drawn per training step is set to 32. While the task is running, the environment returns a reward of 1 per frame, which accumulates continuously; when the task fails, the reward is set to -1. The target network is updated every 100 steps. The accumulated reward is used as the training target, mean squared error is used as the regression loss function, and the Adam method is adopted as the optimization algorithm. The learning rate is set to 0.001251.
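The update loop below sketches these settings in PyTorch. The replay capacity (1000), batch size (32), target-network period (100 steps), MSE loss, Adam optimizer, and learning rate (0.001251) come from the text; the discount factor and the use of -log(z_out) (i.e. -t_out) as the Q-value are assumptions filled in only for illustration.

```python
import random
from collections import deque

import numpy as np
import torch
import torch.nn.functional as F

GAMMA = 0.99                       # assumed discount factor
BATCH_SIZE = 32                    # samples drawn per training step
TARGET_UPDATE = 100                # sync target network every 100 steps
replay = deque(maxlen=1000)        # sample experience pool

def q_values(net, states):
    # Earlier firing (smaller t_out) means larger reward, so use -t_out = -log(z_out).
    return -torch.log(net(states))

def ddqn_step(q_net, target_net, optimizer, step):
    if len(replay) < BATCH_SIZE:
        return
    s, a, r, s2, done = map(np.array, zip(*random.sample(replay, BATCH_SIZE)))
    s = torch.as_tensor(s, dtype=torch.float32)
    s2 = torch.as_tensor(s2, dtype=torch.float32)
    a = torch.as_tensor(a, dtype=torch.int64)
    r = torch.as_tensor(r, dtype=torch.float32)
    done = torch.as_tensor(done, dtype=torch.float32)

    q = q_values(q_net, s).gather(1, a.unsqueeze(1)).squeeze(1)
    with torch.no_grad():
        best = q_values(q_net, s2).argmax(dim=1, keepdim=True)   # Double-DQN action choice
        target = r + GAMMA * (1.0 - done) * q_values(target_net, s2).gather(1, best).squeeze(1)

    loss = F.mse_loss(q, target)   # mean squared error regression loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    if step % TARGET_UPDATE == 0:
        target_net.load_state_dict(q_net.state_dict())

# optimizer = torch.optim.Adam(q_net.parameters(), lr=0.001251)
```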

8) After training, testing was carried out in the same environment. The experimental results are shown in FIG. 2. They show that in the CartPole-v0 task the spiking neural network training method of the invention effectively increases the average accumulated reward obtained by the robot, i.e., the time the pole is kept upright, whereas the original training method without the self-increment term cannot be applied to this reinforcement learning task.

Through these steps, effective training of the spiking neural network under a reinforcement learning framework is achieved: all neurons are activated within a finite time and take part in training, which improves the training result. The output of the spiking neural network predicts the long-term reward of each robot action and serves as the basis for action selection, so the robot is controlled effectively. Combined with a neuromorphic processor and sensors, the method provided by the invention can be used for low-power, low-latency, high-precision robot control.

The above detailed description illustrates the objects, technical solutions and advantages of the present invention. It should be understood that the above is only a specific embodiment of the invention and does not limit its scope of protection; any modifications, equivalent substitutions, improvements and the like made within the spirit and principle of the invention fall within its scope of protection.
