Traffic state adversarial perturbation generation method for single-intersection signal control based on the fast gradient sign method

Document No.: 87894; Published: 2021-10-08

Reading note: this technique, "Traffic state adversarial perturbation generation method for single-intersection signal control based on the fast gradient sign method", was designed and created by 徐东伟, 王达, 李呈斌 and 周磊 on 2021-07-19. Abstract: starting from a traffic-intersection signal-light control model trained with the existing reinforcement-learning DQN algorithm, adversarial examples are generated with an FGSM-based attack in which the adversarial perturbation is discretized according to the magnitude of the gradient values; the adversarial perturbation is combined with the original state to obtain the final disturbed state, which is input into the agent model; finally, the effect on the fluency or congestion of a single intersection is examined in SUMO. The invention can limit the perturbation size while keeping the output perturbation physically meaningful, thereby efficiently generating adversarial states that increase the queue length and waiting time at the intersection, greatly degrading the performance of the model and greatly reducing the throughput of the traffic intersection.

1. A traffic state adversarial perturbation generation method for single-intersection signal control based on the fast gradient sign method, characterized by comprising the following steps:

Step 1: a reinforcement learning agent model is trained on the single-intersection road network, wherein the network parameters of the model do not change after training; the model has good transferability, and exhibits high fluency and no congestion during testing on the single intersection;

Step 2: the number of vehicles at each entrance of the intersection and their positions are acquired, i.e., the input to the model, and the corresponding traffic-signal phase, i.e., the output action, is generated; the FGSM (fast gradient sign method) attack algorithm is used to attack the input at each time one by one to obtain the corresponding adversarial perturbation;

Step 3: the generated adversarial perturbation is discretized and combined with the originally acquired traffic state to obtain the final disturbed state, i.e., the number and positions of the vehicles at the traffic intersection that are input into the model at that time;

Step 4: in the currently constructed disturbed state, the perturbation magnitude is limited; when the perturbation amount does not exceed the perturbation limit, the disturbed state is input into the model; when the perturbation amount exceeds the perturbation limit, the original state is input into the model;

Step 5: finally, the fluency of the traffic intersection under the traffic-light phases obtained from the traffic flows of the different input states is compared in SUMO.

2. The traffic state adversarial perturbation generation method for single-intersection signal control based on the fast gradient sign method according to claim 1, wherein in step 1 the single intersection is a crossroad; first, a reinforcement learning agent model is trained on the single-intersection road network, and the traffic state on all roads entering the single intersection is discretely coded; each road k (k = 1, 2, 3, 4) entering the intersection, of length l from the road-section entrance to the stop line, is divided equally into c discrete cells, and the vehicle positions on road k at time t are represented as a vehicle position matrix s^k(t); when a vehicle head lies in the i-th discrete cell (i = 1, 2, …, c), the value of the i-th position of s^k(t) is 0.5, otherwise it is -0.5; the formula is:

s_i^k(t) = 0.5 if a vehicle head occupies cell i of road k at time t, and -0.5 otherwise   (1)

where s_i^k(t) represents the value of the i-th position of the vehicle position matrix s^k(t); the vehicle position matrices s^k(t) of the four intersection entrances at time t are concatenated head-to-tail by rows to form s_t, expressed as:

s_t = [s^1(t), s^2(t), s^3(t), s^4(t)]   (2)

then s_t, as the state of the environment, is input into the agent model for training, and the agent outputs the corresponding action, i.e., the phase to be executed by the traffic light;

the traffic-light phases are defined as the action space A = {a_1, a_2, a_3, a_4}, where a_1 is the east-west green light, a_2 the east-west left-turn green light, a_3 the north-south green light and a_4 the north-south left-turn green light; during operation the initial duration of an a_i phase is set to m and the duration of the yellow phase to n; at time t the current state s_t is input into the traffic-light agent model, and the agent selects phase a_i (i = 1, 2, 3, 4); after a_i has been executed, the agent collects the state s_{t+1} at time t+1 from the environment and then selects phase a_j (j = 1, 2, 3, 4); if a_i ≠ a_j, the execution time of a_i is not prolonged, i.e., the a_i phase ends; after a_i finishes, the agent executes the yellow phase, and after the yellow phase ends it executes a_j; if a_i = a_j, the execution time of a_i is prolonged by m; the reward r_t is set as the difference in intersection-vehicle waiting time between two consecutive actions, expressed as:

r_t = W_t - W_{t+1}   (3)

where W_t and W_{t+1} are the waiting times over all entering lanes of the single intersection at times t and t+1 respectively; the action is evaluated from the executed action and the environment reward, and the parameters of the network are updated continuously; the reinforcement learning model used is DQN, whose structure comprises convolution layers and fully connected layers; the parameters comprise the convolution kernel size and the number of neurons in the fully connected layers; a deep neural network is used as the Q-value network, the network parameters are initialized, the output of the network is the Q value, the hidden layers use the ReLU nonlinear activation function, and the number of output-layer neurons equals the size of the action space of the single intersection, expressed as:

Q = h(w·s_t + b)   (4)

where w represents the weights of the neural network, s_t is the input to the network, b is the bias, and h(·) denotes the ReLU activation function; the target value of DQN is:

y_t = r_t + γ·max_{a_j ∈ A} Q(s_{t+1}, a_j; θ)   (5)

and the loss function of DQN is:

L_t = (y_t - Q(s_t, a_i; θ'))^2   (6)

where y_t represents the target value, a_i, a_j ∈ A represent the traffic-light phases, i.e., the actions output by the agent, r_t represents the reward at time t, γ is the discount factor, and θ and θ' represent the parameters w, b of the target network and the parameters w', b' of the estimation network in DQN respectively; the parameters of the estimation network are updated step by step over time, and the parameters of the target network are updated every T time steps by directly copying the parameters from the estimation network, expressed as:

θ ← θ'   (7)

3. The traffic state adversarial perturbation generation method for single-intersection signal control based on the fast gradient sign method according to claim 1 or 2, characterized in that the process of step 2 is as follows:

2.1: the input value s_t of the model at time t is obtained, where s_t represents the number of vehicles at the entrances of the single intersection and their positions, obtained from SUMO at time t;

2.2: the original state s_t is input, and the trained DQN agent model selects the action a_m (m = 1, 2, 3, 4) with the largest action-value function Q, which is the optimal traffic-light phase at this time, expressed as:

a_m = argmax_a Q(s_t, a; θ)   (8)

where θ represents the parameters of the trained agent model network, and a_m indicates the output action, i.e., the phase the traffic light is to execute;

2.3: using the FGSM attack algorithm, values are assigned along the gradient direction according to the sign function to generate the corresponding adversarial perturbation η_t at time t, expressed as:

η_t = ε · sign(∇_{s_t} L_t(θ, s_t, a_m))   (9)

where ε represents the perturbation coefficient, s_t represents the input value, i.e., the vehicle positions, a_m represents the optimal phase for the traffic light to execute at that time, sign denotes the sign function, and L_t(θ, s_t, a_m) represents the loss function of the model at time t.

4. The traffic state adversarial perturbation generation method for single-intersection signal control based on the fast gradient sign method according to claim 1 or 2, characterized in that the process of step 3 is as follows:

3.1: η_t = [η_t^1, η_t^2, …, η_t^c], where c is the number of discrete cells into which the entrances of the traffic intersection are divided and η_t^i represents the adversarial perturbation of the i-th discrete cell at time t; after the adversarial perturbation η_t at time t is calculated, the absolute value of the perturbation at time t is taken, its maximum value |η_t|_max and minimum value |η_t|_min are found, and η_t is sorted by magnitude to obtain a new sorted array η_t'; finally the perturbation is discretized so that it has actual physical meaning;

3.2: the perturbations are read from η_t' in order and compared with the original data; if the original state is inconsistent with the adversarial perturbation, the corresponding perturbation is assigned to the corresponding position of the original state; if the original state is consistent with the adversarial perturbation, the next adversarial perturbation is taken from η_t' and assigned in the manner described above, until the selected perturbation is valid, yielding the disturbed state s_t'.

5. The traffic state adversarial perturbation generation method for single-intersection signal control based on the fast gradient sign method according to claim 1 or 2, characterized in that the process of step 4 is as follows: the perturbation amount μ_t added by the disturbed state at time t is calculated, expressed as:

μ_t = |len(s_t') - len(s_t)| / len(s_t)   (10)

where len (.) denotes the calculation stAnd stThe number of' middle vehicle state is 0.5, when the disturbance amount mutWhen the value is less than or equal to delta, the state of disturbance s is detectedt' input into the agent model, otherwise the original state stInput into the agent model.

6. The traffic state adversarial perturbation generation method for single-intersection signal control based on the fast gradient sign method according to claim 1 or 2, characterized in that the process of step 5 is as follows:

5.1: the original state s_t at each time is input into the model, which selects the optimal action, i.e., the traffic-light phase, to control the traffic flow at the intersection, and the waiting-time difference of the traffic intersection, i.e., the reward r_t = W_t - W_{t+1}, is calculated;

5.2: for the final disturbed state s_t' obtained after adding the valid perturbation, the perturbation amount μ_t is calculated; the input state that meets the requirement (μ_t ≤ δ) is input into the agent model and the action, i.e., the traffic-light phase, is output; the waiting-time difference of the traffic intersection (reward r_t = W_t - W_{t+1}) is likewise calculated.

Technical Field

The invention belongs to the intersection of the fields of intelligent transportation and machine-learning information security, and relates to a traffic state adversarial perturbation generation method for single-intersection signal control based on the fast gradient sign method (FGSM).

Background

The problem of traffic congestion has become an urgent challenge for urban traffic, and when a modern city is designed, one of the most critical considerations is the development of an intelligent traffic management system. The main goal of a traffic management system is to reduce traffic congestion, which has become one of the major problems of large cities today. Efficient urban traffic management can save time and money and reduce carbon dioxide emissions into the atmosphere.

Reinforcement learning (RL), as a machine learning technique, has produced impressive results on the traffic signal control problem. Reinforcement learning does not require full prior knowledge of the environment, such as the traffic flow; instead, the agent acquires knowledge and models the environment dynamics by interacting with the environment. After each action is executed in the environment, the agent receives a scalar reward. The reward earned depends on the quality of the action taken, and the goal of the agent is to learn the best control strategy, so that by repeatedly interacting with the environment the discounted cumulative reward can be maximised. Deep reinforcement learning (DRL) has numerous applications in the real world due to its excellent ability to adapt quickly to the surrounding environment. Despite its great advantages, DRL is vulnerable to adversarial attacks such as luring attacks, strategy-timing attacks, value-function-based adversarial attacks, trojan attacks, and the like.

Disclosure of Invention

In order to overcome the shortcomings of the prior art, the invention provides a traffic state adversarial perturbation generation method for single-intersection signal control based on the fast gradient sign method, which adds a small perturbation to the number and positions of vehicles while ensuring that the perturbation has actual physical meaning, thereby efficiently generating adversarial perturbations and greatly reducing the performance of the model and the fluency of the traffic intersection.

The technical solution adopted by the invention to solve the technical problem is as follows:

a traffic state anti-disturbance generation method based on single intersection signal control of rapid gradient descent comprises the following steps:

Step 1: a reinforcement learning Deep Q-Network (DQN) agent model is trained on the single-intersection road network, wherein the network parameters of the model do not change after training; the model has good transferability, and exhibits high fluency and no congestion during testing on the single intersection;

Step 2: the number of vehicles at each entrance of the intersection and their positions are acquired, i.e., the input to the model, and the corresponding traffic-signal phase, i.e., the output action, is generated; the FGSM (fast gradient sign method) attack algorithm is used to attack the input at each time one by one to obtain the corresponding adversarial perturbation;

Step 3: the generated adversarial perturbation is discretized and combined with the originally acquired traffic state to obtain the final disturbed state, i.e., the number and positions of the vehicles at the traffic intersection that are input into the model at that time;

Step 4: in the currently constructed disturbed state, the perturbation magnitude is limited; when the perturbation amount does not exceed the perturbation limit, the disturbed state is input into the model; when the perturbation amount exceeds the perturbation limit, the original state is input into the model;

Step 5: finally, the fluency of the traffic intersection under the traffic-light phases obtained from the traffic flows of the different input states is compared in SUMO.

As a research hotspot in the field of artificial intelligence, deep reinforcement learning (DRL) has achieved success in various fields such as robot control, computer vision and intelligent transportation. Meanwhile, whether it can be attacked and how robust it is against such attacks have also been hot topics in recent years. Therefore, the method takes the representative Deep Q-Network (DQN) algorithm in deep reinforcement learning, with single-intersection signal-light control as the application scenario, and attacks the DQN algorithm with the fast gradient sign method (FGSM) to generate adversarial examples.

The technical concept of the invention is as follows: based on a traffic-intersection signal-light control model trained with the existing reinforcement-learning DQN algorithm, adversarial examples are generated with an FGSM attack in which the adversarial perturbation is discretized according to the gradient magnitudes; the adversarial perturbation is combined with the original state to obtain the final disturbed state, which is input into the agent model; finally, the fluency or congestion of the single intersection is examined in SUMO.

The invention has the following beneficial effects: the FGSM attack algorithm generates the corresponding adversarial perturbation from the largest gradient values, the generated perturbation being a discrete value; the adversarial perturbation is combined with the original traffic state to form the disturbed state, a perturbation limit is imposed on the perturbation amount of the disturbed state, and the obtained output is the disturbed state. The invention can limit the perturbation while keeping the output perturbation physically meaningful, thereby efficiently generating adversarial states, increasing the queue length and waiting time at the intersection, greatly degrading the performance of the model and greatly reducing the throughput of the traffic intersection.

Drawings

Fig. 1 is a schematic diagram of reinforcement learning.

Fig. 2 is a general flow diagram of adversarial perturbation generation by FGSM.

Fig. 3 is a schematic view of a single intersection.

Fig. 4 shows the discrete states of the vehicle positions.

FIG. 5 is a comparison graph of single intersection vehicle waiting queue lengths.

FIG. 6 is a comparison graph of vehicle waiting times at a single intersection.

Detailed Description

The invention is further described below with reference to the accompanying drawings.

Referring to figs. 1 to 6, a traffic state adversarial perturbation generation method for single-intersection signal control based on the fast gradient sign method comprises the following steps:

step 1: reinforcement learning is an algorithm that interacts with the environment continuously, as shown in fig. 1. The reinforcement learning algorithm contains three most basic elements: environmental status, agent actions, environmental rewards. Take a typical crossroad as an example. Firstly, training a reinforcement learning intelligent agent model on a road grid of a single intersection, and carrying out discrete coding on traffic states of all roads entering the single intersection. Equally dividing a single intersection into c discrete units from a road k (k is 1,2,3 and 4) with the length of l between a road section entrance and a stop line, and expressing the vehicle position of the road k of the single intersection at the time t as a vehicle position momentArray sk(t) when the vehicle head is on a discrete cell, then the vehicle position matrix sk(t) has a value of 0.5 for the ith (i ═ 1,2, …, c) position, otherwise the value is-0.5, and the formula is:

whereinRepresenting a vehicle position matrix sk(t) value of ith position, matrix s of vehicle positions of four intersection input ends at time tk(t) splicing according to line head and tail to form stThe formula is expressed as:

s_t = [s^1(t), s^2(t), s^3(t), s^4(t)]   (2)

Then s_t, as the state of the environment, is input into the agent model for training, and the agent outputs the corresponding action, i.e., the phase to be executed by the traffic light (such as the north-south green light or the east-west green light).
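A minimal Python sketch of this state encoding is given below; it is illustrative only, and the default road length and cell count (taken from the example section, 700 m and 100 cells) as well as the helper names are assumptions rather than part of the claimed method.

```python
import numpy as np

def encode_road(head_positions_m, road_length_m=700.0, num_cells=100):
    """Encode one approach road as a vector of +/-0.5 cell values (formula (1)).

    head_positions_m: distance in meters of each vehicle head from the
    road-section entrance; a cell holds 0.5 if any vehicle head lies in it,
    and -0.5 otherwise.
    """
    cell_len = road_length_m / num_cells
    cells = np.full(num_cells, -0.5)
    for pos in head_positions_m:
        i = min(int(pos // cell_len), num_cells - 1)  # cell containing the head
        cells[i] = 0.5
    return cells

def encode_state(per_road_positions, road_length_m=700.0, num_cells=100):
    """Concatenate the four approach roads head-to-tail into s_t (formula (2))."""
    return np.concatenate([encode_road(p, road_length_m, num_cells)
                           for p in per_road_positions])

# Example: one vehicle 10 m into road 1 and two vehicles on road 3.
s_t = encode_state([[10.0], [], [350.0, 693.0], []])
print(s_t.shape)  # (400,) = 4 roads x 100 cells
```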

A typical crossroad is taken as an example. We define the traffic-light phases as the action space A = {a_1, a_2, a_3, a_4}, where a_1 is the east-west green light, a_2 the east-west left-turn green light, a_3 the north-south green light and a_4 the north-south left-turn green light. During operation the initial duration of an a_i phase is set to m and the duration of the yellow phase to n. At time t the current state s_t is input into the traffic-light agent model, and the agent selects phase a_i (i = 1, 2, 3, 4); after a_i has been executed, the agent collects the state s_{t+1} at time t+1 from the environment and then selects phase a_j (j = 1, 2, 3, 4). If a_i ≠ a_j, the execution time of a_i is not prolonged, i.e., the a_i phase ends; after a_i finishes, the agent executes the yellow phase, and after the yellow phase ends it executes a_j. If a_i = a_j, the execution time of a_i is prolonged by m. The reward r_t is set as the difference in intersection-vehicle waiting time between two consecutive actions, expressed as:

r_t = W_t - W_{t+1}   (3)

where W_t and W_{t+1} are the waiting times over all entering lanes of the single intersection at times t and t+1 respectively. The action is evaluated from the executed action and the environment reward, and the parameters of the network are updated continuously. The reinforcement learning model used is the Deep Q-Network (DQN). Its structure comprises convolution layers and fully connected layers; the parameters comprise the convolution kernel size and the number of neurons in the fully connected layers. A deep neural network is used as the Q-value network, the network parameters are initialized, the output of the network is the Q value, the hidden layers use the ReLU nonlinear activation function, and the number of output-layer neurons equals the size of the action space of the single intersection. The formula is expressed as:

Q = h(w·s_t + b)   (4)

where w represents the weights of the neural network, s_t is the input to the network, b is the bias, and h(·) denotes the ReLU activation function. The target value of DQN is:

y_t = r_t + γ·max_{a_j ∈ A} Q(s_{t+1}, a_j; θ)   (5)

and the loss function of DQN is:

L_t = (y_t - Q(s_t, a_i; θ'))^2   (6)

where y_t represents the target value, a_i, a_j ∈ A represent the traffic-light phases, i.e., the actions output by the agent, r_t represents the reward at time t, γ is the discount factor, and θ and θ' represent the parameters w, b of the target network and the parameters w', b' of the estimation network in DQN respectively; the parameters of the estimation network are updated step by step over time, and the parameters of the target network are updated every T time steps by directly copying the parameters from the estimation network, expressed as:

θ ← θ'   (7)
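The following is a minimal PyTorch sketch of the Q-network and the update rule of formulas (4)-(7). It is a sketch under stated assumptions: it uses fully connected layers only (the patent's model also contains convolution layers), and the layer sizes, learning rate and discount factor are illustrative values, not taken from the patent.

```python
import torch
import torch.nn as nn

class QNet(nn.Module):
    """Q-value network: ReLU hidden layers, one output per phase (|A| = 4)."""
    def __init__(self, state_dim=400, n_actions=4):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, 256), nn.ReLU(),
            nn.Linear(256, 256), nn.ReLU(),
            nn.Linear(256, n_actions))

    def forward(self, s):
        return self.net(s)

q_est, q_tgt = QNet(), QNet()              # estimation and target networks
q_tgt.load_state_dict(q_est.state_dict())  # formula (7): copy every T steps
optimizer = torch.optim.Adam(q_est.parameters(), lr=1e-3)  # assumed lr
gamma = 0.95  # discount factor (assumed value)

def td_step(s, a, r, s_next):
    """One DQN update: y_t = r_t + gamma * max_a Q(s_{t+1}, a; theta) (5),
    minimizing (y_t - Q(s_t, a_i; theta'))^2 (6)."""
    with torch.no_grad():
        y = r + gamma * q_tgt(s_next).max(dim=1).values
    q = q_est(s).gather(1, a.unsqueeze(1)).squeeze(1)
    loss = nn.functional.mse_loss(q, y)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```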

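The phase-selection rule described in step 1 (prolong the running phase by m when the agent repeats it, otherwise insert a yellow phase of length n before switching) can be sketched as follows; `env` and `agent.select` are illustrative stand-ins rather than a real API, and the defaults m = 10 and n = 4 follow the example section.

```python
def run_phase_control(agent, env, m=10, n=4):
    """Sketch of the phase-extension rule with assumed env/agent interfaces:
    if the newly selected phase equals the running one, prolong it by m
    seconds; otherwise end it, run the yellow phase for n seconds, switch."""
    a_i = agent.select(env.observe())      # initial phase a_i
    env.set_phase(a_i, duration=m)
    while not env.done():
        a_j = agent.select(env.observe())  # phase chosen from s_{t+1}
        if a_j == a_i:
            env.extend_phase(m)            # a_i = a_j: prolong by m
        else:
            env.set_yellow(duration=n)     # a_i != a_j: yellow phase first
            env.set_phase(a_j, duration=m)
            a_i = a_j
```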
Step 2: at the traffic intersection, the number of vehicles at each entrance and their positions are acquired, i.e., the input to the model, and the corresponding traffic light, i.e., the output action, is generated. The FGSM attack algorithm is used to attack the input at each time one by one to obtain the corresponding adversarial perturbation; the process is as follows:

2.1: the input value s_t of the model at time t is obtained, where s_t represents the number of vehicles at the entrances of the single intersection and their positions, obtained from SUMO at time t;

2.2: the original state s_t is input, and the trained DQN agent model selects the action a_m (m = 1, 2, 3, 4) with the largest action-value function Q, which is the optimal traffic-light phase at this time, expressed as:

a_m = argmax_a Q(s_t, a; θ)   (8)

where θ represents the parameters of the trained agent model network, and a_m indicates the output action, i.e., the phase the traffic light is to execute.

2.3: using the FGSM attack algorithm, values are assigned along the gradient direction according to the sign function to generate the corresponding adversarial perturbation η_t at time t, expressed as:

η_t = ε · sign(∇_{s_t} L_t(θ, s_t, a_m))   (9)

where ε represents the perturbation coefficient, s_t represents the input value, i.e., the vehicle positions, a_m represents the optimal phase for the traffic light to execute at that time, sign denotes the sign function, and L_t(θ, s_t, a_m) represents the loss function of the model at time t.
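A minimal PyTorch sketch of this attack step is shown below. The patent specifies L_t(θ, s_t, a_m) only abstractly; as an assumption, the sketch follows the common practice for attacking RL policies of treating the model's own chosen phase a_m as the label and using a cross-entropy loss over the Q-values. The value of ε is illustrative.

```python
import torch
import torch.nn.functional as F

def fgsm_perturbation(model, s_t, a_m, epsilon=0.05):
    """eta_t = epsilon * sign(grad_{s_t} L_t(theta, s_t, a_m)) (formula (9)).

    model: trained Q-network (e.g. QNet above); s_t: state vector;
    a_m: phase chosen for s_t; epsilon: perturbation coefficient (assumed).
    """
    s = s_t.clone().detach().requires_grad_(True)
    q = model(s.unsqueeze(0))                       # Q-values for all phases
    loss = F.cross_entropy(q, torch.tensor([a_m]))  # surrogate for L_t
    loss.backward()
    return epsilon * s.grad.sign()
```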

And step 3: the state taken is a discrete value since it is the number of vehicles and their positions. Therefore, the counterdisturbance η is processed to obtain a disturbance value with actual physical significance. The process is as follows:

3.1: η_t = [η_t^1, η_t^2, …, η_t^c], where c is the number of discrete cells into which the entrances of the traffic intersection are divided and η_t^i represents the adversarial perturbation of the i-th discrete cell at time t. After the adversarial perturbation η_t at time t is calculated, the absolute value of the perturbation at time t is taken, its maximum value |η_t|_max and minimum value |η_t|_min are found, and η_t is sorted by magnitude to obtain a new sorted array η_t'; finally the perturbation is discretized so that it has actual physical meaning.

3.2: the perturbations are read from η_t' in order and compared with the original data; if the original state is inconsistent with the adversarial perturbation, the corresponding perturbation is assigned to the corresponding position of the original state; if the original state is consistent with the adversarial perturbation, the next adversarial perturbation is taken from η_t' and assigned in the manner described above, until the selected perturbation is valid, yielding the disturbed state s_t'.
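A sketch of steps 3.1-3.2 follows. The exact discretization rule is not fully recoverable from the text; the sketch assumes each selected component is mapped to ±0.5 by its sign (i.e., adding or removing a vehicle in a cell), visiting components in descending order of |η_t| and keeping only "valid" assignments, i.e., those that actually change the original state.

```python
import numpy as np

def discretize_and_apply(s_t, eta_t, max_changes=None):
    """Combine a discretized perturbation with the original state s_t.

    Components are read in descending |eta_t| (the sorted array eta_t');
    a cell is overwritten only where the discretized value 0.5*sign(eta)
    differs from the original value, yielding the disturbed state s_t'.
    """
    s_adv = s_t.copy()
    changed = 0
    for i in np.argsort(-np.abs(eta_t)):  # largest-magnitude gradients first
        if eta_t[i] == 0:
            continue                      # no gradient signal for this cell
        target = 0.5 * np.sign(eta_t[i])  # discretized perturbation: +/-0.5
        if s_adv[i] != target:            # valid: inconsistent with original
            s_adv[i] = target
            changed += 1
            if max_changes is not None and changed >= max_changes:
                break
    return s_adv
```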

Step 4: in the currently constructed disturbed state, the perturbation magnitude is limited; at time t, when the perturbation amount μ_t ≤ δ (δ being the perturbation limit), the disturbed state is input into the model; when μ_t > δ, the original state is input into the model.

The perturbation amount μ_t added by the disturbed state at time t is calculated, expressed as:

μ_t = |len(s_t') - len(s_t)| / len(s_t)   (10)

where len (.) denotes the calculation stAnd stThe number of' middle vehicle state is 0.5, when the disturbance amount mutWhen the value is less than or equal to delta, the state of disturbance s is detectedt' input into the agent model, otherwise the original state stInput into the agent model.

Step 5: the performance of the generated adversarial perturbation is tested; after the state is input into the model, the agent selects the traffic-signal phase according to the current state to control the traffic flow of the single intersection. Finally, the fluency of the traffic intersection under the traffic-light phases obtained from the traffic flows of the different input states is compared in SUMO.

the process of the step 5 is as follows:

5.1: the original state s_t at each time is input into the model, which selects the optimal action (traffic-light phase) to control the traffic flow at the intersection, and the waiting-time difference of the traffic intersection (reward r_t = W_t - W_{t+1}) is calculated.

5.2: for the final disturbed state s_t' obtained after adding the valid perturbation, the perturbation amount μ_t is calculated; the input state that meets the requirement (μ_t ≤ δ) is input into the agent model and the action, i.e., the traffic-light phase, is output; the waiting-time difference of the traffic intersection (reward r_t = W_t - W_{t+1}) is likewise calculated.
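A sketch of the SUMO comparison loop using the TraCI Python API is given below; the traffic-light id "0", the `observe_state` helper and the `agent.select` interface are assumptions for illustration.

```python
import traci

def evaluate(agent, sumo_cfg, attack=None, max_steps=3600):
    """Run one episode and accumulate vehicle waiting time; pass attack=None
    for the clean run and a perturbation function for the attacked run."""
    traci.start(["sumo", "-c", sumo_cfg])
    total_wait = 0.0
    for _ in range(max_steps):
        if traci.simulation.getMinExpectedNumber() == 0:
            break                          # all vehicles have left the network
        s = observe_state()                # build s_t from positions (assumed helper)
        if attack is not None:
            s = attack(s)                  # disturbed state s_t' (budget permitting)
        traci.trafficlight.setPhase("0", agent.select(s))  # apply phase a_m
        traci.simulationStep()
        total_wait += sum(traci.vehicle.getWaitingTime(v)
                          for v in traci.vehicle.getIDList())
    traci.close()
    return total_wait
```

Comparing `evaluate(agent, cfg)` against `evaluate(agent, cfg, attack=...)` reproduces the clean-versus-attacked comparison of figs. 5 and 6.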

Example: the data in the actual experiment are as follows:

(1) selecting experimental data

The experimental data are 100 vehicles randomly generated at a single intersection in SUMO; each vehicle's size, the distance from its spawn position to the intersection, and its speed from generation to passing through the intersection are all the same. The initial traffic-light phase durations at the intersection are 10 seconds for the green light and 4 seconds for the yellow light. Each road k (k = 1, 2, 3, 4), 700 meters long starting from the stop line, is divided into 100 discrete cells of 7 meters each. The original state s_t collected at the entrances of the traffic intersection records the number and positions of the vehicles at the entrances of the single intersection. The perturbation limit δ is 20%.

(2) Results of the experiment

In the result analysis, a single intersection is used as the experimental scenario; a reinforcement learning Deep Q-Network (DQN) agent model is trained, the fast gradient sign method (FGSM) is adopted, and the perturbation is discretized to generate the adversarial perturbation, which changes the number and positions of vehicles at the intersection entrances and thereby changes the traffic-light phases. The comparison experiment is carried out under the two conditions of attack and no attack, and the experimental results are shown in figs. 5 and 6 (under continuous attack, the traffic-light phases can no longer guarantee the circulation of vehicles at the single intersection, so vehicles accumulate at the intersection).

The embodiments described in this specification are merely illustrative of implementations of the inventive concept and are intended for purposes of illustration only. The scope of the present invention should not be construed as being limited to the particular forms set forth in the embodiments; it also covers the equivalents that those skilled in the art can conceive on the basis of the inventive concept.
