Internet of vehicles node forwarding utility learning method based on double updating strategies

Document No.: 1908192 | Publication date: 2021-11-30

Note: this technology, an Internet of Vehicles node forwarding utility learning method based on a dual-update strategy (一种基于双更新策略的车联网节点转发效用学习方法), was designed and created by 王桐, 王希波, 刘逸伦, 高山, and 曹越 on 2021-08-03. Abstract: The invention discloses an Internet of Vehicles node forwarding utility learning method based on a dual-update strategy, in the technical field of mobile opportunistic network communication. Based on the information updating that occurs during information interaction between vehicle nodes, the basic elements of the learning process are determined; a node contact freshness coefficient and a node contact probability are determined, and a node forwarding utility learning model for the vehicle-mounted opportunistic network is established. According to the routing requirements of the vehicle-mounted opportunistic network and the opportunistic contact characteristics of its nodes, a forwarding utility learning update model is determined; a packet-forwarding update strategy for the forwarding utility value is established, obtaining the moment at which the sending node received the packet from the previous node; and a node-contact update strategy for the forwarding utility value is established, which uses a learning coefficient different from that of the forwarding update process. The invention improves the transmission performance of the vehicle-mounted opportunistic network, including raising the packet delivery success rate and reducing the packet transmission delay.

1. An Internet of Vehicles node forwarding utility learning method based on a dual-update strategy, characterized by comprising the following steps:

Step 1: determining the basic elements of the learning process based on the information updating that occurs during information interaction between vehicle nodes;

Step 2: determining a node contact freshness coefficient based on the information updating that occurs during information interaction between vehicle nodes;

Step 3: determining the node contact probability, and establishing a node forwarding utility learning model for the vehicle-mounted opportunistic network;

Step 4: determining a forwarding utility learning update model, which contains a dynamic discount factor, according to the routing requirements of the vehicle-mounted opportunistic network and the opportunistic contact characteristics of its nodes;

Step 5: establishing a packet-forwarding update strategy for the forwarding utility value: obtaining the moment at which the sending node received the packet from the previous node, determining the elapsed time between the two nodes, and substituting this time into the update model to determine the state-action value;

Step 6: establishing a node-contact update strategy for the forwarding utility value, this update process using a learning coefficient different from that of the forwarding update process.

2. The Internet of Vehicles node forwarding utility learning method based on the dual-update strategy as claimed in claim 1, wherein step 1 specifically comprises:

determining the basic elements required in the learning process, comprising: the environment, agent, state space, action space, and immediate return; and defining a node update information table, comprising a node contact information table and a node state-action value table;

the environment: during the delivery of a data packet from the source node to the destination node, the entire urban vehicle-mounted opportunistic network provides the required information as the packet is forwarded, so the vehicle-mounted opportunistic network is regarded as the environment of the learning model;

the agent: the data packet transmitted from the source node to the destination node serves as the agent of the learning algorithm;

the state space: every vehicle node in the network can serve as a storage node for a data packet, so the set of all nodes in the network is the state space of the agent;

the action space: the forwarding of a data packet by a node to a next-hop node constitutes the action space of the agent; in the vehicle-mounted opportunistic network, nodes have the store-carry-forward capability, so the selection range of forwarding nodes comprises all contacted nodes;

the immediate return: after a data packet is successfully forwarded to the next-hop node, the agent obtains an immediate return value from the environment, which is used to update the state-action value;

the node contact information table: contact information is updated whenever contacting nodes exchange information; from it, the average contact interval and the contact freshness coefficient between node s and other nodes are calculated, from which the inter-node contact probability is estimated;

the node state-action value table: stores the accumulated return obtainable by taking the corresponding node as the next-hop delivery node; the larger the return value, the better the transmission performance obtained by selecting that node as the next-hop delivery node for the data packet.

3. The Internet of Vehicles node forwarding utility learning method based on the dual-update strategy as claimed in claim 1, wherein step 2 specifically comprises:

the contact freshness coefficient $F_{A,B}$ denotes the freshness of the contact information between nodes A and B and represents how current the contact probability is; when nodes A and B have never been in contact, $F_{A,B}$ is set equal to zero; after nodes A and B establish a link, it is updated by the following formula:

$$F_{A,B} = F_{A,B} + (1 - F_{A,B}) \cdot P_{int}$$

where $P_{int}$ is a fixed constant, set equal to 0.85;

when nodes A and B have not been in contact for a long time, the freshness of the contact information between them decreases, so the contact freshness coefficient of long-uncontacted nodes must be attenuated, updated according to the following formula:

$$F_{A,B} = F_{A,B} \cdot \eta^{\mu_{A,B}}$$

where $\eta$ is the attenuation factor, taking the value 0.95, and $\mu_{A,B}$ is the number of time units elapsed since the moment nodes A and B last broke contact, the length of one time unit being the average contact interval time of A and B.

4. The Internet of Vehicles node forwarding utility learning method based on the dual-update strategy as claimed in claim 3, wherein step 3 specifically comprises:

Step 3.1: determining the node contact probability: the contact interval time between vehicle nodes in the city approximately follows a negative exponential distribution, from which the inter-node contact probability is estimated as

$$P_{A,B}(T) = 1 - e^{-T/\theta_{A,B}}$$

where $P_{A,B}(T)$ is the probability that nodes A and B come into contact within time T, and $\theta_{A,B}$ is the mean of the negative exponential distribution of the contact intervals of A and B;

Step 3.2: the distribution mean of the contact-interval exponential distribution is estimated by the statistical average of the observed contact intervals,

$$\hat{\theta}_{A,B} = \frac{1}{n}\Big(t_1 + \sum_{i=1}^{n-1}\big(t_{2i+1} - t_{2i}\big)\Big),$$

so that the probability of contact of nodes A and B within time T is $P_{A,B}(T) = 1 - e^{-T/\hat{\theta}_{A,B}}$, where n is the number of contacts of nodes A and B, $t_1$ is the moment of first contact, $t_{2i+1}$ is the start time of the (i+1)-th contact, and $t_{2i}$ is the moment the i-th contact is broken;

Step 3.3: after the contact freshness coefficient is introduced, the contact probability of nodes A and B is expressed as

$$P'_{A,B}(T) = F_{A,B}\cdot\big(1 - e^{-T/\hat{\theta}_{A,B}}\big).$$

5. The Internet of Vehicles node forwarding utility learning method based on the dual-update strategy as claimed in claim 1, wherein step 4 specifically comprises:

determining the forwarding utility learning update model according to the routing requirements of the vehicle-mounted opportunistic network and the opportunistic contact characteristics of its nodes; the model contains a dynamic discount factor and an immediate return function, and the node contact probability is introduced into the update model;

the immediate return value $R_d(s,x)$ is a function of the length of time that a data packet destined for node d takes from entering node s until being forwarded into node x;

the dynamic discount factor $\gamma_d(s,x)$ is a function of the discount constant $\gamma$, $0 < \gamma \le 1$, and of the same elapsed time, the length of time a data packet destined for node d takes from entering node s until being forwarded into node x;

the forwarding utility Q value is updated by

$$Q_d(s,x) \leftarrow (1-\alpha)\,Q_d(s,x) + \alpha\Big(R_d(s,x) + \gamma_d(s,x)\max_{y\in N_x} Q'_d(x,y)\Big)$$

where $Q_d(s,x)$ is the state-action value of selecting node x at node s as the next-hop forwarding node for a data packet destined for node d, i.e., the forwarding utility Q value of forwarding a packet destined for d from s to x; $\alpha$ is the learning coefficient, $0 \le \alpha \le 1$; $R_d(s,x)$ is the immediate return of selecting node x at node s as the next-hop forwarding node for a packet destined for d; $\gamma_d(s,x)$ is the dynamic discount factor for forwarding a packet destined for d from node s to node x; $N_x$ is the set of contact nodes of node x, containing all nodes encountered during its movement; and $Q'_d(x,y)$ is the state-action value weighted by the node contact probability, introduced to adapt to the dynamic variation characteristics of the vehicle-mounted opportunistic network.

6. The Internet of Vehicles node forwarding utility learning method based on the dual-update strategy as claimed in claim 5, wherein step 5 specifically comprises:

in the vehicle-mounted opportunistic network, after a data packet is successfully forwarded, the receiving node sends an acknowledgement to the sending node; when the sending node receives this acknowledgement, it extracts the receiving node's ID, the ID of the destination node of the packet, the time the packet was received, and the maximum contact-probability-weighted state-action value for the corresponding packet at the receiving node; the transfer time of the packet between the two nodes is then calculated from the moment the sending node itself received the packet from the previous node, and this time is substituted into the update formula to compute the state-action value.

7. The Internet of Vehicles node forwarding utility learning method based on the dual-update strategy as claimed in claim 6, wherein step 6 specifically comprises: on one hand, the exchange of node contact interaction information updates the contact information between nodes, including the contact time, the contact count, the accumulated contact-interval duration, and the contact freshness coefficient, thereby enabling the calculation of the inter-node contact probability;

on the other hand, the node-contact update of the state-action values is realized by obtaining the Q-value list contained in the contact interaction information; unlike the forwarding update process, the single-packet transmission time used by the immediate return function and the discount factor function is replaced by the average packet transmission time between the nodes, and a learning coefficient different from that of the forwarding update process is adopted.

Technical Field

The invention relates to the technical field of mobile opportunistic network communication, in particular to an Internet of Vehicles node forwarding utility learning method based on a dual-update strategy.

Background

The wave of industrial automation has continuously driven the advance of high technologies such as information sensing, data communication, and data processing. Large numbers of intelligent devices with information sensing and processing capabilities and short-range wireless transmission capabilities are being applied in fields such as urban intelligent transportation, marine environment monitoring, and wildlife migration tracking, and society is gradually entering the information era of the Internet of Things. To satisfy the ubiquitous-interconnection and comprehensive-perception requirements of the Internet of Things and the Internet of Vehicles, intelligent devices must be networked together, so inter-device networking technology has increasingly become a focus of Internet of Things research. In practical applications of highly dynamic ad hoc networks and the Internet of Vehicles, problems such as sparse node distribution in cities and rapidly changing network topology are often encountered, and network connectivity cannot be guaranteed, so traditional mobile ad hoc network communication protocols are no longer suitable for such complex scenarios. Traditional communication protocols require that at least one fully connected end-to-end link exist between every node pair in the network, a condition difficult to satisfy in an actual ad hoc network; the transmission performance of the network is therefore hard to guarantee, which makes the Internet of Vehicles difficult to popularize in practice.

The vehicle-mounted mobile opportunistic network/Internet of Vehicles introduces a Bundle Layer between the application layer and the transport layer on top of the original five-layer network architecture, as shown in FIG. 1. The bundle layer converts the original store-and-forward data communication mode of network nodes into a store-carry-forward communication mode, turning the disadvantage of a dynamically changing network topology into an exploitable characteristic: relay nodes are selected to forward a data packet by means of the opportunistic contacts generated by vehicle node movement, until the packet reaches the destination node. FIG. 2 shows the packet transmission process in a vehicle-mounted mobile opportunistic network: the entire process of a packet being generated at node S and then delivered to D. Suppose that at time T1 a data packet destined for node D is generated at node S; no complete end-to-end link exists between the two nodes, and no suitable neighboring node within the transmission range of node S can be selected as a relay, so node S continues to carry the packet as it moves through the network. At time T2, node S meets node R, which has greater transmission potential, so node S forwards the packet to node R, and node R carries it onward. At time T3, node R moves into the communication area of destination node D, so R transmits the packet to node D, completing the data transfer task.

For the vehicle-mounted mobile opportunistic network, selecting suitable relay nodes to carry the data packet is critical to transmission performance. In relay node selection, a reasonable and effective forwarding-utility calculation method, customized to the network and node characteristics of the vehicle-mounted mobile opportunistic network, plays a particularly important role.

Disclosure of Invention

The invention uses the information interaction between vehicle nodes (the interaction generated by packet transmission between nodes and the interaction generated by node contact within the network) to update the reinforcement-learning state-action values, so that network nodes gradually acquire their forwarding utility for data packets as reinforcement learning proceeds, improving the transmission performance of the mobile opportunistic network. In the vehicle-mounted opportunistic network, vehicles communicate with one another through on-board WIFI, Bluetooth, or dedicated short-range communication equipment. The invention provides an Internet of Vehicles node forwarding utility learning method based on a dual-update strategy, with the following technical scheme:

a vehicle networking node forwarding utility learning method based on a double-updating strategy comprises the following steps:

Step 1: determining the basic elements of the learning process based on the information updating that occurs during information interaction between vehicle nodes;

Step 2: determining a node contact freshness coefficient based on the information updating that occurs during information interaction between vehicle nodes;

Step 3: determining the node contact probability, and establishing a node forwarding utility learning model for the vehicle-mounted opportunistic network;

Step 4: determining a forwarding utility learning update model, which contains a dynamic discount factor, according to the routing requirements of the vehicle-mounted opportunistic network and the opportunistic contact characteristics of its nodes;

Step 5: establishing a packet-forwarding update strategy for the forwarding utility value: obtaining the moment at which the sending node received the packet from the previous node, determining the elapsed time between the two nodes, and substituting this time into the update model to determine the state-action value;

Step 6: establishing a node-contact update strategy for the forwarding utility value, this update process using a learning coefficient different from that of the forwarding update process.

Preferably, step 1 specifically comprises:

determining the basic elements required in the learning process, comprising: the environment, agent, state space, action space, and immediate return; and defining a node update information table, comprising a node contact information table and a node state-action value table;

the environment: during the delivery of a data packet from the source node to the destination node, the entire urban vehicle-mounted opportunistic network provides the required information as the packet is forwarded, so the vehicle-mounted opportunistic network is regarded as the environment of the learning model;

the agent: the data packet transmitted from the source node to the destination node serves as the agent of the learning algorithm;

the state space: every vehicle node in the network can serve as a storage node for a data packet, so the set of all nodes in the network is the state space of the agent;

the action space: the forwarding of a data packet by a node to a next-hop node constitutes the action space of the agent; in the vehicle-mounted opportunistic network, nodes have the store-carry-forward capability, so the selection range of forwarding nodes comprises all contacted nodes;

the immediate return: after a data packet is successfully forwarded to the next-hop node, the agent obtains an immediate return value from the environment, which is used to update the state-action value;

the node contact information table: contact information is updated whenever contacting nodes exchange information; from it, the average contact interval and the contact freshness coefficient between node s and other nodes are calculated, from which the inter-node contact probability is estimated;

the node state-action value table: stores the accumulated return obtainable by taking the corresponding node as the next-hop delivery node; the larger the return value, the better the transmission performance obtained by selecting that node as the next-hop delivery node for the data packet.

Preferably, step 2 specifically comprises:

the contact freshness coefficient $F_{A,B}$ denotes the freshness of the contact information between nodes A and B and represents how current the contact probability is; when nodes A and B have never been in contact, $F_{A,B}$ is set equal to zero; after nodes A and B establish a link, it is updated by the following formula:

$$F_{A,B} = F_{A,B} + (1 - F_{A,B}) \cdot P_{int}$$

where $P_{int}$ is a fixed constant, set equal to 0.85;

when nodes A and B have not been in contact for a long time, the freshness of the contact information between them decreases, so the contact freshness coefficient of long-uncontacted nodes must be attenuated, updated according to the following formula:

$$F_{A,B} = F_{A,B} \cdot \eta^{\mu_{A,B}}$$

where $\eta$ is the attenuation factor, taking the value 0.95, and $\mu_{A,B}$ is the number of time units elapsed since the moment nodes A and B last broke contact, the length of one time unit being the average contact interval time of A and B.

Preferably, step 3 specifically comprises:

Step 3.1: determining the node contact probability: the contact interval time between vehicle nodes in the city approximately follows a negative exponential distribution, from which the inter-node contact probability is estimated as

$$P_{A,B}(T) = 1 - e^{-T/\theta_{A,B}}$$

where $P_{A,B}(T)$ is the probability that nodes A and B come into contact within time T, and $\theta_{A,B}$ is the mean of the negative exponential distribution of the contact intervals of A and B;

Step 3.2: the distribution mean of the contact-interval exponential distribution is estimated by the statistical average of the observed contact intervals,

$$\hat{\theta}_{A,B} = \frac{1}{n}\Big(t_1 + \sum_{i=1}^{n-1}\big(t_{2i+1} - t_{2i}\big)\Big),$$

so that the probability of contact of nodes A and B within time T is $P_{A,B}(T) = 1 - e^{-T/\hat{\theta}_{A,B}}$, where n is the number of contacts of nodes A and B, $t_1$ is the moment of first contact, $t_{2i+1}$ is the start time of the (i+1)-th contact, and $t_{2i}$ is the moment the i-th contact is broken;

Step 3.3: after the contact freshness coefficient is introduced, the contact probability of nodes A and B is expressed as

$$P'_{A,B}(T) = F_{A,B}\cdot\big(1 - e^{-T/\hat{\theta}_{A,B}}\big).$$

preferably, the step 4 specifically includes:

determining a forwarding utility learning updating model according to the routing requirement of the vehicle-mounted opportunity network and the opportunity contact characteristics of the nodes, wherein the model comprises a dynamic discount factor, a function is reported immediately, and the node contact probability is introduced into the updating model;

immediate return value Rd(s, x) is represented by the following formula:

whereinThe time length of a data packet with a destination node d from an entering node s to a forwarding entering node x is represented;

dynamic discount factor gammad(s, x) is represented by the following formula:

where γ is the constant of the discount factor, 0<γ≤1;The time length of a data packet with a destination node d from the entering node s to the forwarding entering node x is represented;

the forwarding utility Q value update formula is shown by:

wherein Q isd(s, x) selecting a node x as a state-action value of a next skip sending node in a node s for a data packet with a destination node d, namely forwarding a forwarding utility Q value corresponding to the data packet with the destination node d from s to x; alpha is a learning coefficient, and alpha is more than or equal to 0 and less than or equal to 1; rd(s, x) selecting the node x as an immediate return value of a next hop forwarding node in the node s for the data packet with the destination node d; gamma rayd(s, x) is the forwarding of the destination program in node s to node xThe dynamic discount factor corresponding to the data packet with the point d; n is a radical ofxA set of contact nodes representing nodes, the set containing all nodes encountered during the movement of all nodes x; qd' (x, y) is a state-action value that accommodates the node contact probability introduced for the vehicle opportunity network dynamic variation characteristic.

Preferably, step 5 specifically comprises:

in the vehicle-mounted opportunistic network, after a data packet is successfully forwarded, the receiving node sends an acknowledgement to the sending node; when the sending node receives this acknowledgement, it extracts the receiving node's ID, the ID of the destination node of the packet, the time the packet was received, and the maximum contact-probability-weighted state-action value for the corresponding packet at the receiving node; the transfer time of the packet between the two nodes is then calculated from the moment the sending node itself received the packet from the previous node, and this time is substituted into the update formula to compute the state-action value.

Preferably, step 6 specifically comprises: on one hand, the exchange of node contact interaction information updates the contact information between nodes, including the contact time, the contact count, the accumulated contact-interval duration, and the contact freshness coefficient, thereby enabling the calculation of the inter-node contact probability;

on the other hand, the node-contact update of the state-action values is realized by obtaining the Q-value list contained in the contact interaction information; unlike the forwarding update process, the single-packet transmission time used by the immediate return function and the discount factor function is replaced by the average packet transmission time between the nodes, and a learning coefficient different from that of the forwarding update process is adopted.

The invention has the following beneficial effects:

according to the vehicle-mounted opportunity network forwarding utility learning model based on the double updating strategies, the contact freshness coefficient and the contact probability between the nodes are calculated by utilizing the contact information between the nodes, the learning of the node forwarding capability is carried out by combining a distributed Q learning framework on the basis of the node contact probability prediction, and the node forwarding utility value learning process is accelerated by utilizing the data packet forwarding updating and the node contact updating double updating strategies, so that the nodes can gradually acquire the forwarding utility of the node to the data packet along with the learning process. The forwarding utility learning model is helpful for selecting data packet forwarding nodes, and improves the transmission performance of the vehicle-mounted opportunity network, including improving the delivery success rate of the data packet and reducing the transmission delay of the data packet.

Drawings

FIG. 1 is a schematic diagram of the vehicle-mounted opportunistic network architecture;

FIG. 2 is a schematic diagram of the packet transmission process in a vehicle-mounted opportunistic network;

FIG. 3 is a block diagram of the overall framework of the forwarding utility learning model design process;

FIG. 4 is a schematic diagram of the node contact information of node s with other nodes;

FIG. 5 is a diagram of the state-action value mapping stored in node s;

FIG. 6 is a schematic diagram of the contact sequence of nodes A and B during network operation;

FIG. 7 is a schematic diagram of the utility-learning-model content in the acknowledgement information;

FIG. 8 is a schematic diagram of the state-action value update process after vehicle nodes A and B forward a data packet;

FIG. 9 is a schematic diagram of the interaction information content when nodes come into contact;

FIG. 10 is a schematic diagram of the information interaction process of vehicle nodes A and B.

Detailed Description

The present invention will be described in detail with reference to specific examples.

The first embodiment is as follows:

As shown in FIGS. 3 to 10, the present invention provides an Internet of Vehicles node forwarding utility learning method based on a dual-update strategy, comprising the following steps:

Step 1: determining the basic elements of the learning process based on the information updating that occurs during information interaction between vehicle nodes;

Step 1 specifically comprises:

determining the basic elements required in the learning process, comprising: the environment, agent, state space, action space, and immediate return; and defining a node update information table, comprising a node contact information table and a node state-action value table;

the environment: during the delivery of a data packet from the source node to the destination node, the entire urban vehicle-mounted opportunistic network provides the required information as the packet is forwarded, so the vehicle-mounted opportunistic network is regarded as the environment of the learning model;

the agent: the data packet transmitted from the source node to the destination node serves as the agent of the learning algorithm;

the state space: every vehicle node in the network can serve as a storage node for a data packet, so the set of all nodes in the network is the state space of the agent;

the action space: the forwarding of a data packet by a node to a next-hop node constitutes the action space of the agent; in the vehicle-mounted opportunistic network, nodes have the store-carry-forward capability, so the selection range of forwarding nodes comprises all contacted nodes;

the immediate return: after a data packet is successfully forwarded to the next-hop node, the agent obtains an immediate return value from the environment, which is used to update the state-action value;

the node contact information table: contact information is updated whenever contacting nodes exchange information; from it, the average contact interval and the contact freshness coefficient between node s and other nodes are calculated, from which the inter-node contact probability is estimated;

the node state-action value table: stores the accumulated return obtainable by taking the corresponding node as the next-hop delivery node; the larger the return value, the better the transmission performance obtained by selecting that node as the next-hop delivery node for the data packet.

Step 2: determining the node contact freshness coefficient based on the information updating that occurs during information interaction between vehicle nodes;

Step 2 specifically comprises:

the contact freshness coefficient $F_{A,B}$ denotes the freshness of the contact information between nodes A and B and represents how current the contact probability is; when nodes A and B have never been in contact, $F_{A,B}$ is set equal to zero; after nodes A and B establish a link, it is updated by the following formula:

$$F_{A,B} = F_{A,B} + (1 - F_{A,B}) \cdot P_{int}$$

where $P_{int}$ is a fixed constant, set equal to 0.85;

when nodes A and B have not been in contact for a long time, the freshness of the contact information between them decreases, so the contact freshness coefficient of long-uncontacted nodes must be attenuated, updated according to the following formula:

$$F_{A,B} = F_{A,B} \cdot \eta^{\mu_{A,B}}$$

where $\eta$ is the attenuation factor, taking the value 0.95, and $\mu_{A,B}$ is the number of time units elapsed since the moment nodes A and B last broke contact, the length of one time unit being the average contact interval time of A and B.

Step 3: determining the node contact probability and establishing the node forwarding utility learning model for the vehicle-mounted opportunistic network;

Step 3 specifically comprises:

Step 3.1: determining the node contact probability: the contact interval time between vehicle nodes in the city approximately follows a negative exponential distribution, from which the inter-node contact probability is estimated as

$$P_{A,B}(T) = 1 - e^{-T/\theta_{A,B}}$$

where $P_{A,B}(T)$ is the probability that nodes A and B come into contact within time T, and $\theta_{A,B}$ is the mean of the negative exponential distribution of the contact intervals of A and B;

Step 3.2: the distribution mean of the contact-interval exponential distribution is estimated by the statistical average of the observed contact intervals,

$$\hat{\theta}_{A,B} = \frac{1}{n}\Big(t_1 + \sum_{i=1}^{n-1}\big(t_{2i+1} - t_{2i}\big)\Big),$$

so that the probability of contact of nodes A and B within time T is $P_{A,B}(T) = 1 - e^{-T/\hat{\theta}_{A,B}}$, where n is the number of contacts of nodes A and B, $t_1$ is the moment of first contact, $t_{2i+1}$ is the start time of the (i+1)-th contact, and $t_{2i}$ is the moment the i-th contact is broken;

Step 3.3: after the contact freshness coefficient is introduced, the contact probability of nodes A and B is expressed as

$$P'_{A,B}(T) = F_{A,B}\cdot\big(1 - e^{-T/\hat{\theta}_{A,B}}\big).$$

Step 4: determining the forwarding utility learning update model, which contains a dynamic discount factor, according to the routing requirements of the vehicle-mounted opportunistic network and the opportunistic contact characteristics of its nodes;

Step 4 specifically comprises:

determining the forwarding utility learning update model according to the routing requirements of the vehicle-mounted opportunistic network and the opportunistic contact characteristics of its nodes; the model contains a dynamic discount factor and an immediate return function, and the node contact probability is introduced into the update model;

the immediate return value $R_d(s,x)$ is a function of the length of time that a data packet destined for node d takes from entering node s until being forwarded into node x;

the dynamic discount factor $\gamma_d(s,x)$ is a function of the discount constant $\gamma$, $0 < \gamma \le 1$, and of the same elapsed time, the length of time a data packet destined for node d takes from entering node s until being forwarded into node x;

the forwarding utility Q value is updated by

$$Q_d(s,x) \leftarrow (1-\alpha)\,Q_d(s,x) + \alpha\Big(R_d(s,x) + \gamma_d(s,x)\max_{y\in N_x} Q'_d(x,y)\Big)$$

where $Q_d(s,x)$ is the state-action value of selecting node x at node s as the next-hop forwarding node for a data packet destined for node d, i.e., the forwarding utility Q value of forwarding a packet destined for d from s to x; $\alpha$ is the learning coefficient, $0 \le \alpha \le 1$; $R_d(s,x)$ is the immediate return of selecting node x at node s as the next-hop forwarding node for a packet destined for d; $\gamma_d(s,x)$ is the dynamic discount factor for forwarding a packet destined for d from node s to node x; $N_x$ is the set of contact nodes of node x, containing all nodes encountered during its movement; and $Q'_d(x,y)$ is the state-action value weighted by the node contact probability, introduced to adapt to the dynamic variation characteristics of the vehicle-mounted opportunistic network.

Step 5: establishing the packet-forwarding update strategy for the forwarding utility value: obtaining the moment at which the sending node received the packet from the previous node, determining the elapsed time between the two nodes, and substituting this time into the update model to determine the state-action value;

Step 5 specifically comprises:

in the vehicle-mounted opportunistic network, after a data packet is successfully forwarded, the receiving node sends an acknowledgement to the sending node; when the sending node receives this acknowledgement, it extracts the receiving node's ID, the ID of the destination node of the packet, the time the packet was received, and the maximum contact-probability-weighted state-action value for the corresponding packet at the receiving node; the transfer time of the packet between the two nodes is then calculated from the moment the sending node itself received the packet from the previous node, and this time is substituted into the update formula to compute the state-action value.

Step 6: establishing the node-contact update strategy for the forwarding utility value, this update process using a learning coefficient different from that of the forwarding update process.

Step 6 specifically comprises: on one hand, the exchange of node contact interaction information updates the contact information between nodes, including the contact time, the contact count, the accumulated contact-interval duration, and the contact freshness coefficient, thereby enabling the calculation of the inter-node contact probability;

on the other hand, the node-contact update of the state-action values is realized by obtaining the Q-value list contained in the contact interaction information; unlike the forwarding update process, the single-packet transmission time used by the immediate return function and the discount factor function is replaced by the average packet transmission time between the nodes, and a learning coefficient different from that of the forwarding update process is adopted.

The second embodiment is as follows:

FIG. 3 shows the overall framework of the forwarding utility learning model design process of the present invention. Opportunistic node contact is the precondition for forwarding packets in the vehicle-mounted opportunistic network and also the necessary condition for updating node forwarding utility. Node contact allows the pairwise contact freshness coefficient and contact probability to be updated, and the freshness coefficient dynamically adjusts the timeliness of the contact probability. The key components of the forwarding utility update formula are the node contact probability, the immediate return function, and the dynamic discount factor. The learning process uses a Q-learning strategy to learn both during packet transmission between vehicle-mounted opportunistic network nodes and during node contact, and uses the forwarding utility update formula to update the packet-forwarding utility Q value, which is then used in the packet forwarding process.

The implementation process of the node forwarding utility learning model building stage in the vehicle-mounted opportunistic network is as follows:

Step one: determine the basic elements required in the learning process, including the environment, agent, state space, action space, and immediate return, and define the node update information table, comprising a node contact information table and a node state-action value table.

Environment: during the delivery of a data packet from the source node to the destination node, the entire vehicle-mounted opportunistic network provides the required information as the packet is forwarded, so the entire vehicle-mounted opportunistic network is regarded as the environment of the learning model.

Agent: the data packet transmitted from the source node to the destination node serves as the agent of the learning algorithm.

State space: every node in the network can serve as a storage node for data packets, so the set of all nodes in the network is the state space of the agent.

Action space: a node forwarding a data packet to a next-hop node constitutes an action of the agent; in the vehicle-mounted opportunistic network, nodes have the store-carry-forward capability, so the selection range of forwarding nodes comprises all contacted nodes.

Immediate return: after a data packet is successfully forwarded to the next-hop node, the agent obtains an immediate return value from the environment, used to update the state-action value.

Node contact information table: the contact information is updated whenever contacting nodes exchange information; from the node contact information, the average contact interval and the contact freshness coefficient between node s and other nodes can be calculated, so that the contact probability between nodes can be estimated. FIG. 4 shows the node contact information of node s with other nodes.

node state-action value table: fig. 5 shows a state-action value mapping stored in node s. Wherein the node s is the state of the data packet, the dark vertical row represents the destination node of the data packet, the dark horizontal row represents the node contacted by the node s in the moving process,for one of the data tuples,representing the number of times node s forwards a packet destined for node D to node a,representing the average time, Q, that the node s has elapsed to transmit a data packet of the destination node D to the node AD(s, A) represents the accumulated return value which can be obtained by selecting the node A as the next-hop delivery node from the nodes s for the data packet with the destination node D, and the larger the value is, the better the delivery performance of the data packet is for selecting the node A as the next-hop delivery node.

Step two: define the node contact freshness coefficient.

Contact freshness coefficient $F_{A,B}$: the freshness of the contact information between nodes A and B, representing how current the contact probability is. When nodes A and B have never been in contact, $F_{A,B}$ is set equal to zero; when nodes A and B establish a link, the coefficient is updated by formula (1), where $P_{int}$ is a fixed constant set equal to 0.85.

$$F_{A,B} = F_{A,B} + (1 - F_{A,B}) \cdot P_{int} \qquad (1)$$

When nodes A and B have not been in contact for a long time, the freshness of the contact information between them should decrease, so the contact freshness coefficient of long-uncontacted nodes is attenuated; the decay update takes the form

$$F_{A,B} = F_{A,B} \cdot \eta^{\mu_{A,B}}$$

where $\eta$ is the attenuation factor, taking the value 0.95, and $\mu_{A,B}$ is the number of time units elapsed since the moment nodes A and B last broke contact, one time unit being the average contact interval time of A and B.
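A minimal Python sketch of the two freshness rules follows, assuming the exponential decay form given above; the constants are those stated in the text.

```python
P_INT = 0.85  # fixed constant from formula (1)
ETA = 0.95    # attenuation factor

def freshness_on_contact(f: float) -> float:
    """Formula (1): raise the freshness coefficient when A and B establish a link."""
    return f + (1.0 - f) * P_INT

def freshness_decay(f: float, mu: float) -> float:
    """Assumed decay F <- F * eta**mu, where mu counts elapsed time units
    (one unit = the average contact interval of A and B)."""
    return f * (ETA ** mu)
```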

Step three: determine the node contact probability.

FIG. 6 shows the contact sequence of nodes A and B during network operation. One contact cycle comprises three time points: the contact termination time $t_{2(i-1)}$, the contact start time $t_{2i-1}$, and the contact termination time $t_{2i}$. Gray areas indicate that the link between the two nodes is broken and white areas that it is connected; 0 denotes the start of network operation. In the first contact period of A and B, $t_1$ is the contact start time and $t_2$ the contact termination time, while $t_3$ is the contact start time of the second contact period. $T(A,B) = t_2$ is the length of the first contact period of nodes A and B: the shorter the contact period, the more frequently the two nodes are in contact. $D(A,B) = t_2 - t_1$ is the contact duration within the first contact period: the longer two nodes stay in contact, the more stable the link between them and the more data traffic can be transmitted. $T(A,B) - D(A,B) = t_1$ is the contact interval duration of the first contact period: the larger the contact interval, the smaller the probability that the two nodes come into contact and the smaller the possibility of transmitting a data packet.

The contact interval time between nodes approximately follows a negative exponential distribution, from which the contact probability between nodes is estimated as

$$P_{A,B}(T) = 1 - e^{-T/\theta_{A,B}}$$

where $P_{A,B}(T)$ is the probability that nodes A and B come into contact within time T, and $\theta_{A,B}$ is the mean of the negative exponential distribution of the contact intervals of A and B.

Estimating this distribution mean by the statistical average of the observed contact intervals,

$$\hat{\theta}_{A,B} = \frac{1}{n}\Big(t_1 + \sum_{i=1}^{n-1}\big(t_{2i+1} - t_{2i}\big)\Big),$$

the contact probability of nodes A and B within time T becomes $P_{A,B}(T) = 1 - e^{-T/\hat{\theta}_{A,B}}$, where n is the number of contacts of nodes A and B, $t_1$ is the moment of first contact, $t_{2i+1}$ is the start time of the (i+1)-th contact, and $t_{2i}$ is the moment the i-th contact is broken.

After the contact freshness coefficient is introduced, the contact probability of nodes A and B becomes

$$P'_{A,B}(T) = F_{A,B}\cdot\big(1 - e^{-T/\hat{\theta}_{A,B}}\big).$$
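The estimation can be sketched as below, assuming contacts are recorded as (start, end) time pairs and that the freshness coefficient scales the probability multiplicatively, as in the formula above.

```python
import math

def mean_contact_interval(contacts: list[tuple[float, float]]) -> float:
    """Statistical average of the contact intervals of two nodes.
    contacts holds the n observed contacts as (start, end) pairs; the first
    interval runs from network start (time 0) to the first contact start t_1."""
    if not contacts:
        raise ValueError("no contacts observed yet")
    total = contacts[0][0]  # t_1
    for i in range(1, len(contacts)):
        total += contacts[i][0] - contacts[i - 1][1]  # t_{2i+1} - t_{2i}
    return total / len(contacts)

def contact_probability(theta: float, horizon: float, freshness: float) -> float:
    """P'_{A,B}(T) = F_{A,B} * (1 - exp(-T/theta)); the multiplicative use of
    the freshness coefficient is an assumption of this sketch."""
    return freshness * (1.0 - math.exp(-horizon / theta))
```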

Step four: define the forwarding utility learning update formula according to the routing requirements of the vehicle-mounted opportunistic network and the opportunistic contact characteristics of its nodes; the formula contains a dynamic discount factor and an immediate return function, and the node contact probability is introduced into the update formula.

The immediate return value $R_d(s,x)$ is defined as formula (6), a function of the length of time a data packet destined for node d takes from entering node s until being forwarded into node x.

The dynamic discount factor $\gamma_d(s,x)$ is defined as formula (7), where $\gamma$ is the discount-factor constant, $0 < \gamma \le 1$, and the elapsed time is the length of time a data packet destined for node d takes from entering node s until being forwarded into node x.

The forwarding utility Q value update formula is defined as

$$Q_d(s,x) \leftarrow (1-\alpha)\,Q_d(s,x) + \alpha\Big(R_d(s,x) + \gamma_d(s,x)\max_{y\in N_x} Q'_d(x,y)\Big)$$

where $Q_d(s,x)$ is the state-action value of selecting node x at node s as the next-hop forwarding node for a data packet destined for node d, i.e., the forwarding utility Q value of forwarding a packet destined for d from s to x; $\alpha$ is the learning coefficient, $0 \le \alpha \le 1$; $R_d(s,x)$ is the immediate return of selecting node x at node s as the next-hop forwarding node for a packet destined for d; $\gamma_d(s,x)$ is the dynamic discount factor for forwarding a packet destined for d from node s to node x; $N_x$ is the set of contact nodes of node x, containing all nodes encountered during its movement; and $Q'_d(x,y)$ is the state-action value weighted by the node contact probability, introduced to adapt to the dynamic variation characteristics of the vehicle-mounted opportunistic network.
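A compact sketch of this update follows, assuming the standard Q-learning form shown above; the value of ALPHA is illustrative, and neighbor_q stands for the contact-probability-weighted values $Q'_d(x,y)$ of the candidate next hops.

```python
ALPHA = 0.3  # learning coefficient, 0 <= alpha <= 1 (illustrative value)

def q_update(q_old: float, reward: float, gamma_dyn: float,
             neighbor_q: dict) -> float:
    """Q_d(s,x) <- (1-a)*Q_d(s,x) + a*(R_d(s,x) + gamma_d(s,x) * max_y Q'_d(x,y)).
    neighbor_q maps each contact node y of x to Q'_d(x, y); empty means no bootstrap."""
    best_next = max(neighbor_q.values()) if neighbor_q else 0.0
    return (1.0 - ALPHA) * q_old + ALPHA * (reward + gamma_dyn * best_next)
```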

(2) The dual-update strategy: packet-forwarding update and node-contact update

Step one: the packet-forwarding update strategy for the forwarding utility value.

In the vehicle-mounted opportunistic network, after a data packet is successfully forwarded, the node receiving the packet sends an acknowledgement to the node that sent it; FIG. 7 shows the utility-learning-model content carried in the acknowledgement.

FIG. 8 shows the state-action value update process after nodes A and B forward a data packet, where A is the sending node, B is the receiving node, and the destination node of the packet is D.

When node B receives the data packet forwarded from node A, it records the reception time and checks whether it is the packet's destination node. If B is the destination node of the packet, i.e., D = B, the forwarding process of that packet enters the termination state. In this case the acknowledgement needs to carry only the receiving node's ID, the packet ID, and the reception time, with the Q field set to null; after node A receives the acknowledgement sent by node B, it calculates the immediate return value $R_D(A,D)$ according to formula (6) and updates the corresponding Q value $Q_D(A,D)$ in its state-action value list according to formula (9).

If node B is not the destination node of the packet, the acknowledgement must carry the receiving node's ID, the packet ID, the reception time, and the maximum contact-probability-weighted Q value for the packet at the receiving node, $\max_y Q'_D(B,y)$, where $Q'_D(B,y)$ is calculated by formula (10); node A then calculates the dynamic discount factor $\gamma_D(A,B)$ according to formula (7) and updates the corresponding Q value $Q_D(A,B)$ in its state-action value list according to formula (11). Meanwhile, after receiving the acknowledgement, node A increments by 1 its record of the number of packets destined for node D that have been forwarded to node B, and adds the packet's transfer time between A and B to the accumulated transmission time for destination D so as to compute the average transmission duration.
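The forwarding-update logic can be sketched as follows, building on the QEntry and q_update sketches above. Because formulas (6), (7), (9) and (11) are given only as images in the original publication, immediate_return and dynamic_discount are placeholder functions, and the handler merely mirrors the two cases described in the text.

```python
def immediate_return(elapsed: float) -> float:
    """Placeholder for formula (6): a return that decreases with transfer time."""
    return 1.0 / (1.0 + elapsed)

def dynamic_discount(elapsed: float, gamma: float = 0.9) -> float:
    """Placeholder for formula (7): a discount that shrinks with transfer time."""
    return gamma / (1.0 + elapsed)

def on_ack(node_a: NodeTables, ack) -> None:
    """Forwarding update at sending node A when B's acknowledgement arrives."""
    # transfer time: from when A itself received the packet to when B received it
    elapsed = ack.recv_time - node_a.packet_recv_time[ack.packet_id]
    entry = node_a.q_table.setdefault((ack.dest_id, ack.receiver_id), QEntry())
    if ack.q_field is None:
        # B is the destination (D == B): terminal update using the immediate return
        entry.q_value = q_update(entry.q_value, immediate_return(elapsed), 0.0, {})
    else:
        # B is a relay: bootstrap from B's maximum contact-weighted Q value
        g = dynamic_discount(elapsed)
        entry.q_value = q_update(entry.q_value, 0.0, g, {ack.receiver_id: ack.q_field})
    # bookkeeping reused later by the node-contact update strategy
    entry.forward_count += 1
    entry.avg_transfer_time += (elapsed - entry.avg_transfer_time) / entry.forward_count
```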

Step two: the node-contact update strategy for the forwarding utility value.

FIG. 9 shows the content of the contact interaction information designed in the utility learning model. On one hand, sending this interaction information updates the contact information between nodes, including the contact time, the contact count, the accumulated contact-interval duration, and the contact freshness coefficient, thereby enabling the calculation of the inter-node contact probability; on the other hand, by obtaining the Q-value list contained in the interaction information, the node-contact update of the state-action values can be performed.

FIG. 10 shows the information interaction process between nodes A and B. When nodes A and B enter each other's communication range and establish a connection, the two nodes send interaction information to each other. After A receives B's interaction information, it first updates its contact information table for B, including the latest contact time of A and B, the number of contacts of A and B, and the accumulated contact-interval duration for node B, and updates the contact freshness coefficient $F_{A,B}$ of nodes A and B. It then checks whether A has previously sent to B any data packets whose destination node is B; if so, it computes the average transfer time of such packets from A to B (the accumulated transfer time divided by the number of such transmissions), substitutes this average for the single-packet transmission time, and updates the corresponding Q value $Q_B(A,B)$ in the state-action value list according to formula (9); if no such packet has been sent, no update is made. Finally, the Q-value list in the interaction information sent by node B is compared entry by entry with node A's state-action value list: for an entry $(d_n, \max Q)$, if node A's state-action value table contains a Q value for destination node $d_n$ with forwarding node B, the average transfer time of packets destined for $d_n$ from node A to node B (the accumulated transfer time divided by the number of such transmissions) is computed, and this average together with the corresponding $\max Q$ is substituted into formula (11) to update the corresponding Q value in the state-action value list; if node A contains no such Q value, no update is performed.
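Likewise, the node-contact update can be sketched as below, reusing the ContactRecord, QEntry, freshness_on_contact, and dynamic_discount sketches above; ALPHA_CONTACT is an assumed value standing in for the distinct contact-update learning coefficient.

```python
ALPHA_CONTACT = 0.1  # assumed contact-update learning coefficient, distinct from ALPHA

def on_contact(node_a: NodeTables, b_id: str, b_q_list: dict, now: float) -> None:
    """Node-contact update at node A on receiving B's interaction message."""
    rec = node_a.contacts.setdefault(b_id, ContactRecord())
    # contact-information update: interval accumulation, count, freshness
    since = rec.last_contact_end if rec.contact_count else 0.0  # first interval starts at 0
    rec.cumulative_interval += now - since
    rec.contact_count += 1
    rec.freshness = freshness_on_contact(rec.freshness)
    # (last_contact_end is refreshed when this link later breaks; not shown)

    # state-action value update: for each (destination d_n, maxQ) entry in B's Q list,
    # the single-packet transfer time is replaced by the recorded average
    for dest, max_q in b_q_list.items():
        entry = node_a.q_table.get((dest, b_id))
        if entry is None or entry.forward_count == 0:
            continue  # A never forwarded such packets to B: no update
        g = dynamic_discount(entry.avg_transfer_time)
        entry.q_value = (1.0 - ALPHA_CONTACT) * entry.q_value + ALPHA_CONTACT * g * max_q
```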

The above is only a preferred embodiment of the Internet of Vehicles node forwarding utility learning method based on the dual-update strategy, and the protection scope of the method is not limited to the above embodiment; all technical solutions under this concept belong to the protection scope of the present invention. It should be noted that modifications and variations that do not depart from the gist of the invention, as will occur to those skilled in the art, are intended to be within the scope of the invention.
