Distributed constellation dynamic networking method based on deep reinforcement learning

Document No. 490375, published 2022-01-04

Reading note: This technology, "Distributed constellation dynamic networking method based on deep reinforcement learning", was designed by He Yuanzhi and Sheng Biao on 2021-11-03. Its main content is as follows. The invention discloses a distributed constellation dynamic networking method based on deep reinforcement learning, comprising the following steps: acquiring the real-time orbit information of each satellite of the distributed constellation; establishing a multi-objective optimization model according to the task requirements of the distributed constellation; constructing a double-layer deep reinforcement learning architecture; designing a double-layer deep reinforcement learning algorithm and using it to optimize the laser networking of the distributed constellation; and having each satellite adjust its laser communication links according to the optimization result to complete network construction or network reconstruction. The invention jointly optimizes the network interconnectivity, the topology duration and the perturbation of the network connection matrix of the distributed constellation, which gives a stable network topology and fast networking. Building a multi-objective optimization model for the topology optimization makes the networking result comprehensively optimal, and using a deep reinforcement learning networking algorithm makes fast networking possible.

1. A distributed constellation dynamic networking method based on deep reinforcement learning, characterized in that the distributed constellation system comprises a plurality of GEO satellites, each satellite achieves multi-satellite interconnection through optical multi-beam antennas, and the distributed constellation system performs dynamic networking optimization with a double-layer deep reinforcement learning algorithm; the method comprises the following specific steps:

S1, setting the number of satellites in the distributed constellation system to S and the number of optical multi-beam antennas per satellite to A, each optical multi-beam antenna being able to support N laser communication links simultaneously;

S2, acquiring the real-time orbit information of each satellite of the distributed constellation system from received ground telemetry data, inter-satellite ranging and state detection;

S3, calculating the available state matrix L_ink of the laser communication links of the distributed constellation from the acquired real-time orbit information of each satellite of the distributed constellation system;

S4, obtaining the current network topology of the topological network formed by the laser communication links among the satellites of the distributed constellation system and expressing it as a matrix T_cur whose elements are T_ik,jl, i = 1, 2, ..., S, j = 1, 2, ..., S, k = 1, 2, ..., A, l = 1, 2, ..., A, where T_ik,jl represents the connection state between the k-th antenna of the i-th satellite and the l-th antenna of the j-th satellite: T_ik,jl = 1 if the two antennas are connected by a laser communication link, and T_ik,jl = 0 otherwise;

S5, comparing the elements of matrix T_cur and matrix L_ink one by one; if there is an element with T_ik,jl = 1 and α_ik,jl = 0, judging that the change of the available state matrix affects the network topology of the distributed constellation system and going to step S6; otherwise maintaining the current network topology of the distributed constellation system and returning to step S2;
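The S5 check is a plain element-wise scan. The sketch below is a minimal illustration rather than the patent's implementation; it assumes, hypothetically, that T_cur and L_ink are stored as (S·A)×(S·A) 0/1 nested lists in which the pair (satellite i, antenna k), both 0-based, maps to row/column index i·A + k. The helper names `idx` and `topology_affected` are illustrative.

```python
def idx(i, k, A):
    """Flatten (satellite i, antenna k), both 0-based, into a matrix index."""
    return i * A + k


def topology_affected(T_cur, L_ink):
    """S5: True if an established link (T_cur == 1) is no longer available (L_ink == 0)."""
    for row_t, row_l in zip(T_cur, L_ink):
        for t, alpha in zip(row_t, row_l):
            if t == 1 and alpha == 0:
                return True   # go to S6 and re-optimize
    return False              # keep the current topology and return to S2
```

For two satellites with one antenna each, an existing link whose entry in L_ink drops to 0 makes `topology_affected` return True, which is the trigger for S6.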

S6, establishing a multi-objective optimization model according to the laser communication link networking requirements of the distributed constellation system;

S7, solving the multi-objective optimization model obtained in step S6 with the double-layer deep reinforcement learning algorithm to obtain a networking reconstruction matrix;

S8, after the training of the inner-layer agent and the outer-layer agent is finished, whenever the distributed constellation system needs to be re-networked, calling the trained double-layer deep reinforcement learning algorithm to obtain a networking reconstruction matrix and re-networking the distributed constellation system with it, which completes the networking optimization process.

2. The distributed constellation dynamic networking method based on deep reinforcement learning according to claim 1, wherein

step S3 comprises the following steps:

S31, assuming that the distributed constellation system uses homodyne BPSK modulation for inter-satellite laser communication, the communication bit error rate BER_BPSK of the distributed constellation system is expressed as:

where R is the responsivity of the photodetector; d_T and d_R are the transmitting and receiving apertures of the optical multi-beam antenna, respectively; the divergence angle of the laser beam is a further parameter; S_t is the transmitted signal power; L_ATP is the acquisition, tracking and pointing (ATP) mismatch loss of the laser communication link; P_LO is the local oscillator laser power; σ is the noise power; erfc() is the complementary error function; and D_link is the distance between the two satellites that establish the laser communication link. For the i-th and j-th satellites of the distributed constellation system this distance is denoted D_ij or D_ji and is calculated as:

where (x_i, y_i, z_i) are the coordinates of the i-th satellite in the Earth-centered inertial frame, (v_x,i, v_y,i, v_z,i) is the three-dimensional velocity vector of the i-th satellite in the Earth-centered inertial frame, θ is the beam deflection angle, and η(θ) is the transmission efficiency of the optical multi-beam antenna at beam deflection angle θ, expressed as:

where θ_max is the maximum beam deflection angle supported by the optical multi-beam antenna;

S32, calculating the visibility state between the antennas of the satellites of the distributed constellation system: the bit error rate of each laser communication link is calculated from θ and D_link, and its upper limit is denoted BER_th; when the bit error rate of a link exceeds this upper limit, the laser communication link is judged to be interrupted, i.e. the antennas of the two satellites at the ends of the link are not visible to each other;

S33, calculating the available state matrix L_ink of the laser communication links of the distributed constellation system, whose elements are α_ik,jl, i = 1, 2, ..., S, j = 1, 2, ..., S, k = 1, 2, ..., A, l = 1, 2, ..., A, where S is the number of satellites in the distributed constellation system and A is the number of optical multi-beam antennas per satellite; α_ik,jl represents the visibility state between the k-th antenna of the i-th satellite and the l-th antenna of the j-th satellite and equals 1 when the two antennas are visible to each other and 0 otherwise; when i = j, α_ik,jl is set to 0.
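Steps S31 to S33 can be sketched as follows, strictly under stated assumptions, because the expressions for BER_BPSK, D_ij and η(θ) are not given in this text. The sketch takes D_ij as the Euclidean distance between the satellite positions; takes θ as the angle between a hypothetical per-antenna boresight vector and the line of sight; uses a placeholder cosine roll-off for η(θ) that reaches 0 at θ_max; and uses a placeholder link budget (geometric capture ratio (d_R / (divergence · D))²) with BER = 0.5·erfc(√SNR). d_T and the velocity vectors are not used by this placeholder, and the parameter-dictionary keys are hypothetical names. The antenna pair (satellite i, antenna k) is assumed to map to matrix index i·A + k (0-based).

```python
import math


def sub(a, b):
    """Component-wise difference of two 3-D vectors."""
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def norm(v):
    return math.sqrt(v[0] * v[0] + v[1] * v[1] + v[2] * v[2])


def deflection_angle(boresight, los):
    """Angle (rad) between an antenna boresight and a line-of-sight vector."""
    cos_t = sum(b * s for b, s in zip(boresight, los)) / (norm(boresight) * norm(los))
    return math.acos(max(-1.0, min(1.0, cos_t)))


def eta(theta, theta_max):
    """PLACEHOLDER transmission efficiency: 1 on boresight, falling to 0 at theta_max."""
    if theta > theta_max:
        return 0.0
    return math.cos(math.pi * theta / (2.0 * theta_max))


def ber_bpsk(dist, th_t, th_r, p):
    """PLACEHOLDER homodyne-BPSK link budget; p holds the S31 parameters."""
    gain = eta(th_t, p["theta_max"]) * eta(th_r, p["theta_max"])
    capture = min(1.0, (p["d_R"] / (p["div"] * dist)) ** 2)  # aperture / beam footprint
    p_rx = p["S_t"] * gain * p["L_ATP"] * capture
    snr = 4.0 * p["R"] ** 2 * p_rx * p["P_LO"] / p["sigma"]
    return 0.5 * math.erfc(math.sqrt(snr))


def available_matrix(pos, boresights, p):
    """L_ink as an (S*A) x (S*A) 0/1 matrix; row/column index = i*A + k (0-based)."""
    S, A = len(pos), len(boresights[0])
    L = [[0] * (S * A) for _ in range(S * A)]
    for i in range(S):
        for j in range(S):
            if i == j:
                continue  # alpha_ik,jl = 0 when i == j
            los_ij, los_ji = sub(pos[j], pos[i]), sub(pos[i], pos[j])
            dist = norm(los_ij)  # D_ij taken as Euclidean distance (assumption)
            for k in range(A):
                th_t = deflection_angle(boresights[i][k], los_ij)
                for l in range(A):
                    th_r = deflection_angle(boresights[j][l], los_ji)
                    if max(th_t, th_r) > p["theta_max"]:
                        continue  # outside the steerable range: not visible
                    if ber_bpsk(dist, th_t, th_r, p) <= p["BER_th"]:
                        L[i * A + k][j * A + l] = 1  # S32: BER within BER_th
    return L
```

Swapping in the patent's actual BER_BPSK and η(θ) expressions only requires replacing `ber_bpsk` and `eta`; the S32 threshold test and the S33 matrix layout stay the same.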

3. The distributed constellation dynamic networking method based on deep reinforcement learning according to claim 1, wherein

step S6 comprises the following specific steps:

S61, calculating the networking reconstruction state matrix A_nt from the available state matrix L_ink of the laser communication links of the distributed constellation system; its elements are β_ik,jl, i = 1, 2, ..., S, j = 1, 2, ..., S, k = 1, 2, ..., A, l = 1, 2, ..., A, where β_ik,jl = 1 means that a laser communication link is established between the k-th antenna of the i-th satellite and the l-th antenna of the j-th satellite and β_ik,jl = 0 means that no such link is established; when i = j, β_ik,jl is set to 0;

S62, calculating the inter-satellite connection matrix T_p of the distributed constellation system, expressed as:

where γ_i,j indicates whether a laser communication link exists between the i-th satellite and the j-th satellite: with k and l each ranging over 1 to A, if any of the corresponding β_ik,jl equals 1, then γ_i,j = 1, i.e. a laser communication link exists between satellite i and satellite j; if all of the corresponding β_ik,jl equal 0, then γ_i,j = 0;
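The S62 reduction from antenna-level links to satellite-level connectivity is an "any" over the A×A antenna block of each satellite pair. A minimal sketch, assuming a hypothetical flattened layout in which (satellite i, antenna k), both 0-based, maps to row/column i·A + k of A_nt; the function name is illustrative.

```python
def connection_matrix(A_nt, S, A):
    """S62: gamma_i,j = 1 if any beta_ik,jl == 1 over antennas k, l; otherwise 0."""
    T_p = [[0] * S for _ in range(S)]
    for i in range(S):
        for j in range(S):
            if i != j and any(A_nt[i * A + k][j * A + l] == 1
                              for k in range(A) for l in range(A)):
                T_p[i][j] = 1
    return T_p
```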

S63, calculating the laser communication link weight matrix W, whose elements are expressed as:

where i = 1, 2, ..., S and j = 1, 2, ..., S; D_i,j is the laser communication link distance between satellite i and satellite j; θ_i,k and θ_j,l are the beam deflection angles of the transmitting antenna k of satellite i and the receiving antenna l of satellite j, respectively; η_t(θ_i,k) is the transmissivity of the transmitting antenna at beam deflection angle θ_i,k; and η_r(θ_j,l) is the transmissivity of the receiving antenna at beam deflection angle θ_j,l;

S64, calculating the Laplacian matrix L_p of the topological network formed by the laser communication links among all satellites of the distributed constellation system, whose elements l_pi,j are given by:

where i = 1, 2, ..., S and j = 1, 2, ..., S;

S65, calculating the algebraic weighted connectivity of the topological network of the distributed constellation system, which equals the second-smallest eigenvalue λ₂ of the Laplacian matrix L_p; this calculation is denoted acon(L_p);
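S64 and S65 can be sketched in pure Python. Because the element formula of S64 is not given in this text, the sketch assumes the standard weighted Laplacian, l_p,ij = −w_ij·γ_ij for i ≠ j and l_p,ii = Σ_j w_ij·γ_ij, with W symmetric. Eigenvalues are found with cyclic Jacobi rotations, chosen here only to avoid third-party dependencies; any symmetric eigensolver would serve. A connected network gives λ₂ > 0 and a disconnected one gives λ₂ = 0, which is why λ₂ can also serve as a test for the connectivity that constraint C2 requires.

```python
import math


def laplacian(W, T_p):
    """ASSUMED weighted Laplacian: l_ij = -w_ij * gamma_ij (i != j), l_ii = row sum."""
    S = len(T_p)
    L = [[0.0] * S for _ in range(S)]
    for i in range(S):
        for j in range(S):
            if i != j and T_p[i][j]:
                L[i][j] = -W[i][j]
                L[i][i] += W[i][j]
    return L


def jacobi_eigenvalues(M, tol=1e-18, max_sweeps=100):
    """Eigenvalues of a real symmetric matrix by cyclic Jacobi rotations, ascending."""
    a = [list(map(float, row)) for row in M]
    n = len(a)
    for _ in range(max_sweeps):
        off = sum(a[p][q] * a[p][q] for p in range(n) for q in range(n) if p != q)
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if abs(a[p][q]) < 1e-12:
                    continue
                # Rotation angle chosen so that the new a[p][q] is zero.
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for r in range(n):  # A <- A P (columns p and q)
                    arp, arq = a[r][p], a[r][q]
                    a[r][p], a[r][q] = c * arp - s * arq, s * arp + c * arq
                for r in range(n):  # A <- P^T A (rows p and q)
                    apr, aqr = a[p][r], a[q][r]
                    a[p][r], a[q][r] = c * apr - s * aqr, s * apr + c * aqr
    return sorted(a[i][i] for i in range(n))


def acon(L_p):
    """S65: algebraic weighted connectivity = second-smallest eigenvalue of L_p."""
    return jacobi_eigenvalues(L_p)[1]
```

For three satellites linked in a chain with unit weights, the Laplacian is [[1, −1, 0], [−1, 2, −1], [0, −1, 1]], whose eigenvalues are 0, 1 and 3, so acon is 1.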

S66, calculating the duration min(t_Tp) of the topological network of the distributed constellation system, i.e. the time during which the topological network keeps its current topology unchanged, where t_Tp = {t_i,j}, i = 1, 2, ..., S, j = 1, 2, ..., S, is the set of durations of the laser communication links of the distributed constellation system and t_i,j is the duration of the laser communication link between satellite i and satellite j; when γ_i,j = 0, there is no laser communication link between satellite i and satellite j, and t_i,j = Inf; when γ_i,j = 1, t_i,j equals the time interval from the moment at which the real-time orbit information of the satellites of the distributed constellation system was acquired to the moment at which the laser communication link between satellite i and satellite j leaves the visible range;
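The duration rule of S66 maps directly to a minimum over the link set, with Inf for absent links. In this sketch `t_link` is a hypothetical S×S matrix of remaining visibility times (for example in seconds) predicted from the orbit information; how those times are predicted is outside the sketch.

```python
import math


def topology_duration(gamma, t_link):
    """S66: min(t_Tp), with t_i,j = inf wherever gamma_i,j == 0 (no link)."""
    S = len(gamma)
    t_Tp = [
        [t_link[i][j] if gamma[i][j] == 1 else math.inf for j in range(S)]
        for i in range(S)
    ]
    return min(min(row) for row in t_Tp)
```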

S67, calculating the perturbation D of the network connection matrix as D = sum(A_nt ⊕ T_cur), where ⊕ denotes the XOR of the corresponding elements of the two matrices and sum() accumulates all elements of the matrix produced by the XOR operation;
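The perturbation D of S67 counts the antenna-pair entries that differ between the new networking reconstruction matrix and the current topology, i.e. how many links must be torn down or set up. A minimal sketch over 0/1 nested lists of equal shape:

```python
def perturbation(A_nt, T_cur):
    """S67: D = sum(A_nt XOR T_cur) for 0/1 matrices of equal shape."""
    return sum(a ^ b for row_a, row_b in zip(A_nt, T_cur) for a, b in zip(row_a, row_b))
```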

S68, establishing the multi-objective optimization model:

where g₁(L_p), g₂(t_Tp) and g₃(A_nt, T_cur) are the three optimization objective functions for network interconnectivity, network duration and network connection matrix perturbation, respectively; constraint C1 is the visibility constraint on the inter-satellite laser communication links, i.e. the element of the available state matrix L_ink corresponding to each established laser communication link must equal 1; constraint C2 requires the topological network to be connected; and constraint C3 requires that the number of laser communication links established simultaneously by each antenna must not exceed the beam-number limit, where summing A_nt by rows gives a column vector whose i-th element is the number of laser communication links currently established by the i-th antenna.
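Written out compactly, one reading of the S68 model that is consistent with the definitions in S61 to S67 is shown below. It is a reconstruction, not the patent's own notation: it assumes, as the beneficial effects state, that interconnectivity and duration are maximized and perturbation is minimized; it writes C2 as λ₂(L_p) > 0, which is equivalent to connectivity for positive link weights; and it writes C3 as at most N links per antenna, with **1** an all-ones vector.

```latex
\begin{aligned}
\max\;& g_1(L_p) = \operatorname{acon}(L_p) = \lambda_2(L_p)\\
\max\;& g_2(t_{Tp}) = \min(t_{Tp})\\
\min\;& g_3(A_{nt}, T_{cur}) = \operatorname{sum}(A_{nt} \oplus T_{cur})\\
\text{s.t.}\;& C1:\ \beta_{ik,jl} \le \alpha_{ik,jl},\quad \forall\, i, j \in \{1,\dots,S\},\ k, l \in \{1,\dots,A\}\\
& C2:\ \lambda_2(L_p) > 0\\
& C3:\ A_{nt}\,\mathbf{1} \le N\,\mathbf{1}
\end{aligned}
```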

4. The distributed constellation dynamic networking method based on deep reinforcement learning according to claim 1, wherein

step S7 comprises the following steps:

S701, constructing the implementation framework of the double-layer deep reinforcement learning algorithm, which comprises an inner-layer environment, an outer-layer environment, an inner-layer experience pool, an outer-layer experience pool, an inner-layer agent and an outer-layer agent. The outer-layer environment simulates the topology state of the topological network of the distributed constellation system, and the outer-layer agent extracts information from it to obtain the outer-layer state; the inner-layer environment simulates the interconnection state of the topological network, and the inner-layer agent extracts information from it to obtain the inner-layer state. The description variables of the outer-layer environment include the available state matrix of the laser communication links of the distributed constellation system and are obtained from the topology after each networking of the topological network. The action of the outer-layer agent selects an objective-function optimization task for the inner-layer agent, whose parameters are a combination of the algebraic weighted connectivity, the duration and the connection-matrix perturbation value of the topological network. The description variable of the inner-layer environment is the networking reconstruction matrix; the action of the inner-layer agent is to establish a laser communication link between two satellites of the distributed constellation system; and the inner-layer state variable is the intermediate networking reconstruction matrix obtained while the double-layer deep reinforcement learning algorithm is solving the multi-objective optimization model. The inner-layer and outer-layer experience pools store the inner-layer and outer-layer experiences, respectively;

S702, initializing the parameters of the double-layer deep reinforcement learning algorithm, including the sizes of the inner-layer and outer-layer experience pools, the experience-count learning thresholds of the inner-layer and outer-layer experience pools, the exploration probability, the discount factor, the network parameters of the inner-layer and outer-layer agents, the target-network update frequency and the reward functions; setting the upper limit of training rounds to ME and the current round counter loop to 0;

S703, obtaining the initial parameters of the distributed constellation system, including the number of satellites, the number of optical multi-beam antennas per satellite, the number of laser communication links each optical multi-beam antenna can establish and the real-time orbit information of each satellite, and calculating the available state matrix L_ink of the laser communication links; initializing the outer-layer state variable of the double-layer deep reinforcement learning algorithm to L_ink and setting its termination state to a zero matrix of the same dimension; if loop < ME, going to step S704; otherwise the training of the inner-layer and outer-layer agents is finished and the method goes to step S8;

S704, judging whether the outer-layer state variable is in the termination state; if so, setting loop = loop + 1 and going to step S703; otherwise going to step S705;

S705, the outer-layer agent selects an objective-function optimization task for the inner-layer agent according to the outer-layer state of the double-layer deep reinforcement learning algorithm;

S706, the inner-layer agent initializes the inner-layer state according to the selected objective-function optimization task;

S707, the inner-layer agent decides, according to the inner-layer state and the selected objective-function optimization task, whether to take an action, i.e. whether to establish a particular laser communication link between two satellites of the distributed constellation system;

S708, the inner-layer agent calculates the inner-layer reward Botr as:

the inner-layer agent then updates the inner-layer state and stores the inner-layer experience, comprising the inner-layer state, the inner-layer action, the inner-layer reward and the updated inner-layer state, in the inner-layer experience pool;

S709, taking the number of elements equal to 1 in the available state matrix L_ink of the laser communication links as the number of available laser communication links; if this number is greater than 0, going to step S707; otherwise going to step S710;

S710, the outer-layer agent takes the final inner-layer state, i.e. the final networking reconstruction matrix, as the networking result of the distributed constellation system;

S711, the outer-layer agent calculates the outer-layer reward Topr = w₁f₁(g₁) + w₂f₂(g₂) + w₃f₃(g₃), where w_i is the weight of the i-th objective function, f_i is the normalization function of the i-th objective function and g_i is the i-th objective function, i = 1, 2, 3; the outer-layer state variable is updated to the current networking result, and the outer-layer experience, comprising the outer-layer state, the outer-layer agent action, the outer-layer reward and the updated outer-layer state, is stored in the outer-layer experience pool;

S712, judging whether the amount of data in the inner-layer experience pool exceeds its experience-count learning threshold and whether the amount of data in the outer-layer experience pool exceeds its experience-count learning threshold; if both exceed their thresholds, training the inner-layer agent and the outer-layer agent and then going to step S704; otherwise going directly to step S704.
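The control flow of S702 to S712, plus the S8 inference call, can be sketched as below. This is a simplified stand-in, not the patent's algorithm: tabular ε-greedy Q-learning replaces the deep Q-networks (so there is no target network); the outer action set `TASKS` of weight vectors (w₁, w₂, w₃), the inner reward (+1 per link actually established, standing in for Botr) and the cap `max_outer` on outer steps per round are hypothetical. Links are (u, v) pairs of flattened antenna indices with α = 1, which enforces C1, and `score_fn`, supplied by the caller, must return the normalized values (f₁(g₁), f₂(g₂), f₃(g₃)). Constraint C3 is enforced by refusing any link that would give an antenna more than N links.

```python
import random
from collections import deque

# HYPOTHETICAL outer action space: each task is a weight vector (w1, w2, w3) for Topr.
TASKS = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0), (1 / 3, 1 / 3, 1 / 3)]


class TabularAgent:
    """Epsilon-greedy tabular Q-learner standing in for a deep Q-network."""

    def __init__(self, actions, eps=0.2, lr=0.1, discount=0.9):
        self.actions, self.eps, self.lr, self.discount = actions, eps, lr, discount
        self.q = {}

    def act(self, state, greedy=False):
        if not greedy and random.random() < self.eps:
            return random.choice(self.actions)
        return max(self.actions, key=lambda a: self.q.get((state, a), 0.0))

    def learn(self, batch):
        for s, a, r, s2 in batch:
            best_next = max(self.q.get((s2, b), 0.0) for b in self.actions)
            old = self.q.get((s, a), 0.0)
            self.q[(s, a)] = old + self.lr * (r + self.discount * best_next - old)


def run_inner(inner, state, task, N, pool=None, greedy=False):
    """S706-S710: visit each available link once and decide whether to establish it."""
    chosen, load = frozenset(), {}
    links = sorted(state)
    for n, link in enumerate(links):
        s_in = (chosen, link, task)
        a = inner.act(s_in, greedy)
        u, v = link
        ok = load.get(u, 0) < N and load.get(v, 0) < N      # constraint C3
        if a == 1 and ok:
            chosen = chosen | {link}
            load[u], load[v] = load.get(u, 0) + 1, load.get(v, 0) + 1
        r_in = 1.0 if (a == 1 and ok) else 0.0               # HYPOTHETICAL Botr
        nxt = links[n + 1] if n + 1 < len(links) else None
        if pool is not None:
            pool.append((s_in, a, r_in, (chosen, nxt, task)))  # S708 experience
    return chosen


def train(links, N, score_fn, ME=20, pool_size=1000,
          inner_threshold=64, outer_threshold=16, batch=16, max_outer=5):
    """S702-S712 control flow; links is a frozenset of (u, v) pairs with alpha = 1."""
    outer = TabularAgent(actions=list(range(len(TASKS))))
    inner = TabularAgent(actions=[0, 1])
    inner_pool, outer_pool = deque(maxlen=pool_size), deque(maxlen=pool_size)
    for _ in range(ME):                                   # S703: loop < ME
        state = frozenset(links)                          # outer state := L_ink
        for _ in range(max_outer):                        # step cap: added safeguard
            if not state:                                 # S704: terminal (zero matrix)
                break
            task = outer.act(state)                       # S705
            result = run_inner(inner, state, task, N, inner_pool)
            f = score_fn(result)                          # normalized (f1, f2, f3)
            topr = sum(w * fi for w, fi in zip(TASKS[task], f))  # S711
            outer_pool.append((state, task, topr, result))
            state = result
            if len(inner_pool) > inner_threshold and len(outer_pool) > outer_threshold:
                inner.learn(random.sample(list(inner_pool), batch))  # S712
                outer.learn(random.sample(list(outer_pool), batch))
    return outer, inner


def network(outer, inner, links, N):
    """S8: call the trained agents greedily (no exploration) to re-network."""
    state = frozenset(links)
    return run_inner(inner, state, outer.act(state, greedy=True), N, greedy=True)
```

For example, `train(links, N=1, score_fn=...)` followed by `network(outer, inner, links, N=1)` returns a subset of the available links in which no antenna carries more than one link. Replacing `TabularAgent` with a DQN that has a replay buffer and a target network would bring the sketch closer to the parameter list of S702.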

Technical Field

The invention relates to the technical field of satellite communication, and in particular to a distributed constellation dynamic networking method based on deep reinforcement learning.

Background

With the development of human space exploration, Earth observation, the Internet of Things and broadband communication technologies, space-based information systems such as future high-resolution Earth observation missions, space-based cloud storage services, space-based Internet services, deep-space exploration missions, manned spacecraft and space stations have an increasingly urgent demand for space data processing and transmission. A space distributed constellation places several heterogeneous mission satellites at the same GEO orbital position and uses the joint cooperation of their distributed payloads to provide wide-area coverage, large-capacity information exchange, flexible networked communication, space information services, autonomous topology reconfiguration and rapid on-orbit self-healing, thereby overcoming problems of traditional large satellite platforms such as resource constraints and technical bottlenecks. For the satellites of a distributed constellation to work cooperatively, the constellation needs inter-satellite cooperation and data transmission capabilities. Because the future space-based information network will carry a huge amount of data and users demand highly real-time information services, traditional microwave communication can hardly meet the need for high-speed intra-constellation communication once system complexity, payload and power consumption are taken into account. Space laser communication offers large capacity, small size, strong anti-interference capability and good confidentiality, and the optical multi-beam antenna lays the technical foundation for high-speed laser networking within a distributed constellation.
However, limited by technology and manufacturing processes, the optical multi-beam antenna is subject to constraints such as a limited link distance and pointing angle. In application scenarios where the relative spatial positions of the satellite nodes of a distributed constellation change rapidly, it is difficult for each satellite to remain within the visible range of the optical multi-beam antennas for long, and the visibility of the inter-satellite links changes as the relative positions of the satellites change. A distributed constellation must therefore have intelligent networking and reconfiguration capabilities during on-orbit operation. How to achieve dynamic networking optimization of a distributed constellation while guaranteeing the connectivity and duration of its network topology has become an urgent problem.

Chinese patent CN113301591A proposes an inter-satellite network optimization method for a globally networked observation satellite constellation, which solves the inter-satellite network optimization problem with a load-weighted Dijkstra method and optimizes the average transmission delay of the inter-satellite links. Chinese patent CN110601748B proposes a multi-state space information network topology generation and optimization algorithm based on an improved multi-objective simulated annealing algorithm, which reduces network delay and improves network survivability. Patent CN108540204B, addressing the high dynamics of satellite networks, provides a satellite network topology generation method using a fast-converging ant colony algorithm: taking the average and maximum end-to-end link delays as optimization targets and considering the joint influence of inter-satellite link length, link connection time and link capacity on topology generation, it uses an improved ant colony algorithm to obtain a globally optimal topology and thereby enhances topology stability. The techniques disclosed in these patents can improve satellite networking to a certain extent, but they were developed for traditional satellite communication links and aim at optimizing link transmission delay; they do not consider the characteristics of distributed constellation laser communication, namely large capacity, high transmission rate and difficult laser alignment, and they can hardly achieve fast networking of a distributed constellation.

Disclosure of Invention

Addressing the large laser communication capacity, high transmission rate and difficult laser alignment of a distributed constellation, and the difficulty of fast distributed constellation networking, the invention discloses a distributed constellation dynamic networking method based on deep reinforcement learning, in which the distributed constellation system comprises a plurality of GEO satellites, each satellite achieves multi-satellite interconnection through optical multi-beam antennas, and the distributed constellation system performs dynamic networking optimization with a double-layer deep reinforcement learning algorithm; the method comprises the following specific steps:

S1, setting the number of satellites in the distributed constellation system to S and the number of optical multi-beam antennas per satellite to A, each optical multi-beam antenna being able to support N laser communication links simultaneously.

S2, acquiring the real-time orbit information of each satellite of the distributed constellation system from received ground telemetry data, inter-satellite ranging and state detection.

S3, calculating the available state matrix of the distributed constellation laser communication links from the acquired real-time orbit information of each satellite of the distributed constellation system, with the following specific steps:

where θ_max is the maximum beam deflection angle supported by the optical multi-beam antenna;

S32, calculating the visibility state between the antennas of the satellites of the distributed constellation system: the bit error rate of each laser communication link is calculated from θ and D_link, and its upper limit is denoted BER_th; when the bit error rate of a link exceeds this upper limit, the laser communication link is judged to be interrupted, i.e. the antennas of the two satellites at the ends of the link are not visible to each other;

S4, obtaining the current network topology of the topological network formed by the laser communication links among the satellites of the distributed constellation system and expressing it as a matrix T_cur whose elements are T_ik,jl, i = 1, 2, ..., S, j = 1, 2, ..., S, k = 1, 2, ..., A, l = 1, 2, ..., A, where T_ik,jl represents the connection state between the k-th antenna of the i-th satellite and the l-th antenna of the j-th satellite: T_ik,jl = 1 if the two antennas are connected by a laser communication link, and T_ik,jl = 0 otherwise;

S5, comparing the elements of matrix T_cur and matrix L_ink one by one; if there is an element with T_ik,jl = 1 and α_ik,jl = 0, judging that the change of the available state matrix affects the network topology of the distributed constellation system and going to step S6; otherwise maintaining the current network topology of the distributed constellation system and returning to step S2.

S6, establishing a multi-objective optimization model according to the laser communication link networking requirements of the distributed constellation system; step S6 comprises the following specific steps:

S61, calculating the networking reconstruction state matrix A_nt from the available state matrix L_ink of the laser communication links of the distributed constellation system; its elements are β_ik,jl, i = 1, 2, ..., S, j = 1, 2, ..., S, k = 1, 2, ..., A, l = 1, 2, ..., A, where β_ik,jl = 1 means that a laser communication link is established between the k-th antenna of the i-th satellite and the l-th antenna of the j-th satellite and β_ik,jl = 0 means that no such link is established; when i = j, β_ik,jl is set to 0;

S62, calculating the inter-satellite connection matrix T_p of the distributed constellation system, expressed as:

where γ_i,j indicates whether a laser communication link exists between the i-th satellite and the j-th satellite: with k and l each ranging over 1 to A, if any of the corresponding β_ik,jl equals 1, then γ_i,j = 1, i.e. a laser communication link exists between satellite i and satellite j; if all of the corresponding β_ik,jl equal 0, then γ_i,j = 0;

S63, calculating the laser communication link weight matrix W, whose elements are expressed as:

where i = 1, 2, ..., S and j = 1, 2, ..., S.

S65, calculating the algebraic weighted connectivity of the topological network of the distributed constellation system, which equals the second-smallest eigenvalue λ₂ of the Laplacian matrix L_p; this calculation is denoted acon(L_p);

S66, calculating the duration min(t_Tp) of the topological network of the distributed constellation system, i.e. the time during which the topological network keeps its current topology unchanged, where t_Tp = {t_i,j}, i = 1, 2, ..., S, j = 1, 2, ..., S, is the set of durations of the laser communication links of the distributed constellation system and t_i,j is the duration of the laser communication link between satellite i and satellite j. When γ_i,j = 0, there is no laser communication link between satellite i and satellite j, and t_i,j = Inf; when γ_i,j = 1, t_i,j equals the time interval from the moment at which the real-time orbit information of the satellites of the distributed constellation system was acquired to the moment at which the laser communication link between satellite i and satellite j leaves the visible range;

S68, establishing the multi-objective optimization model:

where g₁(L_p), g₂(t_Tp) and g₃(A_nt, T_cur) are the three optimization objective functions for network interconnectivity, network duration and network connection matrix perturbation, respectively; constraint C1 is the visibility constraint on the inter-satellite laser communication links, i.e. the element of the available state matrix L_ink corresponding to each established laser communication link must equal 1; constraint C2 requires the topological network to be connected; and constraint C3 requires that the number of laser communication links established simultaneously by each antenna must not exceed the beam-number limit, where summing A_nt by rows gives a column vector whose i-th element is the number of laser communication links currently established by the i-th antenna;

S701: construct the implementation framework of the double-layer deep reinforcement learning algorithm. The framework comprises an inner-layer environment, an outer-layer environment, an inner-layer experience pool, an outer-layer experience pool, an inner-layer agent and an outer-layer agent. The outer-layer environment simulates the topology state of the topological network of the distributed constellation system, and the outer-layer agent extracts information from it to obtain the outer-layer state; the inner-layer environment simulates the interconnection state of the topological network, and the inner-layer agent extracts information from it to obtain the inner-layer state. The description variables of the outer-layer environment include the available state matrix of the laser communication links of the distributed constellation system and are obtained from the topology after each networking of the topological network. The action of the outer-layer agent is to select an objective function optimization task for the inner-layer agent, whose parameters are a combination of the algebraic weighted connectivity of the topological network, the duration, and the perturbation value of the connection matrix. The description variable of the inner-layer environment is the networking reconstruction matrix; the action of the inner-layer agent is to establish a laser communication link between two satellites of the distributed constellation system; and the inner-layer state variable is the networking reconstruction matrix obtained at intermediate stages while the double-layer deep reinforcement learning algorithm solves the multi-objective optimization model. The inner-layer and outer-layer experience pools store inner-layer and outer-layer experience, respectively;
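The structure of the two experience pools is not specified further in this excerpt. A common choice in deep Q-learning, assumed here, is a fixed-capacity replay buffer sampled uniformly at random; the `Transition` and `ExperiencePool` classes and the capacity values are hypothetical:

```python
import random
from collections import deque
from dataclasses import dataclass
from typing import Any, Deque, List


@dataclass
class Transition:
    """One experience tuple (state, action, reward, next_state, done)."""
    state: Any
    action: Any
    reward: float
    next_state: Any
    done: bool


class ExperiencePool:
    """Fixed-capacity FIFO replay buffer with uniform random sampling."""

    def __init__(self, capacity: int) -> None:
        self.buffer: Deque[Transition] = deque(maxlen=capacity)

    def push(self, transition: Transition) -> None:
        # When full, deque(maxlen=...) silently drops the oldest transition.
        self.buffer.append(transition)

    def sample(self, batch_size: int) -> List[Transition]:
        # Uniform sampling without replacement, capped at the current pool size.
        return random.sample(list(self.buffer), min(batch_size, len(self.buffer)))

    def __len__(self) -> int:
        return len(self.buffer)


# One pool per layer: the inner pool would hold link-building steps and the
# outer pool objective-task selections. Capacities are illustrative only.
inner_pool = ExperiencePool(capacity=10_000)
outer_pool = ExperiencePool(capacity=1_000)
```

Keeping the pools separate lets each layer's network be trained on transitions of its own time scale: many link-building steps per inner episode versus one task selection per outer step.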

S704: determine whether the outer-layer state variable is in the end state; if so, set loop = loop + 1 and go to step S703; otherwise, go to step S705;

S705: the outer-layer agent selects an objective function optimization task for the inner-layer agent according to the outer-layer state of the double-layer deep reinforcement learning algorithm;

S706: the inner-layer agent initializes the inner-layer state according to the selected objective function optimization task;

S708: the inner-layer agent calculates the inner-layer reward Botr by the following formula:

S710: the outer-layer agent takes the final inner-layer state, i.e., the final networking reconstruction matrix, as the networking result of the distributed constellation system;
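The branching of steps S704 to S710 can be traced in a short driver loop. Several intermediate steps (including S703, S707 and S709) are not reproduced in this excerpt, so their work is delegated to hypothetical callables, and the loop limit `num_loops` is likewise an assumption; this sketches the control flow only, not the patented training procedure:

```python
from typing import Any, Callable, List


def run_s704_to_s710(
    reset_outer: Callable[[], Any],
    is_end_state: Callable[[Any], bool],
    select_task: Callable[[Any], Any],
    run_inner_episode: Callable[[Any], Any],
    step_outer: Callable[[Any, Any, Any], Any],
    num_loops: int,
) -> List[Any]:
    """Drive the S704-S710 branching and return every networking result produced.

    All callables are hypothetical placeholders:
      reset_outer       : stands in for S703 (not reproduced in this excerpt)
      is_end_state      : the S704 end-state test on the outer-layer state
      select_task       : S705, the outer agent chooses an objective task
      run_inner_episode : S706 onward, returns the final reconstruction matrix
      step_outer        : assumed outer-environment transition after a task
    """
    results: List[Any] = []
    loop = 0
    outer_state = reset_outer()
    while loop < num_loops:
        if is_end_state(outer_state):
            loop += 1                       # S704 "yes" branch: loop = loop + 1
            outer_state = reset_outer()     # ... then return to S703
            continue
        task = select_task(outer_state)              # S705
        final_inner = run_inner_episode(task)        # S706 to S709
        results.append(final_inner)                  # S710: networking result
        outer_state = step_outer(outer_state, task, final_inner)
    return results
```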

S8: after the inner-layer and outer-layer agents have been trained, whenever the distributed constellation system needs to be re-networked, call the trained double-layer deep reinforcement learning algorithm to obtain a networking reconstruction matrix, and use it to re-network the distributed constellation system, completing the networking optimization process.

The beneficial effects of the invention include:

1. The invention fully considers the various requirements of a distributed constellation system and establishes a multi-objective optimization model that maximizes network interconnection and topology duration while minimizing network connection matrix perturbation, thereby obtaining a distributed constellation networking optimization result with the best overall benefit;

2. The invention uses a deep reinforcement learning algorithm for networking optimization of the distributed constellation system. The algorithm requires little computing power and computes quickly, so it can respond rapidly to changes in the available laser links of the distributed constellation system and produce an optimized topology.

Drawings

Fig. 1 is an implementation flowchart of the distributed constellation dynamic networking method based on deep reinforcement learning according to the present invention.

Detailed Description

An embodiment of the present invention is given and described in detail below.

Fig. 1 is an implementation flowchart of the distributed constellation dynamic networking method based on deep reinforcement learning according to the present invention. As shown in Fig. 1, the invention discloses a distributed constellation dynamic networking method based on deep reinforcement learning, in which the distributed constellation system comprises a plurality of GEO orbit satellites, each satellite achieves multi-satellite interconnection through an optical multi-beam antenna, and the distributed constellation system performs dynamic networking optimization with a double-layer deep reinforcement learning algorithm. The method comprises the following specific steps:

where (x_i, y_i, z_i) are the coordinates of the i-th satellite in the Earth inertial frame, (v_{x,i}, v_{y,i}, v_{z,i}) is the three-dimensional velocity vector of the i-th satellite in the Earth inertial frame, θ is the beam deflection angle, and η(θ) is the transmission efficiency of the optical multi-beam antenna at beam deflection angle θ, given by:

where θ_max is the maximum beam deflection angle supported by the optical multi-beam antenna;
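The inter-satellite geometry needed later can be computed from the position and velocity vectors defined above. A minimal sketch, assuming D_link is the Euclidean distance between the two satellites' positions in the Earth inertial frame; the helper `link_geometry` and its range-rate output are illustrative additions, not part of the patent:

```python
import math
from typing import Tuple

Vec3 = Tuple[float, float, float]


def link_geometry(r_i: Vec3, v_i: Vec3, r_j: Vec3, v_j: Vec3) -> Tuple[float, float]:
    """Return (distance, range rate) between satellites i and j.

    Positions r = (x, y, z) and velocities v = (v_x, v_y, v_z) must be expressed
    in the same Earth inertial frame and units. The range rate is the time
    derivative of the distance (positive when the satellites are separating).
    """
    dr = [b - a for a, b in zip(r_i, r_j)]
    dv = [b - a for a, b in zip(v_i, v_j)]
    dist = math.sqrt(sum(c * c for c in dr))
    if dist == 0.0:
        raise ValueError("satellites are co-located")
    range_rate = sum(p * q for p, q in zip(dr, dv)) / dist
    return dist, range_rate
```

The range rate indicates how quickly a link's length is changing, which is one way velocity information could feed into estimates of how long a topology remains usable.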

S32: calculate the visibility state between the antennas of the satellites of the distributed constellation system. Using θ and D_link, calculate the bit error rate of each laser communication link and denote its upper limit by BER_th; when the bit error rate of a link exceeds this upper limit, the laser communication link is judged to be interrupted, i.e., the antennas of the two satellites at the ends of the link are mutually invisible;

For other numbers of antennas and satellites, the available state matrix L_link can be constructed in the manner described above.
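The S32 thresholding rule maps directly onto a 0/1 visibility matrix of the kind used for L_link. A minimal sketch, assuming the per-link bit error rates have already been computed (the BER model itself is not given in this excerpt) and that the matrix is indexed by antenna; `visibility_from_ber` is a hypothetical helper name:

```python
from typing import List


def visibility_from_ber(ber: List[List[float]], ber_th: float) -> List[List[int]]:
    """Return a 0/1 matrix: 1 where the link BER does not exceed BER_th.

    ber[i][j] is the bit error rate of the link between antennas i and j,
    assumed precomputed from the beam deflection angle and D_link.
    A link whose BER is larger than BER_th is treated as interrupted (0).
    Diagonal entries (an antenna paired with itself) are always 0.
    """
    n = len(ber)
    return [
        [1 if i != j and ber[i][j] <= ber_th else 0 for j in range(n)]
        for i in range(n)
    ]
```

A link exactly at BER_th stays visible, matching the rule that only a bit error rate larger than the upper limit interrupts the link.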

For other numbers of antennas and satellites, the networking reconstruction state matrix A_nt can be constructed in the manner described above.

S62: calculate the connection matrix T_p between the satellites of the distributed constellation system, given by:

S63: calculate the laser communication link weight matrix W, whose elements are given by:

where i = 1, 2, …, S and j = 1, 2, ….
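Step S701 refers to the algebraic weighted connectivity of the topological network. Under the standard graph-theory definition, assumed here, this is the second-smallest eigenvalue of the weighted Laplacian L = D − W built from the link weight matrix W, and it is positive exactly when the network is connected. The S63 weight formula is not reproduced in this excerpt, so W is taken as a given symmetric nonnegative matrix; to stay dependency-free, the sketch uses the cyclic Jacobi eigenvalue method, and both function names are hypothetical:

```python
import math
from typing import List


def symmetric_eigenvalues(a: List[List[float]], tol: float = 1e-12,
                          max_sweeps: int = 100) -> List[float]:
    """Eigenvalues (ascending) of a real symmetric matrix by cyclic Jacobi rotations."""
    n = len(a)
    m = [list(map(float, row)) for row in a]
    for _ in range(max_sweeps):
        off = sum(m[p][q] ** 2 for p in range(n) for q in range(n) if p != q)
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if abs(m[p][q]) < 1e-300:
                    continue
                # Rotation angle chosen so that the (p, q) entry becomes zero.
                theta = (m[q][q] - m[p][p]) / (2.0 * m[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(n):  # columns p and q: M <- M J
                    mkp, mkq = m[k][p], m[k][q]
                    m[k][p] = c * mkp - s * mkq
                    m[k][q] = s * mkp + c * mkq
                for k in range(n):  # rows p and q: M <- J^T M
                    mpk, mqk = m[p][k], m[q][k]
                    m[p][k] = c * mpk - s * mqk
                    m[q][k] = s * mpk + c * mqk
    return sorted(m[i][i] for i in range(n))


def algebraic_connectivity(w: List[List[float]]) -> float:
    """Second-smallest eigenvalue of the weighted Laplacian D - W (self-loops ignored)."""
    n = len(w)
    lap = [[(sum(w[i]) - w[i][i]) if i == j else -w[i][j] for j in range(n)]
           for i in range(n)]
    return symmetric_eigenvalues(lap)[1]
```

With unit weights, a three-satellite chain gives 1, a fully meshed triangle gives 3, and a network with an isolated satellite gives 0. In practice `numpy.linalg.eigvalsh` could replace the hand-written solver.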

S68: establish the multi-objective optimization model:

where g₁(L_p), g₂(t_Tp) and g₃(A_nt, T_cur) are the three optimization objective functions for network interconnection, network duration and network connection matrix perturbation, respectively. Constraint C1 is the visibility constraint on the inter-satellite laser communication links: for every established laser communication link, the corresponding element of the available state matrix L_link must equal 1. Constraint C2 requires the topological network to be connected. Constraint C3 requires the number of laser communication links established simultaneously by each antenna to be less than the beam number limit, where the row-sum term sums all elements of A_nt along each row to give a column vector whose i-th element is the number of laser communication links currently established by the i-th antenna;
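The model maximizes interconnection (g₁) and duration (g₂) while minimizing perturbation (g₃), whose exact forms are not reproduced in this excerpt. The sketch below assumes, for illustration only, that g₃ counts satellite pairs whose connection state differs between a new connection matrix and the current one T_cur, and that the three objectives are combined by a weighted sum; both the perturbation measure and the scalarization are assumptions, and the function names are hypothetical:

```python
from typing import List, Sequence

Matrix = List[List[int]]


def connection_perturbation(t_new: Matrix, t_cur: Matrix) -> int:
    """Count satellite pairs whose connection state differs (upper triangle only)."""
    n = len(t_new)
    return sum(
        1
        for i in range(n)
        for j in range(i + 1, n)
        if t_new[i][j] != t_cur[i][j]
    )


def scalarized_score(g1: float, g2: float, g3: float, weights: Sequence[float]) -> float:
    """Weighted sum that rewards connectivity (g1) and duration (g2), penalizes perturbation (g3)."""
    w1, w2, w3 = weights
    return w1 * g1 + w2 * g2 - w3 * g3
```

Under this reading, the objective function optimization task chosen by the outer-layer agent could be encoded as the weight triple, although the patent describes the task only as a combination of connectivity, duration and perturbation values. A larger third weight favours keeping the current topology.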

S701: construct the implementation framework of the double-layer deep reinforcement learning algorithm. The framework comprises an inner-layer environment, an outer-layer environment, an inner-layer experience pool, an outer-layer experience pool, an inner-layer agent and an outer-layer agent. The outer-layer environment simulates the topology state of the topological network of the distributed constellation system, and the outer-layer agent extracts information from it to obtain the outer-layer state; the inner-layer environment simulates the interconnection state of the topological network, and the inner-layer agent extracts information from it to obtain the inner-layer state. The description variables of the outer-layer environment include the available state matrix of the laser communication links of the distributed constellation system and are obtained from the topology after each networking of the topological network. The action of the outer-layer agent is to select an objective function optimization task for the inner-layer agent, whose parameters are a combination of the algebraic weighted connectivity of the topological network, the duration, and the perturbation value of the connection matrix. The description variable of the inner-layer environment is the networking reconstruction matrix; the action of the inner-layer agent is to establish a laser communication link between two satellites of the distributed constellation system; and the inner-layer state variable is the networking reconstruction matrix obtained at intermediate stages while the double-layer deep reinforcement learning algorithm solves the multi-objective optimization model. The inner-layer and outer-layer experience pools store inner-layer and outer-layer experience, respectively; this framework is what implements the double-layer deep reinforcement learning algorithm;
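The inner-layer agent's action is to establish a laser communication link between two satellites. The excerpt does not state the exploration policy; a common pattern, assumed here, is ε-greedy selection restricted to actions that still satisfy the visibility and beam constraints. `select_link_action`, the Q-value dictionary and the feasibility callback are hypothetical:

```python
import random
from typing import Callable, Dict, List, Optional, Tuple

Pair = Tuple[int, int]


def select_link_action(
    candidates: List[Pair],
    q_values: Dict[Pair, float],
    is_feasible: Callable[[Pair], bool],
    epsilon: float,
    rng: random.Random,
) -> Optional[Pair]:
    """Epsilon-greedy choice among feasible link-establishment actions.

    Returns None when no feasible action remains, which is one possible
    terminal condition for an inner episode.
    """
    feasible = [a for a in candidates if is_feasible(a)]
    if not feasible:
        return None
    if rng.random() < epsilon:
        return rng.choice(feasible)                              # explore
    return max(feasible, key=lambda a: q_values.get(a, 0.0))     # exploit
```

Masking infeasible actions before the argmax keeps the agent from ever proposing a link that violates C1 or C3, rather than relying on a negative reward to discourage it.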

S704: determine whether the outer-layer state variable is in the end state; if so, set loop = loop + 1 and go to step S703; otherwise, go to step S705;

S705: the outer-layer agent selects an objective function optimization task for the inner-layer agent according to the outer-layer state of the double-layer deep reinforcement learning algorithm;

S706: the inner-layer agent initializes the inner-layer state according to the selected objective function optimization task;

S708: the inner-layer agent calculates the inner-layer reward Botr by the following formula:

S710: the outer-layer agent takes the final inner-layer state, i.e., the final networking reconstruction matrix, as the networking result of the distributed constellation system;

The invention has been described in detail with reference to the drawings, but those skilled in the art will understand that the description is illustrative and that the invention is defined by the claims; any modifications, equivalents, improvements and the like based on the claims are intended to fall within the scope of the invention.
