Base station precoding and intelligent reflection surface phase shift joint optimization method based on deep reinforcement learning

Document No.: 195389  Publication date: 2021-11-02

Reading note: this technique, "一种基于深度强化学习的基站预编码与智能反射表面相移联合优化方法" (Base station precoding and intelligent reflection surface phase shift joint optimization method based on deep reinforcement learning), was designed and created by 任红, 潘存华 and 寇周斌 on 2021-07-29. Abstract: The invention discloses a base station precoding and intelligent reflecting surface phase shift joint optimization method based on deep reinforcement learning, which comprises the following steps: the base station sends pilots to single-antenna users to acquire angle information, position information and statistical channel state information; based on the acquired information, the base station generates a large number of complete channel matrices off-line and uses them as the data set for training the parameters of the deep reinforcement learning algorithm; using the generated data set, the base station iteratively updates the neural network parameters of a preset deep reinforcement learning algorithm and jointly optimizes the configuration of the base station precoding and the intelligent reflecting surface phase shift matrix, so that the minimum user ergodic rate in the scenario is maximized; when the angle or position information next changes, the base station again uses deep reinforcement learning to jointly optimize and configure the base station precoding and the intelligent reflecting surface phase shift.

1. A base station precoding and intelligent reflecting surface phase shift joint optimization method based on deep reinforcement learning is characterized by comprising the following steps:

Step S1, a base station is configured with M antennas, an intelligent reflecting surface is configured with N programmable reflecting elements, and each user is configured with a single receive antenna; when the position information or the angle information of a user changes, the base station acquires the angle information, the position information and the statistical channel information, wherein,

the angle information includes: the angle of departure of the signal from the base station to the intelligent reflecting surface, the angle of arrival of the signal from the base station at the intelligent reflecting surface, the angle of departure of the signal from the intelligent reflecting surface to the k-th user, and the angle of departure of the signal from the base station to the k-th user;

The position information is three-dimensional coordinates of K users;

the statistical channel information includes: the Rician factor α of the channel between the base station and the intelligent reflecting surface, the Rician factor β_k of the channel between the base station and user k, and the Rician factor γ_k of the channel between the intelligent reflecting surface and user k;

Step S2, the base station performs offline calculation to generate a plurality of complete wireless channel matrices, and uses the plurality of complete wireless channel matrices as a data set for performing offline training by using a deep reinforcement learning algorithm, wherein the step of generating the plurality of complete wireless channel matrices by the base station offline calculation specifically includes:

step S201, using the angle information obtained in step S1, respectively calculate the channel line-of-sight component Ḡ_0 between the base station and the intelligent reflecting surface, the channel line-of-sight component ḡ_k between the base station and user k, and the channel line-of-sight component h̄_k between the intelligent reflecting surface and user k, each built from steering vectors of the form a_x(θ) = [1, e^{jθ}, …, e^{j(x-1)θ}]^T, x = M, N, where [·]^T denotes the matrix transpose;

step S202, respectively calculate the channel non-line-of-sight component G̃_0 between the base station and the intelligent reflecting surface, the channel non-line-of-sight component g̃_k between the base station and user k, and the channel non-line-of-sight component h̃_k between the intelligent reflecting surface and user k, where the elements of G̃_0, g̃_k and h̃_k are randomly generated and follow a complex Gaussian distribution with zero mean and unit variance;

step S203, according to the position information of the K users, respectively calculate the distances d_k and D_k from the base station and from the intelligent reflecting surface to user k, the distance from the base station to the intelligent reflecting surface remaining d_0, and from these obtain the path loss PL_1 from the base station to the intelligent reflecting surface, the path loss PL_{2,k} from the base station to the k-th user, and the path loss PL_{3,k} from the intelligent reflecting surface to the k-th user, where PL_0 is the path loss at the reference distance dis_0 = 1 meter;

step S204, calculating three sets of channel matrices respectively, specifically including:

channel matrix between the base station and the intelligent reflecting surface: G_0 = √PL_1 [√(α/(1+α)) Ḡ_0 + √(1/(1+α)) G̃_0];

channel matrix between the base station and user k: g_k = √PL_{2,k} [√(β_k/(1+β_k)) ḡ_k + √(1/(1+β_k)) g̃_k];

channel matrix between the intelligent reflecting surface and user k: h_k = √PL_{3,k} [√(γ_k/(1+γ_k)) h̄_k + √(1/(1+γ_k)) h̃_k], where the barred and tilded quantities are the line-of-sight and non-line-of-sight components of steps S201 and S202, and PL_1, PL_{2,k}, PL_{3,k} are the path losses of step S203;

Step S3, the base station performs off-line training using the data set obtained in step S2 and continuously updates the neural network parameters of the deep reinforcement learning algorithm, so that the base station precoding matrix and the intelligent reflecting surface phase shift matrix output by the algorithm drive the reference reward value of the deep reinforcement learning model to converge to its optimum, and stores the base station precoding matrix W and the intelligent reflecting surface phase shift matrix Φ output at convergence, wherein during the off-line training each round of the training process comprises the following steps:

step S301, sequentially extract one group of the complete wireless channel matrix data generated in step S2;

step S302, initialize the loop counter i ← 0, initialize the reflection phases θ_n^(0) of the intelligent reflecting surface, construct the initial intelligent reflecting surface phase shift matrix Φ^(0) = diag(e^{jθ_1^(0)}, …, e^{jθ_N^(0)}), initialize the base station antenna precoding matrix W^(0), and set the maximum number of loops per round;

step S303, calculate the initial transmission rate R_k^(0) = log_2(1 + SINR_k^(0)) of each user, and extract the minimum of all user transmission rates as the reward function value r^(0) of the current loop, where SINR_k^(0) = |(g_k^H + h_k^H Φ^(0) G_0) w_k^(0)|^2 / (Σ_{j≠k} |(g_k^H + h_k^H Φ^(0) G_0) w_j^(0)|^2 + σ_k^2) denotes the initial instantaneous signal-to-interference-plus-noise ratio of user k, w_k^(0) and w_j^(0) respectively denote the k-th and j-th column vectors of the base station antenna precoding matrix W^(0), and σ_k^2 denotes the additive white Gaussian noise variance at user k;

step S304, take the complete channel matrices G_0, g_k and h_k, the initial intelligent reflecting surface phase shift matrix Φ^(0), and the base station antenna precoding matrix W^(0) as the input of the neural network, and take the intelligent reflecting surface phase shift matrix Φ^(1) and base station antenna precoding matrix W^(1) for the next iteration as the output of the neural network;

step S305, check the loop termination condition: if the loop counter is less than the maximum number of loops, repeat the following operations, otherwise go to step S309:

step S306, based on the intelligent reflecting surface phase shift matrix Φ^(i) and base station antenna precoding matrix W^(i) output by the neural network in the previous iteration, recalculate the transmission rate R_k^(i) = log_2(1 + SINR_k^(i)) of each user in the system, and extract the minimum of all user transmission rates as the reward function value r^(i) of the current loop, where SINR_k^(i) denotes the instantaneous signal-to-interference-plus-noise ratio of user k, computed as in step S303 with Φ^(i) and W^(i) in place of Φ^(0) and W^(0), and w_k^(i) and w_j^(i) denote the k-th and j-th column vectors of W^(i);

step S307, update the neural network inputs so that the intelligent reflecting surface phase shift matrix is Φ^(i) and the base station antenna precoding matrix is W^(i), obtaining the output parameters Φ^(i+1) and W^(i+1) of the next iteration;

Step S308, update the loop counter i ← i + 1 and go to step S305;

step S309, average the reward values of all loops and use the result as the reference reward value of the current training round;

step S4, the base station transmits the intelligent reflecting surface phase shift matrix Φ obtained in step S3 to the control end of the intelligent reflecting surface through a direct link between the base station and the control end, and performs the corresponding configuration;

step S5, when the angle information or the position information changes, the system re-executes steps S1 to S5.

2. The method for jointly optimizing base station precoding and intelligent reflecting surface phase shift based on deep reinforcement learning according to claim 1, wherein in step S3, the neural network parameters continuously updated by the deep reinforcement learning algorithm specifically include: state parameters, action parameters and a reward function, wherein,

the action parameters are the parameters output by the deep neural network, comprising the real and imaginary parts of the base station antenna precoding matrix W and of the intelligent reflecting surface phase shift matrix Φ;

the state parameters include the complete channel matrices G_0, g_k and h_k set for each training round, and the real and imaginary parts of the base station antenna precoding matrix W and of the intelligent reflecting surface phase shift matrix Φ most recently output by the neural network;

the reward function is the minimum user instantaneous transmission rate calculated from the action parameters most recently output by the neural network.

3. The method as claimed in claim 1, wherein in step S3, the reflection phase parameters of the intelligent reflecting surface phase shift matrix satisfy 0 ≤ θ_n < 2π, n = 1, 2, …, N, and the base station precoding matrix satisfies the power constraint tr{E[W W^H]} ≤ P_t, where P_t denotes the maximum transmit power of the base station, E[·] denotes expectation, tr{·} denotes the trace of a matrix, and [·]^H denotes the conjugate transpose of a matrix.

Technical Field

The invention relates to the technical field of wireless communication, in particular to a base station precoding and intelligent reflecting surface phase shift joint optimization method based on deep reinforcement learning.

Background

When optimizing the transmission performance of a traditional wireless communication system, design and optimization focus mainly on the transmitter and the receiver; because the transceivers cannot control the wireless transmission environment of the channel, the propagation environment between them is treated as an external factor of the transmission system that can only be passively adapted to, not actively reconstructed. Recently, thanks to breakthrough progress in novel artificial electromagnetic materials, the intelligent reflecting surface technique has provided a feasible technical means for wireless communication systems to actively adjust the propagation environment and realize a programmable wireless environment.

To optimize the transmission performance of a wireless communication system with an intelligent reflecting surface, the base station antenna precoding matrix and the intelligent reflecting surface phase shift matrix must be jointly designed. Most existing literature designs transmission schemes for intelligent-reflecting-surface-assisted communication based on instantaneous channel state information: the system channel is assumed to be re-estimated on the time scale of each instantaneous channel state, and the parameters of the base station and the intelligent reflecting surface are then configured from the accurate channel state information, achieving optimal transmission performance on each instantaneous scale. Although this approach attains good transmission performance, it has three disadvantages: (1) re-estimating the channel within each very short instantaneous time scale incurs a very large channel estimation overhead; (2) the parameters of the base station and the intelligent reflecting surface must be recomputed for every newly estimated instantaneous channel state, which greatly increases the computational complexity of the system implementation; (3) after each update of the intelligent reflecting surface parameters, the base station must feed the configuration parameters back to the control end of the intelligent reflecting surface, which incurs a high phase feedback overhead.

Designing the intelligent reflecting surface transmission scheme with statistical channel state information, which is estimated once per long time scale so that the joint optimization of the base station and intelligent reflecting surface parameters is performed only once per such interval, can overcome the three drawbacks of instantaneous channel state information. However, a challenge remains: solving the optimization problem under statistical channel state information requires taking an expectation over the small-scale fading part of the channel, and a closed-form expression of the system transmission rate generally cannot be obtained.

Disclosure of Invention

In view of this, an object of the present invention is to provide a base station precoding and intelligent reflecting surface phase shift joint optimization method based on deep reinforcement learning. The method solves the transmission rate optimization problem based on statistical channel state information with a deep reinforcement learning algorithm, avoiding the complex rate derivation process; it maximizes the minimum user transmission rate in a multi-user scenario, realizes the user transmission rate optimization design, and greatly reduces the channel estimation overhead and computational complexity of the system compared with schemes based on instantaneous channel state information.

In order to achieve the above object, the invention adopts the following technical scheme:

a base station precoding and intelligent reflecting surface phase shift joint optimization method based on deep reinforcement learning comprises the following steps:

Step S1, a base station is configured with M antennas, an intelligent reflecting surface is configured with N programmable reflecting elements, and each user is configured with a single receive antenna; when the position information or the angle information of a user changes, the base station acquires the angle information, the position information and the statistical channel information, wherein,

the angle information includes: the angle of departure of the signal from the base station to the intelligent reflecting surface, the angle of arrival of the signal from the base station at the intelligent reflecting surface, the angle of departure of the signal from the intelligent reflecting surface to the k-th user, and the angle of departure of the signal from the base station to the k-th user;

The position information is three-dimensional coordinates of K users;

the statistical channel information includes: the Rician factor α of the channel between the base station and the intelligent reflecting surface, the Rician factor β_k of the channel between the base station and user k, and the Rician factor γ_k of the channel between the intelligent reflecting surface and user k;

Step S2, the base station performs offline calculation to generate a plurality of complete wireless channel matrices, and uses the plurality of complete wireless channel matrices as a data set for performing offline training by using a deep reinforcement learning algorithm, wherein the step of generating the plurality of complete wireless channel matrices by the base station offline calculation specifically includes:

step S201, using the angle information obtained in step S1, respectively calculate the channel line-of-sight component Ḡ_0 between the base station and the intelligent reflecting surface, the channel line-of-sight component ḡ_k between the base station and user k, and the channel line-of-sight component h̄_k between the intelligent reflecting surface and user k, each built from steering vectors of the form a_x(θ) = [1, e^{jθ}, …, e^{j(x-1)θ}]^T, x = M, N, where [·]^T denotes the matrix transpose;

step S202, respectively calculate the channel non-line-of-sight component G̃_0 between the base station and the intelligent reflecting surface, the channel non-line-of-sight component g̃_k between the base station and user k, and the channel non-line-of-sight component h̃_k between the intelligent reflecting surface and user k, where the elements of G̃_0, g̃_k and h̃_k are randomly generated and follow a complex Gaussian distribution with zero mean and unit variance;

step S203, according to the position information of the K users, respectively calculate the distances d_k and D_k from the base station and from the intelligent reflecting surface to user k, the distance from the base station to the intelligent reflecting surface remaining d_0, and from these obtain the path loss PL_1 from the base station to the intelligent reflecting surface, the path loss PL_{2,k} from the base station to the k-th user, and the path loss PL_{3,k} from the intelligent reflecting surface to the k-th user, where PL_0 is the path loss at the reference distance dis_0 = 1 meter, k = 1, 2, …, K;

Step S204, calculating three sets of channel matrices respectively, specifically including:

channel matrix between the base station and the intelligent reflecting surface: G_0 = √PL_1 [√(α/(1+α)) Ḡ_0 + √(1/(1+α)) G̃_0];

channel matrix between the base station and user k: g_k = √PL_{2,k} [√(β_k/(1+β_k)) ḡ_k + √(1/(1+β_k)) g̃_k];

channel matrix between the intelligent reflecting surface and user k: h_k = √PL_{3,k} [√(γ_k/(1+γ_k)) h̄_k + √(1/(1+γ_k)) h̃_k], where the barred and tilded quantities are the line-of-sight and non-line-of-sight components of steps S201 and S202, and PL_1, PL_{2,k}, PL_{3,k} are the path losses of step S203;

Step S3, the base station performs off-line training using the data set obtained in step S2 and continuously updates the neural network parameters of the deep reinforcement learning algorithm, so that the base station precoding matrix and the intelligent reflecting surface phase shift matrix output by the algorithm drive the reference reward value of the deep reinforcement learning model to converge to its optimum, and stores the base station precoding matrix W and the intelligent reflecting surface phase shift matrix Φ output at convergence, wherein during the off-line training each round of the training process comprises the following steps:

step S301, sequentially extract one group of the complete wireless channel matrix data generated in step S2;

step S302, initialize the loop counter i ← 0, initialize the reflection phases θ_n^(0) of the intelligent reflecting surface, construct the initial intelligent reflecting surface phase shift matrix Φ^(0) = diag(e^{jθ_1^(0)}, …, e^{jθ_N^(0)}), initialize the base station antenna precoding matrix W^(0), and set the maximum number of loops per round;

step S303, calculate the initial transmission rate R_k^(0) = log_2(1 + SINR_k^(0)) of each user in the system, and extract the minimum of all user transmission rates as the reward function value r^(0) of the current loop, where SINR_k^(0) = |(g_k^H + h_k^H Φ^(0) G_0) w_k^(0)|^2 / (Σ_{j≠k} |(g_k^H + h_k^H Φ^(0) G_0) w_j^(0)|^2 + σ_k^2) denotes the initial instantaneous signal-to-interference-plus-noise ratio of user k, w_k^(0) and w_j^(0) respectively denote the k-th and j-th column vectors of the base station antenna precoding matrix W^(0), and σ_k^2 denotes the additive white Gaussian noise variance at user k.

Step S304, take the complete channel matrices G_0, g_k and h_k, the intelligent reflecting surface phase shift matrix Φ^(0), and the base station antenna precoding matrix W^(0) as the input of the neural network, and take the intelligent reflecting surface phase shift matrix Φ^(1) and base station antenna precoding matrix W^(1) for the next iteration as the output of the neural network;

step S305, check the loop termination condition: if the loop counter is less than the maximum number of loops, repeat the following operations, otherwise go to step S309:

step S306, based on the intelligent reflecting surface phase shift matrix Φ^(i) and base station antenna precoding matrix W^(i) output by the neural network in the previous iteration, recalculate the transmission rate R_k^(i) = log_2(1 + SINR_k^(i)) of each user in the system, and extract the minimum of all user transmission rates as the reward function value r^(i) of the current loop, where SINR_k^(i) denotes the instantaneous signal-to-interference-plus-noise ratio of user k, computed as in step S303 with Φ^(i) and W^(i) in place of Φ^(0) and W^(0), and w_k^(i) and w_j^(i) denote the k-th and j-th column vectors of W^(i);

step S307, update the neural network inputs so that the intelligent reflecting surface phase shift matrix is Φ^(i) and the base station antenna precoding matrix is W^(i), obtaining the output parameters Φ^(i+1) and W^(i+1) of the next iteration;

Step S308, update the loop counter i ← i + 1 and go to step S305;

step S309, average the reward values of all loops and use the result as the reference reward value of the current training round;

step S4, the base station transmits the intelligent reflecting surface phase shift matrix Φ obtained in step S3 to the control end of the intelligent reflecting surface through a direct link between the base station and the control end, and performs the corresponding configuration;

step S5, when the angle information or the position information changes, the system re-executes steps S1 to S5.

Further, in step S3, the neural network parameters continuously updated by the deep reinforcement learning algorithm specifically include: state parameters, action parameters and a reward function, wherein,

the action parameters are the parameters output by the deep neural network, comprising the real and imaginary parts of the base station antenna precoding matrix W and of the intelligent reflecting surface phase shift matrix Φ;

the state parameters include the complete channel matrices G_0, g_k and h_k set for each training round, and the real and imaginary parts of the base station antenna precoding matrix W and of the intelligent reflecting surface phase shift matrix Φ most recently output by the neural network;

the reward function is the minimum user instantaneous transmission rate calculated from the action parameters most recently output by the neural network.

Further, in step S3, the reflection phase parameters of the intelligent reflecting surface phase shift matrix should satisfy 0 ≤ θ_n < 2π, n = 1, 2, …, N; the base station precoding matrix should satisfy the power constraint tr{E[W W^H]} ≤ P_t, where P_t denotes the maximum transmit power of the base station, E[·] denotes expectation, tr{·} denotes the trace of a matrix, and [·]^H denotes the conjugate transpose of a matrix.
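To make the two constraints concrete, the sketch below projects a raw network output back onto the feasible set; the function name is illustrative, and the instantaneous power tr{W W^H} is used as a stand-in for the statistical expectation in the constraint:

```python
import numpy as np

def project_constraints(Phi, W, Pt):
    """Project raw outputs onto the feasible set: unit-modulus diagonal
    phase shifts with angles in [0, 2*pi), and precoder power
    tr{W W^H} <= Pt (instantaneous power as a proxy for the expectation)."""
    phases = np.mod(np.angle(np.diag(Phi)), 2 * np.pi)   # 0 <= theta_n < 2*pi
    Phi = np.diag(np.exp(1j * phases))
    power = np.trace(W @ W.conj().T).real                # tr{W W^H}
    if power > Pt:
        W = W * np.sqrt(Pt / power)                      # rescale onto the power budget
    return Phi, W
```

Such a projection layer is one common way to let an unconstrained neural network output remain feasible without constrained training.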

The invention has the beneficial effects that:

1. The invention uses statistical channel state information, user position information and angle information, all of which remain unchanged over a long time scale, to jointly optimize the base station precoding and the intelligent reflecting surface phase shift configuration. Compared with methods based on instantaneous channel state information, this reduces the pilot overhead of the system during transmission, the computational complexity of the system, and the phase feedback overhead from the system to the intelligent reflecting surface controller, while optimizing the user ergodic rate.

2. The invention solves the rate optimization problem under statistical channel state information with a deep reinforcement learning algorithm, avoiding complex mathematical derivation and calculation, and can quickly realize the joint optimized configuration of the base station precoding and the intelligent reflecting surface phase shifts.

3. The invention designs the transmission scheme with a deep reinforcement learning algorithm and exploits the fact that statistical channel state information remains unchanged over a long time, so that the time overhead of training the deep reinforcement learning algorithm is compatible with the long time scale.

Drawings

Fig. 1 is a flowchart of a base station precoding and intelligent reflective surface phase shift joint optimization method based on deep reinforcement learning provided in embodiment 1;

fig. 2 is a time-scale comparison of the optimization method provided in embodiment 1 with the method based on instantaneous channel state information.

Detailed Description

In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

Embodiment 1

Referring to fig. 1 and fig. 2, the present embodiment provides a base station precoding and intelligent reflecting surface phase shift joint optimization method based on deep reinforcement learning for an intelligent-reflecting-surface-assisted multi-user MISO wireless transmission system, in which the base station is configured with M antennas, the intelligent reflecting surface is configured with N programmable reflecting elements, and each user is configured with a single receive antenna. Based on the statistical channel state information, user positions and angle information in the system, a deep reinforcement learning algorithm jointly optimizes the base station precoding matrix and the intelligent reflecting surface phase shift matrix; the base station precoding and intelligent reflecting surface phase shift configuration needs to be performed only once within a long time scale, and in the remaining time slots only data transmission is required.

Specifically, the method comprises the following steps:

In the intelligent-reflecting-surface-assisted multi-user MISO wireless transmission system, a Rician channel model is used to model the system channels, the positions of the base station and the intelligent reflecting surface are known, and both the direct path from the base station to each user and the cascaded path from the base station to each user via the intelligent reflecting surface are considered. The specific implementation process is as follows:

Step 1, the base station is configured with M antennas, the intelligent reflecting surface is configured with N programmable reflecting elements, and each user is configured with a single receive antenna; when the position information or the angle information of a user changes, the base station acquires the angle information, the position information and the statistical channel information;

the position information is three-dimensional coordinates of the K users. The position information of the base station and the intelligent reflecting surface is kept unchanged.

The angle information includes: the angle of departure of the signal from the base station to the intelligent reflecting surface, the angle of arrival of the signal from the base station at the intelligent reflecting surface, the angle of departure of the signal from the intelligent reflecting surface to the k-th user, and the angle of departure of the signal from the base station to the k-th user.

The statistical channel information includes: the Rician factor α of the channel between the base station and the intelligent reflecting surface, the Rician factor β_k of the channel between the base station and user k, and the Rician factor γ_k of the channel between the intelligent reflecting surface and user k.

Step 2, the base station performs offline calculation to generate a large number of complete wireless channel matrixes which are used as a data set for performing offline training by using a deep reinforcement learning algorithm;

the process of calculating a large number of complete radio channel matrices comprises the following sub-steps:

a1) Using the angle information acquired in step 1, respectively calculate the channel line-of-sight component Ḡ_0 between the base station and the intelligent reflecting surface, the channel line-of-sight component ḡ_k between the base station and user k, and the channel line-of-sight component h̄_k between the intelligent reflecting surface and user k, each built from steering vectors of the form a_x(θ) = [1, e^{jθ}, …, e^{j(x-1)θ}]^T, x = M, N, where [·]^T denotes the matrix transpose.

a2) Respectively calculate the channel non-line-of-sight component G̃_0 between the base station and the intelligent reflecting surface, the channel non-line-of-sight component g̃_k between the base station and user k, and the channel non-line-of-sight component h̃_k between the intelligent reflecting surface and user k; the elements of G̃_0, g̃_k and h̃_k are randomly generated and follow a complex Gaussian distribution with zero mean and unit variance;

a3) According to the position information of the K users, respectively calculate the distances d_k and D_k from the base station and from the intelligent reflecting surface to user k; the distance from the base station to the intelligent reflecting surface remains d_0. From these distances obtain:

the path loss PL_1 from the base station to the intelligent reflecting surface;

the path loss PL_{2,k} from the base station to the k-th user;

the path loss PL_{3,k} from the intelligent reflecting surface to the k-th user, where PL_0 is the path loss at the reference distance dis_0 = 1 meter;

a4) three sets of channel matrices are calculated respectively:

channel matrix between the base station and the intelligent reflecting surface: G_0 = √PL_1 [√(α/(1+α)) Ḡ_0 + √(1/(1+α)) G̃_0];

channel matrix between the base station and user k: g_k = √PL_{2,k} [√(β_k/(1+β_k)) ḡ_k + √(1/(1+β_k)) g̃_k];

channel matrix between the intelligent reflecting surface and user k: h_k = √PL_{3,k} [√(γ_k/(1+γ_k)) h̄_k + √(1/(1+γ_k)) h̃_k], where the barred and tilded quantities are the line-of-sight and non-line-of-sight components of steps a1) and a2), and PL_1, PL_{2,k}, PL_{3,k} are the path losses of step a3).
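The channel-generation steps a1)-a4) can be sketched as follows. All names are illustrative, and the rank-one outer-product form of the LoS matrix and the path-loss handling are modeling assumptions not spelled out in the text:

```python
import numpy as np

def steering(x, theta):
    """ULA steering vector a_x(theta) = [1, e^{j theta}, ..., e^{j(x-1) theta}]^T."""
    return np.exp(1j * theta * np.arange(x))

def rician_mix(kappa, los):
    """Weight a LoS component against zero-mean unit-variance complex-Gaussian
    NLoS fading according to the Rician factor kappa (steps a1)-a2))."""
    nlos = (np.random.randn(*los.shape) + 1j * np.random.randn(*los.shape)) / np.sqrt(2)
    return np.sqrt(kappa / (1 + kappa)) * los + np.sqrt(1 / (1 + kappa)) * nlos

def sample_channels(M, N, K, alpha, beta, gamma, pl1, pl2, pl3,
                    aoa_irs, aod_bs, aod_irs_user, aod_bs_user):
    """One complete channel realization (G0, g_k, h_k) for the training data set.
    The rank-one outer product below is one common choice of LoS matrix."""
    G0_los = np.outer(steering(N, aoa_irs), steering(M, aod_bs).conj())
    G0 = np.sqrt(pl1) * rician_mix(alpha, G0_los)                    # N x M
    g = [np.sqrt(pl2[k]) * rician_mix(beta[k], steering(M, aod_bs_user[k]))
         for k in range(K)]                                          # BS -> user k
    h = [np.sqrt(pl3[k]) * rician_mix(gamma[k], steering(N, aod_irs_user[k]))
         for k in range(K)]                                          # IRS -> user k
    return G0, g, h
```

Calling `sample_channels` repeatedly with fresh NLoS draws yields the "large number of complete wireless channel matrices" that form the off-line training data set.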

Step 3, the base station performs off-line training using the data set obtained in step 2 and continuously updates the neural network parameters of the deep reinforcement learning algorithm, so that the base station precoding matrix and the intelligent reflecting surface phase shift matrix output by the algorithm drive the reference reward value of the deep reinforcement learning model to converge to its optimum, and the base station precoding matrix W and intelligent reflecting surface phase shift matrix Φ output at convergence are stored.

The training process for each round of off-line training includes the following sub-steps:

b1) Sequentially extract one group of the complete channel matrix data generated in step 2;

b2) Initialize the loop counter i ← 0, initialize the reflection phases θ_n^(0) of the intelligent reflecting surface, construct the initial intelligent reflecting surface phase shift matrix Φ^(0) = diag(e^{jθ_1^(0)}, …, e^{jθ_N^(0)}), initialize the base station antenna precoding matrix W^(0), and set the maximum number of loops per round;

b3) Calculate the initial transmission rate R_k^(0) = log_2(1 + SINR_k^(0)) of each user in the system, and extract the minimum of all user transmission rates as the reward function value r^(0) of the current loop, where SINR_k^(0) = |(g_k^H + h_k^H Φ^(0) G_0) w_k^(0)|^2 / (Σ_{j≠k} |(g_k^H + h_k^H Φ^(0) G_0) w_j^(0)|^2 + σ_k^2) denotes the initial instantaneous signal-to-interference-plus-noise ratio of user k, w_k^(0) and w_j^(0) respectively denote the k-th and j-th column vectors of the base station antenna precoding matrix W^(0), and σ_k^2 denotes the additive white Gaussian noise variance at user k.
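A minimal sketch of the rate and reward computation of step b3), assuming the usual effective channel (direct path plus IRS-cascaded path) of an IRS-assisted MISO downlink; the function and argument names are illustrative:

```python
import numpy as np

def min_user_rate(G0, g, h, Phi, W, sigma2):
    """Reward r: minimum over users of R_k = log2(1 + SINR_k), where SINR_k is
    formed from the effective channel g_k^H + h_k^H Phi G0 of user k."""
    K = W.shape[1]
    rates = []
    for k in range(K):
        eff = g[k].conj() + h[k].conj() @ Phi @ G0   # 1 x M effective channel
        p = np.abs(eff @ W) ** 2                     # power received from each column w_j
        sinr = p[k] / (p.sum() - p[k] + sigma2)      # desired vs. interference + noise
        rates.append(np.log2(1 + sinr))
    return min(rates)
```

The same routine serves step b6); only the Phi and W arguments change from iteration to iteration.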

b4) Take the complete channel matrices G_0, g_k and h_k, the intelligent reflecting surface phase shift matrix Φ^(0), and the base station antenna precoding matrix W^(0) as the input of the neural network, and take the intelligent reflecting surface phase shift matrix Φ^(1) and base station antenna precoding matrix W^(1) for the next iteration as the output of the neural network;

b5) judging the loop termination condition: if the cycle number is less than the maximum number of cycles, continuing with the following operations, and otherwise going to step b9):

b6) based on the intelligent reflecting surface phase-shift matrix Φ^(i) and base station antenna precoding matrix W^(i) output by the neural network in the previous cycle, recalculating the transmission rate R_k^(i) = E[log2(1 + SINR_k^(i))] of each user in the system and extracting the minimum over all user transmission rates as the reward function value r^(i) of the current cycle, where SINR_k^(i) = |(h_k^H Φ^(i) G_0 + g_k^H) w_k^(i)|^2 / ( Σ_{j≠k} |(h_k^H Φ^(i) G_0 + g_k^H) w_j^(i)|^2 + σ_k^2 ) represents the instantaneous signal-to-interference-plus-noise ratio of user k, and w_k^(i) and w_j^(i) respectively represent the k-th and j-th column vectors of the base station antenna precoding matrix W^(i);

b7) updating the neural network input with the intelligent reflecting surface phase-shift matrix Φ^(i) and the base station antenna precoding matrix W^(i), obtaining the output parameters Φ^(i+1) and W^(i+1) for the next cycle;

b8) updating the cycle number i ← i + 1 and going to step b5);

b9) averaging the reward values of all cycles and using the result as the reference reward value of the current round of training.
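Sub-steps b2)–b9) amount to one training episode over a fixed group of channel matrices. A schematic, non-authoritative numpy version follows, with the deep neural network abstracted into a generic `policy` callable (its name, the initial precoder choice, and the matrix shapes are assumptions):

```python
import numpy as np

def _worst_user_rate(G0, H, g, W, theta, noise_var=1.0):
    """Reward of steps b3)/b6): minimum user rate. Shapes: G0 (N, M),
    H (K, N) rows h_k^H, g (K, M) rows g_k^H, W (M, K), theta (N,)."""
    F = H @ np.diag(np.exp(1j * theta)) @ G0 + g
    P = np.abs(F @ W) ** 2
    signal = np.diag(P)
    sinr = signal / (P.sum(axis=1) - signal + noise_var)
    return np.log2(1.0 + sinr).min()

def run_episode(G0, H, g, policy, n_steps=50):
    """One round of off-line training, following sub-steps b2)-b9)."""
    N, M = G0.shape
    K = H.shape[0]
    theta = np.zeros(N)                            # b2) initial phases
    W = np.eye(M, K, dtype=complex) / np.sqrt(K)   # b2) initial precoder
    rewards = []
    for i in range(n_steps):                       # b5)-b8) main loop
        rewards.append(_worst_user_rate(G0, H, g, W, theta))
        theta, W = policy((G0, H, g, theta, W))    # b4)/b7) network step
    return float(np.mean(rewards))                 # b9) reference reward
```

In actual DDPG training, `policy` would be the actor network plus exploration noise, and each (state, action, reward, next state) tuple would also be stored in a replay buffer.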

Step 4: the base station transmits the intelligent reflecting surface phase-shift matrix Φ obtained in step 3 to the control end of the intelligent reflecting surface through the direct link between the base station and the control end, and the control end performs the corresponding configuration;

Step 5: when the angle information or the position information changes, the system repeats steps 1 to 4.

Specifically, in this embodiment, the parameters of the deep reinforcement learning algorithm in the training process of step 3 include the state parameters, the action parameters and the reward function. The action parameters are the parameters output by the deep neural network, comprising the real and imaginary parts of the base station antenna precoding matrix W and of the intelligent reflecting surface phase-shift matrix Φ. The state parameters comprise the complete channel matrices G_0, g_k and h_k set for each training round, together with the real and imaginary parts of the base station antenna precoding matrix W and of the intelligent reflecting surface phase-shift matrix Φ last output by the neural network. The reward function is the minimum user instantaneous transmission rate calculated, according to the rate formula of step b3), under the action parameters last output by the neural network.
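Since the network consumes real-valued vectors, the complex state and action described above must be flattened into real and imaginary parts. One possible flattening is sketched below; the ordering of the constituents and both function names are assumptions, as the patent only lists which quantities enter the state and action:

```python
import numpy as np

def to_state_vector(G0, H, g, Phi, W):
    """Stack real and imaginary parts of the channel matrices and of the
    previously output Phi and W into one flat state vector."""
    parts = [G0, H, g, np.diag(Phi), W]   # diag(Phi): the N reflection coefficients
    return np.concatenate([np.r_[m.real.ravel(), m.imag.ravel()] for m in parts])

def from_action_vector(a, M, K, N):
    """Inverse mapping for the action vector: rebuild the M x K precoding
    matrix W and the N reflection phases theta."""
    w = M * K
    W = (a[:w] + 1j * a[w:2 * w]).reshape(M, K)
    theta = np.mod(a[2 * w:2 * w + N], 2.0 * np.pi)  # keep 0 <= theta < 2*pi
    return W, theta
```

The state dimension grows as 2(NM + KN + KM + N + MK), which is why the channel matrices are kept fixed within a training round.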

Specifically, in this embodiment, the reflection phases of the intelligent reflecting surface phase-shift matrix in step 3 should satisfy 0 ≤ θ_n < 2π, n = 1, 2, …, N; the base station precoding matrix should satisfy the power constraint E[tr{W W^H}] ≤ P_t, where P_t represents the maximum transmission power of the base station, E[·] denotes expectation, tr{·} denotes the trace of a matrix, and [·]^H denotes the conjugate transpose of a matrix.
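A straightforward way to honor both constraints on raw network outputs is phase wrapping plus power normalization; the projection sketched below is one common choice, not taken from the patent:

```python
import numpy as np

def project_constraints(W, theta, P_t):
    """Map raw network outputs onto the feasible set: wrap each
    reflection phase into [0, 2*pi) and rescale the precoder W so
    that tr{W W^H} <= P_t."""
    theta = np.mod(theta, 2.0 * np.pi)
    power = np.trace(W @ W.conj().T).real
    if power > P_t:
        W = W * np.sqrt(P_t / power)    # uniform power scaling
    return W, theta
```

Applying this projection after every actor output keeps the whole training trajectory feasible without constraining the network architecture itself.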

The neural network parameters are iteratively updated with the deep deterministic policy gradient (DDPG) algorithm, which achieves optimal convergence of the minimum user rate in the statistical sense; the relationship between the number of intelligent reflecting surface elements and the minimum user transmission rate is plotted in fig. 2.
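Two characteristic ingredients of the deterministic policy gradient update referred to above are the soft (Polyak) target-network update and the critic's temporal-difference target. Both can be written compactly; representing network parameters as a plain dict of arrays is an assumption made here for brevity:

```python
import numpy as np

def soft_update(target, online, tau=0.005):
    """DDPG soft target-network update:
    theta_target <- tau * theta_online + (1 - tau) * theta_target."""
    return {k: tau * online[k] + (1.0 - tau) * v for k, v in target.items()}

def td_target(reward, gamma, q_next):
    """Critic regression target y = r + gamma * Q'(s', mu'(s')),
    where Q' and mu' are the target critic and actor networks."""
    return reward + gamma * q_next
```

The slowly moving targets stabilize the critic regression, which matters here because the reward (the minimum user rate) changes sharply whenever the worst user switches identity.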

In summary, the method of the present invention is based on statistical channel state information and can significantly reduce the channel estimation overhead and implementation complexity of system transmission. In addition, the method uses a deep reinforcement learning algorithm for the joint optimization design of base station precoding and intelligent reflecting surface phase shifts; it matches the long time scale on which statistical channel state information is updated and optimizes the transmission stability of the multi-user system in the long-term statistical sense.

Details not described in the present invention are well known to those skilled in the art.

The foregoing detailed description of the preferred embodiments of the invention has been presented. It should be understood that numerous modifications and variations could be devised by those skilled in the art in light of the present teachings without departing from the inventive concepts. Therefore, the technical solutions available to those skilled in the art through logic analysis, reasoning and limited experiments based on the prior art according to the concept of the present invention should be within the scope of protection defined by the claims.
