Cooperative signal transmission method for cellular mobile communication systems based on reinforcement learning

Document No.: 195392    Publication date: 2021-11-02

Reading note: This technology, "Cooperative signal transmission method for cellular mobile communication systems based on reinforcement learning" (一种基于强化学习的蜂窝移动通信系统协作式信号发送方法), was created by 梁应敞, 贾浩楠 and 何振清 on 2021-08-13. Its main content is as follows: The invention discloses a cooperative signal transmission method for cellular mobile communication systems based on reinforcement learning, comprising the following steps: (1) at the transmitting end, each base station first collects the interference information and equivalent channel information of the users under it, and sends this information, together with the power allocated to each user at the previous time instant, to the other base stations; (2) each base station determines the beam direction of each user according to the channel information of its local users; (3) each base station feeds the information exchanged with the other base stations into a trained reinforcement learning neural network, which outputs the power allocated to each user under that base station; (4) each base station generates a beamforming vector from the beam direction and the power, and uses it to process the transmitted signal. The invention is suitable for mobile cellular networks equipped with large-scale antenna arrays, and can improve the total transmission rate of the whole cellular network.

1. A method for cooperative signaling in a cellular mobile communication system based on reinforcement learning, comprising the steps of:

(1) at the transmitting end, each base station first collects the interference information and equivalent channel information of the users under the base station, and sends this information, together with the power allocated to each user at the previous time instant, to the other base stations;

(2) each base station determines the beam direction of each user according to the channel information of the local user;

(3) each base station feeds the information exchanged with the other base stations into a trained reinforcement learning neural network, which after its computation outputs the power allocated to each user under that base station;

(4) each base station generates a beamforming vector according to the beam direction and the power, and processes the transmission signal by using the beamforming vector.

2. The reinforcement-learning-based cooperative signaling method for a cellular mobile communication system according to claim 1, wherein the antenna array of the base station in step (1) is a uniform rectangular array with a total of N² antennas.

3. The reinforcement learning-based cellular mobile communication system cooperative signaling method according to claim 1, wherein the base station-to-user channel consists of two parts: large scale fading and small scale fading.

4. The reinforcement-learning-based cooperative signaling method for a cellular mobile communication system according to claim 1, wherein the channel from the (x, y)-th antenna of the i-th base station to the k-th user under the j-th base station in the network of step (3) is composed of the large-scale fading and the small-scale fading of P propagation paths, wherein the large-scale fading is the path loss, pathloss = 28.0 + 22 lg D + 20 lg f_c, D is the physical distance from the user to the base station, and f_c is the operating carrier frequency; when user k under the j-th base station is within the range of sector m of base station i, S_m(θ) ≡ 1, otherwise S_m(θ) ≡ 0; P is the number of propagation multipaths, and g_{i,j,k,p} is the small-scale fading of each path, assumed to be independent and identically distributed random variables, i.e. g ~ CN(0,1), a complex Gaussian distribution with mean 0 and variance 1; d is the distance between antennas, and the remaining terms carry the pitch and azimuth information of the transmission path.

5. The reinforcement-learning-based cooperative signaling method for a cellular mobile communication system according to claim 4, wherein, under this channel model, the signal received by the k-th user at the j-th base station is the superposition of four terms: the first term is the desired signal of the k-th user under the j-th base station; the second term is the interference caused to user k by the signals the j-th base station sends to its other users, also called intra-cell interference; the third term is the interference caused by signals transmitted by other base stations to the k-th user under the j-th base station, also called inter-cell interference; and the last term is the receiver system noise of that user.

6. The method for cooperative signaling in a cellular mobile communication system based on reinforcement learning according to claim 1, wherein the work flow of the whole neural network in step (3) is divided into two stages: an offline training stage and an online decision stage; in the online decision stage, only the online decision network needs to output actions, and the state transition process is then stored in the experience replay unit; in the offline training stage, in each training iteration a batch of data is taken from the experience replay unit and fed into the target decision network and the target Q-value network respectively, the former outputting the action policy taken in each state and the latter outputting the value of that policy in each state, y_i = r_i + γ Q'(s_{i+1}, μ'(s_{i+1} | θ^{μ'}) | θ^{Q'}).

7. The reinforcement-learning-based cooperative signaling method for a cellular mobile communication system according to claim 6, wherein the neural network is composed of an input layer, a hidden layer and an output layer.

8. The reinforcement-learning-based cooperative signaling method for a cellular mobile communication system according to claim 7, wherein the activation function of the hidden layer is a linear rectification function, expressed as f(x) = max(0, x).

9. The reinforcement-learning-based cooperative signaling method for a cellular mobile communication system according to claim 7, wherein the output layer selects a softmax function for output vector normalization, expressed as softmax(z)_i = e^{z_i} / Σ_{i'} e^{z_{i'}}.

Technical Field

The invention belongs to the field of wireless communication, and particularly relates to a cooperative signal transmission method for cellular mobile communication systems based on reinforcement learning.

Background

Cellular mobile communication is currently the most widely deployed wireless communication system in the world. With the development of mobile communication technology, cells are becoming denser and the distance between cells is shrinking, so co-channel interference between cells has become a main factor affecting communication quality. Conventional cooperative solutions first require a large amount of Channel State Information (CSI) to be exchanged between base stations, after which each base station independently designs a Beamforming scheme to avoid inter-cell interference as much as possible. However, existing base stations often carry large-scale antenna arrays, so the amount of CSI that would have to be exchanged between base stations is very large, making such schemes difficult to implement.

Disclosure of Invention

Aiming at the problem of reducing co-channel interference between cells, the invention provides a cooperative signal transmission method for cellular mobile communication systems based on reinforcement learning, so that inter-cell interference can be avoided while only a small amount of information is exchanged between cells.

In order to solve the technical problems, the invention adopts the following technical scheme:

a cooperative signal transmission method of a cellular mobile communication system based on reinforcement learning comprises the following steps:

(1) at the transmitting end, each base station first collects the interference information and equivalent channel information of the users under the base station, and sends this information, together with the power allocated to each user at the previous time instant, to the other base stations;

(2) each base station determines the beam direction of each user according to the channel information of the local user;

(3) each base station feeds the information exchanged with the other base stations into a trained reinforcement learning neural network, which after its computation outputs the power allocated to each user under that base station;

(4) each base station generates a beamforming vector according to the beam direction and the power, and processes the transmission signal by using the beamforming vector.

Further, the antenna array of the base station in the step (1) is a uniform rectangular array with a total of N² antennas.

Further, the base station to user channel consists of two parts: large scale fading and small scale fading.

Further, in the network of step (3), the channel from the (x, y)-th antenna of the i-th base station to the k-th user under the j-th base station is composed of the large-scale fading and the small-scale fading of P propagation paths, wherein the large-scale fading is the path loss, pathloss = 28.0 + 22 lg D + 20 lg f_c, D is the physical distance from the user to the base station, and f_c is the operating carrier frequency; when user k under the j-th base station is within the range of sector m of base station i, S_m(θ) ≡ 1, otherwise S_m(θ) ≡ 0; P is the number of propagation multipaths, and g_{i,j,k,p} is the small-scale fading of each path, assumed to be independent and identically distributed random variables, i.e. g ~ CN(0,1), a complex Gaussian distribution with mean 0 and variance 1; d is the distance between antennas, and the remaining terms carry the pitch and azimuth information of the transmission path.

Further, under this channel model, the signal received by the k-th user at the j-th base station is the superposition of four terms: the first term is the desired signal of the k-th user under the j-th base station; the second term is the interference caused to user k by the signals the j-th base station sends to its other users, also called intra-cell interference; the third term is the interference caused by signals transmitted by other base stations to the k-th user under the j-th base station, also called inter-cell interference; and the last term is the receiver system noise of that user.

Further, the work flow of the whole neural network in the step (3) is divided into two stages: an offline training stage and an online decision stage; in the online decision stage, only the online decision network needs to output actions, and the state transition process is then stored in the experience replay unit; in the offline training stage, in each training iteration a batch of data is taken from the experience replay unit and fed into the target decision network and the target Q-value network respectively, the former outputting the action policy taken in each state and the latter outputting the value of that policy in each state,

y_i = r_i + γ Q'(s_{i+1}, μ'(s_{i+1} | θ^{μ'}) | θ^{Q'}).

Furthermore, the neural network is composed of an input layer, a hidden layer and an output layer.

Further, the activation function of the hidden layer is a linear rectification function, expressed as f(x) = max(0, x).

Further, the output layer selects a softmax function for output vector normalization, expressed as softmax(z)_i = e^{z_i} / Σ_{i'} e^{z_{i'}}.

The invention has the following beneficial effects:

At the transmitting end, each base station first collects the interference information and equivalent channel information of the users under it, and sends this information, together with the power allocated to each user at the previous time instant, to the other base stations. Each base station then determines the beam direction of each user from the channel information of its local users, and feeds the information exchanged with the other base stations into a trained reinforcement learning neural network, which outputs the power allocated to each user under that base station. Finally, each base station generates a beamforming vector from the beam direction and the power, and uses it to process the transmitted signal.

The difference from the traditional method is that the amount of information that needs to be exchanged between base stations is far lower than in the traditional scheme, and the amount of exchanged information is independent of the number of base station antennas; the method is therefore suitable for mobile cellular networks equipped with large-scale antenna arrays and can improve the total transmission rate of the whole cellular network.

In addition, the invention does not need to exchange a large amount of channel information between base stations to design the beamforming vectors, and it optimizes the transmission rate of the whole cellular network by designing the beam directions and beam powers in a distributed manner.

Drawings

FIG. 1 is a diagram of a cellular communication network system model of the present invention;

FIG. 2 is a base station transmitter operational flow diagram of a cellular communication network of the present invention;

FIG. 3 is a diagram of a cellular network base station transmitter reinforcement learning neural network of the present invention;

FIG. 4 is a diagram of a reinforcement learning neural network of the present invention;

fig. 5 is a graph comparing the performance of the reinforcement-learning-based beamforming algorithm of the present invention with other distributed algorithms.

Detailed Description

The present invention considers the downlink of a typical multi-cell mobile communication system, such as the cellular communication network system model shown in fig. 1. For convenience of description only three cells are drawn in fig. 1; in general we consider a cellular network composed of L cells, each containing one Base Station (BS) and K User Equipments (UE). Each base station serves only the users within its own cell, but while serving them it interferes with the users of other cells. In downlink data transmission, each base station therefore needs to design a beamforming vector for each user to eliminate intra-cell and inter-cell interference. The invention designs a multi-base-station-assisted beamforming scheme, as shown in fig. 2: when the base stations operate, they exchange the information needed for decision-making, then each base station makes its beam direction decision and beam power decision based on this information, and finally transmits signals according to the resulting scheme.

In this cellular network, we assume that the antenna arrays of the base stations are all uniform rectangular arrays with a total of N² antennas. The base-station-to-user channel consists of two parts: large-scale fading and small-scale fading. As shown in fig. 1, the channel from the (x, y)-th antenna of the i-th base station to the k-th user under the j-th base station is composed of the large-scale fading and the small-scale fading of P propagation paths, where the large-scale fading is the path loss, pathloss = 28.0 + 22 lg D + 20 lg f_c, D is the physical distance from the user to the base station, and f_c is the operating carrier frequency. When user k under the j-th base station is within the range of sector m of base station i, S_m(θ) ≡ 1, otherwise S_m(θ) ≡ 0. P is the number of propagation multipaths, and g_{i,j,k,p} is the small-scale fading of each path; in the invention the small-scale fading coefficients are assumed to be independent and identically distributed random variables, i.e. g ~ CN(0,1), a complex Gaussian distribution with mean 0 and variance 1. d is the spacing between antennas, and the remaining terms carry the pitch and azimuth information of the transmission path. For convenience of description, we stack the channels of all antennas into an N² × 1 vector h.
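To make the channel model concrete, the sketch below (NumPy) builds one such N² × 1 channel vector. The path-loss formula and the i.i.d. g ~ CN(0,1) per-path fading follow the description above; the explicit uniform-rectangular-array steering vector, the angle distributions, the 1/√P normalization, and the units (D in meters, f_c in GHz) are our own assumptions, since the patent's exact expression is not reproduced here.

```python
import numpy as np

def ura_steering(N, d_over_lambda, theta, phi):
    """Steering vector of an N x N uniform rectangular array (assumed form)
    for a path with elevation theta and azimuth phi."""
    n = np.arange(N)
    ax = np.exp(1j * 2 * np.pi * d_over_lambda * n * np.sin(theta) * np.cos(phi))
    ay = np.exp(1j * 2 * np.pi * d_over_lambda * n * np.sin(theta) * np.sin(phi))
    return np.kron(ax, ay)                       # length N^2

def channel_vector(N, P, D_m, fc_GHz, d_over_lambda=0.5, rng=None):
    """One N^2 x 1 downlink channel with pathloss = 28.0 + 22*lg(D) + 20*lg(fc) in dB
    and i.i.d. g ~ CN(0,1) small-scale fading on each of the P paths."""
    rng = rng or np.random.default_rng()
    pathloss_dB = 28.0 + 22 * np.log10(D_m) + 20 * np.log10(fc_GHz)
    amplitude = 10 ** (-pathloss_dB / 20)        # amplitude attenuation
    h = np.zeros(N * N, dtype=complex)
    for _ in range(P):
        g = (rng.standard_normal() + 1j * rng.standard_normal()) / np.sqrt(2)   # CN(0,1)
        theta, phi = rng.uniform(0, np.pi / 2), rng.uniform(-np.pi, np.pi)
        h += g * ura_steering(N, d_over_lambda, theta, phi)
    return amplitude * h / np.sqrt(P)            # assumed per-path normalization

# Example: N^2 = 64 antennas, P = 10 paths, user 100 m away at f_c = 3.5 GHz
h = channel_vector(N=8, P=10, D_m=100.0, fc_GHz=3.5)
```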

Under the above channel model, the signal received by the k-th user of the j-th base station is the superposition of four terms: the first term is the desired signal of the k-th user under the j-th base station; the second term is the interference caused to user k by the signals the j-th base station sends to its other users, also called intra-cell interference; the third term is the interference caused by signals transmitted by other base stations to the k-th user under the j-th base station, also called inter-cell interference; and the last term is the receiver system noise of that user. The quality of the signal received by the user is described by its Signal to Interference plus Noise Ratio (SINR), namely the SINR of the k-th user at the j-th base station.
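The received-signal and SINR expressions themselves are not reproduced above; a standard multi-cell downlink form consistent with the four terms just described, with h_{i,j,k} the channel from base station i to the k-th user of cell j, w_{j,k} and s_{j,k} the beamforming vector and unit-power data symbol of that user, and n_{j,k} noise of power σ² (notation assumed here), would be:

```latex
y_{j,k} = \mathbf{h}_{j,j,k}^{\mathrm{H}}\mathbf{w}_{j,k}\,s_{j,k}
        + \sum_{k'\neq k}\mathbf{h}_{j,j,k}^{\mathrm{H}}\mathbf{w}_{j,k'}\,s_{j,k'}
        + \sum_{i\neq j}\sum_{k'}\mathbf{h}_{i,j,k}^{\mathrm{H}}\mathbf{w}_{i,k'}\,s_{i,k'}
        + n_{j,k},

\mathrm{SINR}_{j,k} =
  \frac{\bigl|\mathbf{h}_{j,j,k}^{\mathrm{H}}\mathbf{w}_{j,k}\bigr|^{2}}
       {\sum_{k'\neq k}\bigl|\mathbf{h}_{j,j,k}^{\mathrm{H}}\mathbf{w}_{j,k'}\bigr|^{2}
        + \sum_{i\neq j}\sum_{k'}\bigl|\mathbf{h}_{i,j,k}^{\mathrm{H}}\mathbf{w}_{i,k'}\bigr|^{2}
        + \sigma^{2}}.
```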

the achievable rate per bandwidth data for the user can be expressed as:

R_{j,k} = log_2(1 + SINR_{j,k}),    (3)
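As a worked example of equation (3), the following NumPy sketch evaluates the SINR and per-bandwidth rate of every user; the container layout (H[i][j][k] is the channel from base station i to the k-th user of cell j, W[j][k] the beamforming vector base station j uses for its own user k) is our own assumption.

```python
import numpy as np

def user_rates(H, W, noise_power):
    """Per-user rates R_{j,k} = log2(1 + SINR_{j,k}) for L cells with K users each.
    H[i][j][k]: channel from base station i to the k-th user of cell j (length N^2).
    W[j][k]:   beamforming vector of base station j for its own user k (length N^2)."""
    L, K = len(W), len(W[0])
    R = np.zeros((L, K))
    for j in range(L):
        for k in range(K):
            desired = abs(np.vdot(H[j][j][k], W[j][k])) ** 2           # wanted signal power
            intra = sum(abs(np.vdot(H[j][j][k], W[j][kp])) ** 2        # intra-cell interference
                        for kp in range(K) if kp != k)
            inter = sum(abs(np.vdot(H[i][j][k], W[i][kp])) ** 2        # inter-cell interference
                        for i in range(L) if i != j for kp in range(K))
            R[j, k] = np.log2(1 + desired / (intra + inter + noise_power))
    return R    # R.sum() is the network sum rate
```

The sum of these rates over all users is the network sum rate used below as the reinforcement-learning reward.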

Fig. 2 is a flow chart of the base station transmitter operation in the cellular communication network of the present invention. In conventional multi-cell beamforming solutions, information exchange is the most bandwidth- and time-consuming step, because the multi-antenna channel information of all users has to be transmitted; when the number of users and the number of base station antennas are large, transmitting the channel information of all users is not feasible. The information exchanged in the present invention only comprises the equivalent channel information of each user and the interference information of each base station, and its amount is far lower than in the traditional scheme, so the invention is more practical. After the information exchange, each base station determines the beam direction of each user from the channel information of its local users. Here the idea of the zero-forcing algorithm is used: the beam direction of each user is chosen so that it causes no interference to the base station's other local users, and the beamforming vector of each user is then energy-normalized to unit norm, which completes the beam direction decision. Next, each base station feeds the exchanged information into the reinforcement learning neural network, which after its computation outputs the power decision η = [η_1, η_2, ···, η_K] for its users. Finally, the base station generates the beamforming vector of each user from the direction decision and the power decision and sends downlink data to the users.
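A minimal sketch of this per-base-station decision, assuming the zero-forcing directions come from the pseudo-inverse of the local users' channel matrix and that each power decision η_k is read as the fraction of the maximum transmit power P_max assigned to user k (the exact power mapping is not specified above, so this scaling is our assumption):

```python
import numpy as np

def zf_directions(H_local):
    """H_local: K x M matrix whose k-th row is the conjugate-transposed channel of
    local user k (M = N^2 antennas). Returns M x K unit-norm zero-forcing directions."""
    W = np.linalg.pinv(H_local)                           # column k is (near) orthogonal to
    return W / np.linalg.norm(W, axis=0, keepdims=True)   # the other local users' channels

def beamforming_vectors(H_local, eta, P_max):
    """Scale the unit-norm direction of user k by sqrt(eta_k * P_max)."""
    W_bar = zf_directions(H_local)
    return W_bar * np.sqrt(np.asarray(eta) * P_max)       # column k carries power eta_k * P_max
```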

Fig. 3 is a diagram of the reinforcement learning neural network structure of the cellular network base station transmitter according to the present invention. The reinforcement learning method adopted by the invention is the Deep Deterministic Policy Gradient (DDPG). The main body of the neural network consists of two parts: the actor network and the critic network. The actor network computes a decision from the input state vector s and outputs an action vector a; its parameters are updated by an optimizer using the computed policy gradient, which is fed back to the online decision network, and at regular intervals the parameters of the online decision network are copied to the target decision network in a soft-update manner. The essential purpose of the critic network is to output the value of the action taken by the decision network; the value function is defined as Q^μ(s_t, a_t) = E[r(s_t, a_t) + γ Q^μ(s_{t+1}, μ(s_{t+1}))], i.e. the expected return obtained by taking action a_t in state s_t and then continuing to follow the policy μ. The aim of the invention is to maximize the sum rate of the whole cellular communication network, so the reward used for the reinforcement learning training is set to the network sum rate, r = Σ_j Σ_k R_{j,k}.

The work flow of the whole neural network is divided into two stages: an offline training stage and an online decision stage. In the online decision stage, only the online decision network needs to output actions, and the resulting state transition is then stored in the experience replay unit. In the offline training stage, in each training iteration a batch of data is taken from the experience replay unit and fed into the target decision network and the target Q-value network respectively; the former outputs the action policy taken in each state and the latter outputs the value of that policy in each state, y_i = r_i + γ Q'(s_{i+1}, μ'(s_{i+1} | θ^{μ'}) | θ^{Q'}). The online Q-value network then computes the gradient from the difference between its own output value and y_i and updates its parameters, while the online policy network computes the policy gradient and updates its parameters. To allow the neural network to explore new actions and avoid falling into local optima, the invention adds noise to the actions made by the online decision network, so that the network is able to explore new actions and states.
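The following PyTorch sketch shows one offline training step matching this description; it is a generic DDPG update written under our own naming (the actor/critic modules, their optimizers and the replay batch are assumed to exist), not the patent's code. The default gamma follows the discount coefficient of 0.1 stated in the simulation settings below; tau is an assumed soft-update rate.

```python
import torch
import torch.nn.functional as F

def ddpg_update(batch, actor, critic, target_actor, target_critic,
                actor_opt, critic_opt, gamma=0.1, tau=0.001):
    """One offline training step on a replay batch (s, a, r, s_next)."""
    s, a, r, s_next = batch

    # Target networks produce y_i = r_i + gamma * Q'(s_{i+1}, mu'(s_{i+1}))
    with torch.no_grad():
        y = r + gamma * target_critic(s_next, target_actor(s_next))

    # Online Q-value network: update from the difference between its output and y_i
    critic_loss = F.mse_loss(critic(s, a), y)
    critic_opt.zero_grad()
    critic_loss.backward()
    critic_opt.step()

    # Online decision (actor) network: deterministic policy gradient on Q(s, mu(s))
    actor_loss = -critic(s, actor(s)).mean()
    actor_opt.zero_grad()
    actor_loss.backward()
    actor_opt.step()

    # Soft update of the target networks
    for tgt, src in ((target_actor, actor), (target_critic, critic)):
        for p_t, p in zip(tgt.parameters(), src.parameters()):
            p_t.data.mul_(1.0 - tau).add_(tau * p.data)
```

In the online decision stage, exploration noise would additionally be added to the action actor(s) before the resulting transition is stored in the experience replay unit, as described above.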

FIG. 4 is a diagram of the internal structure of the DDPG neural network of the present invention. The neural network consists of an input layer, a hidden layer and an output layer. The four networks in the invention have similar structures and differ only in the number of input-layer neurons. The activation function of the hidden layer is the rectified linear unit (ReLU), expressed as f(x) = max(0, x). The output layer uses a softmax function for output vector normalization, expressed as softmax(z)_i = e^{z_i} / Σ_{i'} e^{z_{i'}}.
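A minimal PyTorch module consistent with this structure (one ReLU hidden layer, softmax output; the hidden-layer width follows the simulation settings below, and everything else is our assumption):

```python
import torch.nn as nn

class DDPGNet(nn.Module):
    """Input layer -> ReLU hidden layer -> softmax output layer, as described above.
    in_dim differs between the four networks; out_dim = K power-allocation outputs."""
    def __init__(self, in_dim, out_dim, hidden=400):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, hidden),
            nn.ReLU(),                      # f(x) = max(0, x)
            nn.Linear(hidden, out_dim),
            nn.Softmax(dim=-1),             # normalizes the output vector
        )

    def forward(self, x):
        return self.net(x)
```

For the actor, the softmax output can be read directly as a normalized power allocation over the K users.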

In the following, the performance of the proposed solution is illustrated with simulation results. The most common hexagonal cellular network structure is selected, with the number of cells L = 3, an inter-site distance of 500 meters, 3 sectors per cell, a base station height of 25 meters, a user equipment height of 1.5 meters, a carrier frequency f_c = 3.5 GHz, N² = 64 base station antennas spaced λ/2 apart, a maximum base station transmit power P_max = 10^5 mW, and a fixed user noise power. For the reinforcement learning parameters, the network learning rate is 10^-3, the replay memory size is 5000, the discount coefficient γ is 0.1, the data batch size is 512, and the number of hidden-layer neurons is 400; the neural network algorithms are all implemented using PyTorch.
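Restated as a plain Python dictionary (only the values listed above; the key names are ours and purely illustrative):

```python
sim_config = {
    "num_cells_L": 3,
    "inter_site_distance_m": 500,
    "sectors_per_cell": 3,
    "bs_height_m": 25.0,
    "ue_height_m": 1.5,
    "carrier_frequency_GHz": 3.5,
    "num_bs_antennas": 64,            # N^2 = 64
    "antenna_spacing": "lambda/2",
    "max_tx_power_mW": 1e5,           # P_max = 10^5 mW
    "learning_rate": 1e-3,
    "replay_memory_size": 5000,
    "discount_gamma": 0.1,
    "batch_size": 512,
    "hidden_neurons": 400,
}
```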

Fig. 5 compares the performance of the reinforcement-learning-based beamforming algorithm of the present invention with other distributed algorithms. The three comparison algorithms are a distributed Matched Filter (TMF) algorithm, a distributed Zero Forcing (ZF) algorithm and a distributed Zero Gradient (ZG) algorithm, and the number of users in the simulation is K = 10. Under the same conditions, the performance of the reinforcement-learning-based algorithm after convergence exceeds that of the other distributed algorithms, while the amount of parameters it requires is far less than that of the distributed zero-gradient algorithm.

It will be appreciated by those of ordinary skill in the art that the embodiments described herein are intended to help the reader understand the principles of the invention, and that the invention is not limited to the specifically recited embodiments and examples. Those skilled in the art, having the benefit of this disclosure, may make numerous modifications and changes without departing from the scope of the invention in its broader aspects.
