Beam weight parameter adjusting method, device, equipment and storage medium

Document No.: 490353    Publication date: 2022-01-04

Reading note: this technology, "Beam weight parameter adjusting method, device, equipment and storage medium" (波束权值参数调整方法、装置、设备及存储介质), was designed and created by 陈磊光, 李诗扬, 范娟, 曾昭才, 邹卫新, 陈刚, 吴梓颖, 冯鹄志, 肖昀珊 and 陈孟香 on 2021-10-27. The main content is as follows: the application provides a method, a device, equipment and a storage medium for adjusting beam weight parameters. The method comprises the following steps: if a beam weight parameter adjustment instruction is monitored, determining the current number of users on each floor corresponding to the target building and the residence time of the users on the corresponding floor; determining the current state information of the target building according to the current number of users on each floor, the residence time of the users on the corresponding floors and the base station information; inputting the current state information of the target building into a trained reinforcement learning model to obtain the optimal beam weight parameter combination corresponding to the target building; and adjusting the beam weight parameters of the base station corresponding to the target building according to the optimal beam weight parameter combination. In this method, the floor where each user is located in the building and the user's residence time on that floor are added during the adjustment of the beam weight parameters, so that the adjusted base station beam covers the building better and the downlink average rate is effectively improved.

1. A method for adjusting beam weight parameters, the method comprising:

if a beam weight parameter adjusting instruction is monitored, determining the number of users on each current floor corresponding to the target building and the residence time of the users on the corresponding floor, wherein the beam is a beam transmitted to a user terminal of the target building by a base station;

determining the current state information of the target building according to the number of users on each current floor corresponding to the target building, the residence time of the users on the corresponding floor and the base station information corresponding to the target building;

inputting the current state information of the target building into a trained reinforcement learning model to obtain an optimal beam weight parameter combination corresponding to the target building;

and adjusting the beam weight parameter of the base station corresponding to the target building according to the optimal beam weight parameter combination.

2. The method of claim 1, wherein the determining of the number of users on each current floor corresponding to the target building and the residence time of the users on the corresponding floor comprises:

acquiring signaling data XDR corresponding to a target building, wherein the XDR comprises a user terminal identification code, a floor identification code corresponding to the user terminal identification code, service starting time corresponding to the user terminal identification code and corresponding service ending time;

and determining the number of users on each floor corresponding to the floor identification code corresponding to the target building according to the floor identification code, and determining the residence time of the user corresponding to the user terminal identification code on the floor corresponding to the current floor identification code according to the service start time corresponding to the user terminal identification code and the corresponding service end time.

3. The method as claimed in claim 2, wherein the determining the current status information of the target building according to the number of users on each floor corresponding to the target building, the residence time of the users on the corresponding floor, and the base station information corresponding to the target building comprises:

acquiring base station information, wherein the base station information comprises corresponding target building information and a corresponding floor identification code;

and associating the number of users on each floor corresponding to the current floor identification code corresponding to the target building and the residence time of the users on the floor corresponding to the current floor identification code with the base station information based on the preset key field to obtain associated information, and determining the associated information as the current state information of the target building.

4. The method of claim 1, wherein the reinforcement learning model comprises a Markov model and a neural network model;

before inputting the current state information of the target building into the trained reinforcement learning model and obtaining the optimal beam weight parameter combination corresponding to the target building, the method further comprises the following steps:

constructing a state information set, an action information set and reward and punishment information in a Markov model of the reinforcement learning model;

establishing an experience pool, obtaining a plurality of corresponding experience values based on a state information set, an action information set and reward and punishment information, and adding the plurality of experience values into the established experience pool;

and randomly selecting a plurality of experience values from the experience pool as training samples, and training the neural network model of the reinforcement learning model with the training samples to obtain the trained neural network model of the reinforcement learning model.

5. The method of claim 4, wherein constructing the state information set, the action information set, and the reward and punishment information in the Markov model of the reinforcement learning model comprises:

acquiring target building state information, and constructing a state information set in the Markov model of the reinforcement learning model based on the target building state information;

obtaining beam weight parameter combinations of the base station corresponding to the target building state information, constructing an action information set in the Markov model of the reinforcement learning model based on the beam weight parameter combinations, and constructing reward and punishment information based on the downlink average rate and a preset user change function.

6. The method of claim 4, wherein constructing the experience pool, obtaining a plurality of corresponding experience values based on the state information set, the action information set, and the reward and punishment information, and adding the plurality of experience values to the constructed experience pool comprises:

selecting state information and action information from the state information set, executing an action corresponding to the action information, and obtaining corresponding next state information and corresponding reward and punishment information;

and taking the selected state information, the selected action information, the corresponding next state information and the corresponding reward and punishment information as a group of experience values, adding the experience values into a constructed experience pool, and executing the steps of selecting the state information from the state information set and selecting the action information from the action information set until the number of the experience values in the experience pool reaches a preset number.

7. The method of claim 4, wherein training the neural network model of the reinforcement learning model using the training samples to obtain the trained neural network model of the reinforcement learning model comprises:

constructing a prediction network model and a target network model corresponding to the neural network model, and defining a corresponding loss function, wherein the prediction network model is expressed as: Q(s, a; θ), and the target network model is expressed as: Y_target = r + γ max Q(s, a; θ), where s is state information, a is action information, θ is a weight parameter, γ is a discount factor, and r is reward and punishment information, and the loss function is expressed as: loss = (Q - Y_target)²;

and inputting the training samples into the prediction network model, and updating the weight parameters of the prediction network model by gradient back propagation of the neural network so as to minimize the loss function, obtain the optimal weight parameters, and obtain the trained neural network model of the reinforcement learning model.

8. The method of claim 1, wherein after adjusting the beam weight parameters of the base station corresponding to the target building according to the optimal beam weight parameter combination, the method further comprises:

acquiring a first downlink average rate corresponding to the preset time before adjustment and a second downlink average rate corresponding to the preset time after adjustment, and comparing the first downlink average rate with the second downlink average rate;

if the first downlink average rate is less than or equal to the second downlink average rate, sending a prompt message of successful adjustment to the corresponding terminal;

and if the first downlink average rate is greater than the second downlink average rate, sending prompt information of failed adjustment to the corresponding terminal.

9. An apparatus for adjusting beam weight parameters, the apparatus comprising:

the determining unit is used for determining the number of users on each current floor corresponding to the target building and the residence time of the users on the corresponding floor if the beam weight parameter adjusting instruction is monitored, wherein the beam is a beam transmitted to a user terminal of the target building by a base station;

the determining unit is further used for determining the current state information of the target building according to the number of users on each current floor corresponding to the target building, the residence time of the users on the corresponding floor and the base station information corresponding to the target building;

the input unit is used for inputting the current state information of the target building into a trained reinforcement learning model to obtain the optimal beam weight parameter combination corresponding to the target building;

and the adjusting unit is used for adjusting the beam weight parameter of the base station corresponding to the target building according to the optimal beam weight parameter combination.

10. An electronic device, comprising: a processor, and a memory communicatively coupled to the processor;

the memory stores computer-executable instructions;

the processor executes computer-executable instructions stored by the memory to implement the method of any of claims 1-8.

11. A computer-readable storage medium having computer-executable instructions stored thereon, which when executed by a processor, perform the method of any one of claims 1-8.

Technical Field

The present application relates to communications technologies, and in particular, to a method, an apparatus, a device, and a storage medium for adjusting beam weight parameters.

Background

With the development of wireless communication networks from Long Term Evolution (LTE) networks to 5th generation mobile communication technology (5G) networks, a 5G multi-channel base station device may transmit a plurality of beams, with different beams covering different areas, and the 5G base station may improve the spatial coverage performance of the network in a multi-beam scanning manner.

The base station can flexibly set different initial beam configurations according to different coverage scenarios. However, the coverage scenarios in the live network are various, and the optimal overall coverage of the buildings in a cell cannot be ensured by the initial beam configurations alone. In the existing adjustment method, a beam-optimization weight is further obtained according to the current user traffic and the traffic distribution of potential users, and the beam of the base station is adjusted accordingly.

However, the positions of users in buildings differ and are not fixed, and a method that optimizes the beam only according to user traffic does not take into account these different positions of users within a building.

Disclosure of Invention

The application provides a method, a device, equipment and a storage medium for adjusting beam weight parameters, which are used for solving the problem that the existing beam adjustment mode does not consider the different positions of users in the buildings of a cell.

In a first aspect, the present application provides a method for adjusting beam weight parameters, including:

if a beam weight parameter adjusting instruction is monitored, determining the number of users on each current floor corresponding to the target building and the residence time of the users on the corresponding floor, wherein the beam is a beam transmitted to a user terminal of the target building by a base station;

determining the current state information of the target building according to the number of users on each current floor corresponding to the target building, the residence time of the users on the corresponding floor and the base station information corresponding to the target building;

inputting the current state information of the target building into a trained reinforcement learning model to obtain an optimal beam weight parameter combination corresponding to the target building;

and adjusting the beam weight parameter of the base station corresponding to the target building according to the optimal beam weight parameter combination.

In a second aspect, the present application provides a device for adjusting beam weight parameters, including:

the determining unit is used for determining the number of users on each current floor corresponding to the target building and the residence time of the users on the corresponding floor if the beam weight parameter adjusting instruction is monitored, wherein the beam is a beam transmitted to a user terminal of the target building by a base station;

the determining unit is further used for determining the current state information of the target building according to the number of users on each current floor corresponding to the target building, the residence time of the users on the corresponding floor and the base station information corresponding to the target building;

the input unit is used for inputting the current state information of the target building into a trained reinforcement learning model to obtain the optimal beam weight parameter combination corresponding to the target building;

and the adjusting unit is used for adjusting the beam weight parameter of the base station corresponding to the target building according to the optimal beam weight parameter combination.

In a third aspect, the present invention provides an electronic device comprising: a processor and a memory;

the memory stores computer-executable instructions;

the processor executes computer-executable instructions stored by the memory, causing the processor to perform the method of the first aspect.

In a fourth aspect, the present invention provides a computer-readable storage medium having stored thereon computer-executable instructions for implementing the method according to the first aspect when executed by a processor.

With the method, device, equipment and storage medium for adjusting beam weight parameters provided by the application, if a beam weight adjustment instruction is monitored, the current number of users on each floor of a target building and the residence time of the users on the corresponding floor are determined; the current state information of the target building is determined according to the current number of users on each floor, the residence time of the users on the corresponding floor and the base station information; and the state information is input into a pre-trained reinforcement learning model to obtain the optimal beam weight parameter combination corresponding to the target building, so that the beam weight parameters of the base station are adjusted according to the optimal combination. Because the floor where each user is located in the building and the user's residence time on that floor are added during the adjustment of the beam weight parameters, the adjusted base station beam covers the building better than with the existing adjustment mode based on user traffic, effectively improving the downlink average rate.

Drawings

The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present application and together with the description, serve to explain the principles of the application.

Fig. 1 is a schematic diagram of a network architecture of a method for adjusting beam weight parameters according to the present invention;

fig. 2 is a schematic flow chart illustrating a method for adjusting beam weight parameters according to an embodiment of the present invention;

fig. 3 is a schematic flow chart illustrating a method for adjusting beam weight parameters according to a second embodiment of the present invention;

fig. 4 is a schematic flow chart illustrating a method for adjusting beam weight parameters according to a third embodiment of the present invention;

fig. 5 is a schematic flow chart illustrating a method for adjusting beam weight parameters according to a fourth embodiment of the present invention;

fig. 6 is a schematic flow chart illustrating a method for adjusting beam weight parameters according to a fifth embodiment of the present invention;

fig. 7 is a schematic flowchart of a method for adjusting beam weight parameters according to a sixth embodiment of the present invention;

fig. 8 is a schematic structural diagram of a beam weight parameter adjusting apparatus according to an embodiment of the present invention;

fig. 9 is a block diagram of an electronic device for implementing a method for adjusting beam weight parameters according to an embodiment of the present invention.

With the above figures, there are shown specific embodiments of the present application, which will be described in more detail below. These drawings and written description are not intended to limit the scope of the inventive concepts in any manner, but rather to illustrate the inventive concepts to those skilled in the art by reference to specific embodiments.

Detailed Description

Reference will now be made in detail to the exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, like numbers in different drawings represent the same or similar elements unless otherwise indicated. The embodiments described in the following exemplary embodiments do not represent all embodiments consistent with the present application. Rather, they are merely examples of apparatus and methods consistent with certain aspects of the present application, as detailed in the appended claims.

The base station can flexibly set different initial beam configurations according to different coverage scenarios, but the coverage scenarios in the live network are various, and the optimal overall coverage of the buildings in a cell cannot be ensured by the initial beam configurations alone. The existing adjustment mode first obtains the traffic distribution of the current users of a target cell according to the throughput distribution of the target cell's beams, and obtains the traffic distribution of potential users according to the noise distribution of the target cell's beams and the throughput distribution of the beams of co-frequency neighboring cells. According to the traffic distribution of the current users and of the potential users, the traffic distribution before and after adjustment by each candidate weight is obtained, from which the expected throughput gain of the target cell's beams after adjustment by each weight is derived. Finally, the weight used for beam optimization is selected according to the expected throughput gain corresponding to each beam weight, and the beam is adjusted accordingly.

In practice, the positions of users in a residential building differ and are not fixed, and the existing method of optimizing the beam based on user traffic does not take the different positions of users within the building into account.

Therefore, aiming at the problem that the beam adjustment mode in the prior art does not consider the different positions of users in the buildings of a cell, the inventors found in research that two parameters can be added during the adjustment of the beam weight parameters: the floor where a user is located in the building, and the user's residence time on that floor. Specifically, if a beam weight adjustment instruction is monitored, the current number of users on each floor of a target building and the residence time of the users on the corresponding floor are determined; the current state information of the target building is determined according to the current number of users on each floor, the residence time of the users on the corresponding floor and the base station information; and the state information is input into a pre-trained reinforcement learning model to obtain the optimal beam weight parameter combination corresponding to the target building, so that the beam weight parameters of the base station are adjusted according to the optimal combination. Compared with the existing adjustment mode based on user traffic, this adjustment mode is more comprehensive, so that the adjusted base station beam covers the building better, and the downlink average rate is effectively improved.

Therefore, the inventor proposes a technical scheme of the embodiment of the invention based on the above creative discovery. The following describes a network architecture and an application scenario of the method for adjusting beam weight parameters according to the embodiment of the present invention.

As shown in fig. 1, the network architecture corresponding to the method for adjusting beam weight parameters provided in the embodiment of the present invention includes an electronic device 1 and a base station 2, where the electronic device 1 is in communication connection with the base station 2 and comprises a server or a gateway device. The base station 2 corresponds to the target building 3, which includes a plurality of floors; users are distributed on different floors and use user terminals. The base station 2 transmits a beam to the user terminals of the users in the target building 3. If the electronic device 1 monitors a beam weight parameter adjustment instruction, it determines the current number of users on each floor corresponding to the target building and the residence time of the users on the corresponding floor; determines the current state information of the target building according to the current number of users on each floor, the residence time of the users on the corresponding floor and the base station information corresponding to the target building; and inputs the current state information of the target building into a trained reinforcement learning model to obtain the optimal beam weight parameter combination corresponding to the target building. The electronic device 1 then generates an adjustment instruction based on the optimal beam weight parameter combination and sends it to the base station 2; the base station 2 parses the instruction to obtain the optimal combination, updates its current beam weight parameter combination to the optimal one, and thereby changes the beam coverage of the target building so as to improve the corresponding downlink average rate. The floor where each user is located in the building and the user's residence time on that floor are added during the adjustment of the beam weight parameters; compared with the existing adjustment mode based on user traffic, this adjustment mode is more comprehensive, so that the adjusted base station beam covers the building better and the downlink average rate is effectively improved.

Embodiments of the present invention will be described in detail below with reference to the accompanying drawings.

Example one

Fig. 2 is a schematic flow chart of a method for adjusting beam weight parameters according to an embodiment of the present invention. As shown in fig. 2, the execution body of the method provided in this embodiment is a device for adjusting beam weight parameters, which is located in an electronic device, and the method includes the following steps:

Step 101, if a beam weight parameter adjustment instruction is monitored, determining the number of users on each current floor corresponding to a target building and the residence time of the users on the corresponding floor, wherein the beam is a beam transmitted by a base station to a user terminal of the target building.

In this embodiment, whether a beam weight parameter adjustment instruction is received is monitored, where the beam is a beam transmitted by the base station to a user terminal of the target building, and the beam weight parameters include Dl256QamSwitch, SsbPeriod, OccupiedRbNum, DlAdditionalDmrsPos, MaxMimoLayerNum, NrDuCellId, and TrsBeamPattern. If the beam weight parameter adjustment instruction is monitored, the current number of users on each floor corresponding to the target building and the residence time of the users on the corresponding floor are further determined. Specifically, signaling data corresponding to the target building is collected, where the signaling data includes a user terminal identification code, a floor identification code corresponding to the user terminal identification code, the service start time corresponding to the user terminal identification code, and the corresponding service end time. The number of users on each floor corresponding to the target building and the residence time of the users on the corresponding floor are determined based on the signaling data.

The operation and maintenance personnel can send a beam weight parameter adjustment instruction through a corresponding terminal, and the beam weight parameter adjusting device monitors whether the instruction is received. Alternatively, the beam weight parameter adjusting device is connected to the terminal, and the terminal automatically sends a beam weight parameter adjustment instruction to the device at preset intervals, where the preset interval can be set according to actual conditions, for example, to 1 h. Alternatively, a timing unit is arranged in the beam weight parameter adjusting device and sends a beam weight parameter adjustment instruction to a receiving unit of the device at preset intervals. If the beam weight parameter adjustment instruction is monitored, the number of users on each floor corresponding to the target building and the residence time of the users on the corresponding floor are further determined.

Step 102, determining the current state information of the target building according to the number of users on each floor corresponding to the target building, the residence time of the users on the corresponding floor and the base station information corresponding to the target building.

In this embodiment, the base station information includes a base station identifier, a corresponding floor identifier, a base station longitude, a base station latitude, a base station distance, an antenna hanging height, and an antenna downtilt angle. Counting the number of users on each current floor corresponding to the target building and the residence time of the users on the corresponding floor, further generating current state information of the target building according to the number of the users on each current floor, the residence time of the users on the corresponding floor and the base station information, and obtaining a corresponding optimal beam weight parameter combination based on the current state information of the target building.

Step 103, inputting the current state information of the target building into the trained reinforcement learning model to obtain the optimal beam weight parameter combination corresponding to the target building.

In this embodiment, the reinforcement learning model includes a Markov model and a neural network model. The state information set, the action information set and the reward and punishment information in the Markov model are defined, the model is trained in advance to obtain the trained reinforcement learning model, and the current state information of the target building is input into the trained reinforcement learning model to obtain the optimal beam weight parameter combination corresponding to the target building.

In reinforcement learning, an Agent, namely the initiator of an action, acts on the environment; the state of the environment changes after receiving the action, and at the same time a reinforcement signal (reward or punishment) is generated and fed back to the Agent. The Agent then selects the next action according to the reinforcement signal and the current state of the environment, and the selection principle is to increase the probability of being positively reinforced. The Deep Q Network (DQN) is a reinforcement learning algorithm that combines Q-learning with a convolutional neural network (CNN): in Q-learning, Q(s, a) denotes the expected return obtained by taking action a in state s at a certain moment, and the environment feeds back the corresponding reward or punishment r according to the Agent's action.

Step 104, adjusting the beam weight parameters of the base station corresponding to the target building according to the optimal beam weight parameter combination.

In this embodiment, the beam weight parameters include Dl256QamSwitch, SsbPeriod, OccupiedRbNum, DlAdditionalDmrsPos, MaxMimoLayerNum, NrDuCellId, and TrsBeamPattern. Each beam weight parameter has its own set of candidate values, and different values correspond to different combination modes. The current beam weight parameter combination of the base station corresponding to the target building is adjusted to the optimal beam weight parameter combination, thereby adjusting the beam weight parameters of the base station corresponding to the target building.
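To make the adjustment step concrete, the following minimal Python sketch applies one such combination to a base station. The BaseStationClient class and its set_parameter method are hypothetical stand-ins for the actual configuration interface, which this application does not specify.

```python
# Hypothetical sketch: apply an optimal beam weight parameter combination.
# BaseStationClient and set_parameter are illustrative stand-ins, not a
# real vendor API.
class BaseStationClient:
    def __init__(self, gnb_id: str):
        self.gnb_id = gnb_id
        self.params = {}

    def set_parameter(self, name: str, value) -> None:
        # In practice this would issue a configuration command to the gNB.
        self.params[name] = value

optimal_combination = {
    "Dl256QamSwitch": "ON",
    "SsbPeriod": "MS20",
    "OccupiedRbNum": 10,
    "DlAdditionalDmrsPos": "POS1",
    "MaxMimoLayerNum": "LAYER8",
    "NrDuCellId": "MS80",
    "TrsBeamPattern": "PATTERN1",
}

gnb = BaseStationClient("gNB-4401")
for name, value in optimal_combination.items():
    gnb.set_parameter(name, value)  # overwrite the current combination
```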

In this embodiment, if a beam weight adjustment instruction is monitored, the current number of users on each floor of the target building and the residence time of the users on the corresponding floor are determined; the current state information of the target building is determined according to the current number of users on each floor, the residence time of the users on the corresponding floor and the base station information; and the state information is input into a pre-trained reinforcement learning model to obtain the optimal beam weight parameter combination corresponding to the target building, so that the beam weight parameters of the base station are adjusted according to the optimal combination. The floor where each user is located in the building and the user's residence time on that floor are thus taken into account during the adjustment of the beam weight parameters.

It should be noted that the beam weight parameter adjusting device may also be arranged in the base station. If the base station monitors a beam weight parameter adjustment instruction, the base station determines the number of users on each floor corresponding to the target building and the residence time of the users on the corresponding floor, determines the current state information of the target building according to the number of users on each floor, the residence time of the users on the corresponding floor, and the base station information corresponding to the target building, inputs the current state information into the trained reinforcement learning model to obtain the optimal beam weight parameter combination corresponding to the target building, and adjusts the beam weight parameters of the base station according to the optimal combination; specifically, the beam transmitting device of the base station is controlled to perform beam adjustment based on the optimal beam weight parameter combination.

Example two

Fig. 3 is a schematic flowchart of a method for adjusting beam weight parameters according to a second embodiment of the present invention. As shown in fig. 3, on the basis of the method provided in the first embodiment, the determination in step 101 of the number of users on each floor corresponding to the target building and the residence time of the users on the corresponding floor is further refined into the following steps:

Step 1011, obtaining signaling data XDR corresponding to the target building, where the XDR includes a user terminal identification code, a floor identification code corresponding to the user terminal identification code, a service start time corresponding to the user terminal identification code, and a corresponding service end time.

In this embodiment, signaling data XDR corresponding to the target building is collected, where the XDR data includes a user terminal identification code, a floor identification code corresponding to the user terminal identification code, the service type corresponding to the user terminal identification code, the service start time corresponding to the user terminal identification code, and the corresponding service end time. The user terminal identification code is the MSISDN (Mobile Station International Subscriber Directory Number), and the floor identification code is the cell identity (Cell ID) of the corresponding cell.

Step 1012, determining the number of users on each floor corresponding to the floor identification code corresponding to the target building according to the floor identification code, and determining the residence time of the user corresponding to the user terminal identification code on the floor corresponding to the current floor identification code according to the service start time and the corresponding service end time corresponding to the user terminal identification code.

In this embodiment, identical floor identification codes indicate that the users are on the same floor of the target building, so the number of users corresponding to each floor identification code is counted, and the current number of users on each floor corresponding to the target building is determined on that basis. The residence time of the user corresponding to a user terminal identification code on the floor corresponding to the current floor identification code is then calculated from the service start time and the corresponding service end time: specifically, the difference between the service end time and the service start time corresponding to the user terminal identification code is calculated, and the resulting time difference is determined as the residence time of the user on the floor corresponding to the current floor identification code.
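As an illustration of steps 1011 and 1012, the following minimal Python sketch aggregates XDR-style records into per-floor user counts and per-user residence times. The field names (msisdn, floor_id, start, end) and the sample values are illustrative assumptions, not the actual XDR schema.

```python
from collections import defaultdict
from datetime import datetime

# Illustrative XDR records; field names are assumptions, not the real schema.
xdr_records = [
    {"msisdn": "8613800000001", "floor_id": "F03",
     "start": "2021-10-27 09:00:00", "end": "2021-10-27 09:25:00"},
    {"msisdn": "8613800000002", "floor_id": "F03",
     "start": "2021-10-27 09:05:00", "end": "2021-10-27 09:12:00"},
    {"msisdn": "8613800000003", "floor_id": "F07",
     "start": "2021-10-27 09:02:00", "end": "2021-10-27 09:40:00"},
]

FMT = "%Y-%m-%d %H:%M:%S"
users_per_floor = defaultdict(set)    # floor_id -> distinct user ids
dwell_seconds = defaultdict(float)    # (msisdn, floor_id) -> residence time

for rec in xdr_records:
    users_per_floor[rec["floor_id"]].add(rec["msisdn"])
    delta = datetime.strptime(rec["end"], FMT) - datetime.strptime(rec["start"], FMT)
    dwell_seconds[(rec["msisdn"], rec["floor_id"])] += delta.total_seconds()

floor_user_counts = {floor: len(ids) for floor, ids in users_per_floor.items()}
print(floor_user_counts)    # {'F03': 2, 'F07': 1}
print(dict(dwell_seconds))  # residence time per (user, floor) in seconds
```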

EXAMPLE III

Fig. 4 is a schematic flow chart of a method for adjusting beam weight parameters according to a third embodiment of the present invention, and as shown in fig. 4, on the basis of the method for adjusting beam weight parameters according to the second embodiment of the present invention, step 102 is further refined, including the following steps:

Step 1021, acquiring base station information, where the base station information includes corresponding target building information and a corresponding floor identification code.

In this embodiment, base station information is obtained, where the base station information includes a base station identifier, a corresponding floor identifier, the base station longitude, the base station latitude, the base station distance, the antenna hanging height, and the antenna downtilt angle.

Step 1022, associating the number of users on each floor corresponding to the current floor identification code corresponding to the target building and the residence time of the users on the floor corresponding to the current floor identification code with the base station information based on the preset key field, obtaining associated information, and determining the associated information as the current state information of the target building.

In this embodiment, a preset key field is obtained, where the preset key field is a key field shared by the base station information and the per-floor statistics of the target building, namely the number of users on each floor corresponding to the current floor identification code and the residence time of the users on the floor corresponding to the current floor identification code. For example, the preset key field is set as the floor identification code, and the base station information is associated with the number of users on each floor and the residence time of the users on the corresponding floor according to the floor identification code, so as to obtain the corresponding associated information, which is determined as the current state information of the target building.
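The association step can be illustrated with a minimal Python sketch that joins the per-floor statistics to base station records on the shared key field; the record layouts and field names are illustrative assumptions.

```python
# Illustrative join on a preset key field (here the floor identification code).
floor_stats = {
    "F03": {"user_count": 2, "avg_dwell_s": 960.0},
    "F07": {"user_count": 1, "avg_dwell_s": 2280.0},
}
base_stations = [
    {"gnb_id": "gNB-4401", "floor_id": "F03", "antenna_height_m": 28.5, "downtilt_deg": 6},
    {"gnb_id": "gNB-4401", "floor_id": "F07", "antenna_height_m": 28.5, "downtilt_deg": 6},
]

KEY_FIELD = "floor_id"
building_state = []
for bs in base_stations:
    stats = floor_stats.get(bs[KEY_FIELD])
    if stats is not None:  # association succeeds only when the key matches
        building_state.append({**bs, **stats})

print(building_state)  # the associated records form the current state information
```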

Example four

Fig. 5 is a schematic flow chart of a method for adjusting beam weight parameters according to a fourth embodiment of the present invention, and as shown in fig. 5, on the basis of the method for adjusting beam weight parameters according to the first embodiment of the present invention, before step 103, the method further includes the following steps:

Step 1031, constructing a state information set, an action information set and reward and punishment information in the Markov model of the reinforcement learning model.

In this embodiment, the reinforcement learning model is composed of a Markov model and a neural network model. The state information set, the action information set and the reward and punishment information in the Markov model are defined, where the state information set includes a plurality of pieces of state information, the action information set includes a plurality of pieces of action information, and the reward and punishment information is constructed according to the downlink average rate and a preset user change function. The state information set of the Markov model is built based on the current state information of the target building, the action information set is built based on the beam weight parameter combinations, and the corresponding reward and punishment information is obtained based on each piece of state information in the state information set and each piece of action information in the action information set.

Step 1032, constructing an experience pool, obtaining a plurality of corresponding experience values based on the state information set, the action information set and the reward and punishment information, and adding the plurality of experience values into the constructed experience pool.

In this embodiment, an experience pool is constructed, where the experience pool is used to store experience values. A plurality of corresponding experience values are obtained based on the state information set, the action information set and the reward and punishment information, and the experience values are added to the experience pool until the number of experience values in the pool reaches a preset number.

Step 1033, randomly selecting a plurality of experience values from the experience pool as training samples, and training the neural network model of the reinforcement learning model with the training samples to obtain the trained neural network model of the reinforcement learning model.

In this embodiment, a large number of experience values are stored in the experience pool. Because consecutive samples are correlated, learning from consecutively selected samples performs poorly, and random selection breaks this correlation. Therefore, a plurality of experience values are randomly selected from the experience pool as training samples of the reinforcement learning model, and the model is trained with these samples to obtain the trained neural network model of the reinforcement learning model.
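A minimal Python sketch of such an experience pool, assuming each experience value is a (state, action, reward, next state) tuple, might look as follows; uniform random sampling is exactly what breaks the correlation between consecutive samples.

```python
import random
from collections import deque

class ExperiencePool:
    """Fixed-capacity replay buffer sketch; oldest experiences are evicted first."""

    def __init__(self, capacity: int):
        self.buffer = deque(maxlen=capacity)

    def add(self, state, action, reward, next_state):
        self.buffer.append((state, action, reward, next_state))

    def sample(self, batch_size: int):
        # Uniform random sampling decorrelates the training batch.
        return random.sample(list(self.buffer), batch_size)

    def __len__(self):
        return len(self.buffer)
```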

EXAMPLE five

Fig. 6 is a schematic flow chart of a method for adjusting beam weight parameters according to the fifth embodiment of the present invention, and as shown in fig. 6, on the basis of the method for adjusting beam weight parameters according to the fourth embodiment of the present invention, step 1031 is further refined, including the following steps:

Step 1031a, acquiring the target building state information, and constructing a state information set in the Markov model of the reinforcement learning model based on the target building state information.

In this embodiment, the current state information of the target building is acquired, and the state information set in the Markov model is constructed from it; that is, the target building state information is determined as the state information set in the Markov model.

Step 1031b, obtaining the beam weight parameter combinations of the base station corresponding to the target building state information, constructing an action information set in the Markov model of the reinforcement learning model based on the beam weight parameter combinations, and constructing reward and punishment information based on the downlink average rate and a preset user change function.

In this embodiment, the beam weight parameter combinations of the corresponding base station are determined based on the base station beam weight parameters corresponding to the target building state information, where the beam weight parameters include Dl256QamSwitch, SsbPeriod, OccupiedRbNum, DlAdditionalDmrsPos, MaxMimoLayerNum, NrDuCellId, and TrsBeamPattern.

The attribution type of the Dl256QamSwitch parameter is NRDUCELLALGOSWITCH, and its value is either on or off, indicating whether fixed downlink 256QAM is turned on; it is mainly used to control whether the cell's high-order modulation function is enabled. The attribution type of the SsbPeriod parameter is NRDUCELL; it can be set to one of MS5, MS10, MS20, MS40, MS80 and MS160 and is used to configure a fixed SSB period. The attribution type of the OccupiedRbNum parameter is NRDUCELLPDCCH; its value range is 0-22, and it represents the frequency-domain range occupied on each PDCCH symbol of the base station. The attribution type of the DlAdditionalDmrsPos parameter is NRDUCELLPDSCH; it can be set to one of NOT CONFIG, POS1 and POS2 and is used to configure the position of the downlink additional DMRS. The attribution type of the MaxMimoLayerNum parameter is NRDUCELLPDSCH; it can be set to one of LAYER2, LAYER4, LAYER6, LAYER8, LAYER10, LAYER12, LAYER14 and LAYER16 and is used to control the maximum number of transmission layers for MIMO spatial multiplexing of the base station. The attribution type of the NrDuCellId parameter is MOD NRDUCELL; it can be set to one of MS20, MS40, MS80 and MS160 and is used to represent the transmission period of SIB1. The attribution type of the TrsBeamPattern parameter is NRDUCELLTRPBEAM; it can be set to one of PATTERN1 and PATTERN2 and is used to configure the beam type of the TRS.

Dl256QamSwitch has 2 setting modes, SsbPeriod has 6, OccupiedRbNum has 23, DlAdditionalDmrsPos has 3, MaxMimoLayerNum has 8, NrDuCellId has 4, and TrsBeamPattern has 2, so the total number of beam weight parameter combination modes is 2 × 6 × 23 × 3 × 8 × 4 × 2 = 52992, and the action information set in the Markov model of the reinforcement learning model is constructed based on these beam weight parameter combinations.
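The size of this action space can be checked with a short Python sketch that enumerates the Cartesian product of the value ranges listed above.

```python
from itertools import product

# Value ranges as listed above; each combination is one action.
parameter_values = {
    "Dl256QamSwitch": ["ON", "OFF"],
    "SsbPeriod": ["MS5", "MS10", "MS20", "MS40", "MS80", "MS160"],
    "OccupiedRbNum": list(range(23)),  # 0..22
    "DlAdditionalDmrsPos": ["NOT CONFIG", "POS1", "POS2"],
    "MaxMimoLayerNum": ["LAYER2", "LAYER4", "LAYER6", "LAYER8",
                        "LAYER10", "LAYER12", "LAYER14", "LAYER16"],
    "NrDuCellId": ["MS20", "MS40", "MS80", "MS160"],
    "TrsBeamPattern": ["PATTERN1", "PATTERN2"],
}

action_space = list(product(*parameter_values.values()))
print(len(action_space))  # 2 * 6 * 23 * 3 * 8 * 4 * 2 = 52992
```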

In this embodiment, the reward and punishment information of the Markov model is constructed based on the downlink average rate and the preset user change function.
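The application does not give the exact form of the reward, so the following Python sketch is only one plausible reading: the reward grows with the downlink average rate gain and is discounted by a hypothetical user change function.

```python
# Hypothetical reward sketch; the weighting and the user change function are
# assumptions, not the patent's actual definitions.
def user_change(prev_counts: dict, curr_counts: dict) -> int:
    # Total absolute change in per-floor user counts between two observations.
    floors = set(prev_counts) | set(curr_counts)
    return sum(abs(curr_counts.get(f, 0) - prev_counts.get(f, 0)) for f in floors)

def reward(rate_before: float, rate_after: float,
           prev_counts: dict, curr_counts: dict, penalty: float = 0.1) -> float:
    # Reward the rate improvement, discounted when the user distribution moved,
    # since the gain may then reflect user churn rather than a better beam.
    return (rate_after - rate_before) - penalty * user_change(prev_counts, curr_counts)
```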

EXAMPLE six

Fig. 7 is a schematic flow chart of a beam weight parameter adjustment method according to a sixth embodiment of the present invention, and as shown in fig. 7, on the basis of the beam weight parameter adjustment method according to the fourth embodiment of the present invention, step 1032 is further refined, including the following steps:

and 1032a, selecting the state information and the action information from the state information set, and executing the action corresponding to the action information to obtain the corresponding next state information and the corresponding reward and punishment information.

In this embodiment, state information is selected from the state information set, information is selected from the action information set, and an action corresponding to the action information is executed to obtain next state information and corresponding reward and punishment information.

And 1032b, taking the selected state information, the selected action information, the corresponding next state information and the corresponding reward punishment information as a group of experience values, adding the experience values into the constructed experience pool, and executing the steps of selecting the state information from the state information set and selecting the action information from the action information set until the number of the experience values in the experience pool reaches a preset number.

In this embodiment, a group of selected state information, selected action information, corresponding next state information, and corresponding reward punishment information are used as a group of experience values, an experience pool is stored in the experience pool, state information is selected from the state information set again, information is selected from the action information set, an action corresponding to the action information is executed, next state information and corresponding reward punishment information are obtained, a group of experience values are obtained again, until the number of experience values in the experience pool reaches a preset number, and if the number of experience values in the experience pool reaches the preset number, the reinforcement learning model is further trained, wherein the preset number is set according to an actual situation.
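A minimal Python collection loop matching steps 1032a and 1032b, reusing the ExperiencePool sketch above and assuming a hypothetical env_step callback that applies a beam weight combination and returns the observed next state and reward, might look as follows.

```python
import random

def fill_pool(pool, states, actions, env_step, preset_number: int):
    # env_step(state, action) -> (next_state, reward) is a hypothetical
    # stand-in for applying a combination and measuring the result.
    while len(pool) < preset_number:
        state = random.choice(states)
        action = random.choice(actions)
        next_state, reward_value = env_step(state, action)
        pool.add(state, action, reward_value, next_state)
```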

EXAMPLE seven

On the basis of the method for adjusting beam weight parameters provided in the fourth embodiment of the present invention, step 1033 is further refined, including the following steps:

Step 1033a, constructing a prediction network model and a target network model corresponding to the neural network model, and defining a corresponding loss function, where the prediction network model is expressed as: Q(s, a; θ), the target network model is expressed as: Y_target = r + γ max Q(s, a; θ), where s is state information, a is action information, θ is a weight parameter, γ is a discount factor, and r is reward and punishment information, and the loss function is expressed as: loss = (Q - Y_target)².

In this embodiment, two neural network models, namely a prediction network model and a target network model, are established, where the prediction network model is used to evaluate the value function corresponding to the current state and action. The prediction network model is expressed as:

Q(s, a; θ)    formula (1)

where s is state information, a is action information, and θ is a weight parameter.

In this embodiment, the target network model is expressed as:

Y_target = r + γ max Q(s, a; θ)    formula (2)

where s is state information, a is action information, θ is a weight parameter, γ is a discount factor, and r is reward and punishment information.

In this embodiment, a loss function is defined, and the loss function is expressed as:

loss = (Q - Y_target)²    formula (3)

Step 1033b, inputting the training samples into the prediction network model, and updating the weight parameters of the prediction network model by gradient back propagation of the neural network so as to minimize the loss function, obtain the optimal weight parameters, and obtain the trained neural network model of the reinforcement learning model.

In this embodiment, the training samples are input into the prediction network model, and the weight parameters of the prediction network model are updated by gradient back propagation of the neural network so as to minimize the loss function and obtain the optimal weight parameters.
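As an illustration of formulas (1) to (3), the following sketch performs one training step of the prediction network against the target network. PyTorch, the network sizes and the hyperparameters are assumptions, not part of the application.

```python
import torch
import torch.nn as nn

STATE_DIM = 16        # assumed size of the encoded building state
NUM_ACTIONS = 52992   # one output per beam weight parameter combination
GAMMA = 0.9           # discount factor γ (assumed value)

def make_net() -> nn.Sequential:
    return nn.Sequential(nn.Linear(STATE_DIM, 128), nn.ReLU(),
                         nn.Linear(128, NUM_ACTIONS))

prediction_net = make_net()                              # Q(s, a; θ)
target_net = make_net()                                  # target network
target_net.load_state_dict(prediction_net.state_dict())  # sync weights

optimizer = torch.optim.Adam(prediction_net.parameters(), lr=1e-3)

def train_step(states, actions, rewards, next_states) -> float:
    # Q values of the actions actually taken in the sampled experiences.
    q = prediction_net(states).gather(1, actions.unsqueeze(1)).squeeze(1)
    with torch.no_grad():
        # Y_target = r + γ · max Q(next state, ·) from the target network.
        y_target = rewards + GAMMA * target_net(next_states).max(dim=1).values
    loss = nn.functional.mse_loss(q, y_target)  # loss = (Q - Y_target)²
    optimizer.zero_grad()
    loss.backward()     # gradient back propagation
    optimizer.step()    # update the weight parameters θ
    return loss.item()
```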

Example eight

On the basis of the method for adjusting the beam weight parameter provided in the first embodiment of the present invention, after step 104, the method further includes the following steps:

and 105, acquiring a first downlink average rate corresponding to the preset time before adjustment and a second downlink average rate corresponding to the preset time after adjustment, and comparing the first downlink average rate with the second downlink average rate.

In this embodiment, a first downlink average rate corresponding to the beam weight parameter combination before adjustment and a second downlink average rate corresponding to the optimal beam weight parameter combination after adjustment are obtained. The first downlink average rate is compared with the second downlink average rate, and prompt information is sent to the terminal according to the comparison result.

Step 106, if the first downlink average rate is less than or equal to the second downlink average rate, sending a prompt message of successful adjustment to the corresponding terminal.

In this embodiment, if the first downlink average rate corresponding to the beam weight parameter combination before adjustment is less than or equal to the second downlink average rate, this indicates that adjusting the beam weight parameters of the base station has increased the downlink average rate, and a prompt message indicating that the adjustment succeeded is sent to the terminal of the operation and maintenance staff, so that they can keep track of the latest status of the base station in time.

Step 107, if the first downlink average rate is greater than the second downlink average rate, sending a prompt message of failed adjustment to the corresponding terminal.

In this embodiment, if the first downlink average rate corresponding to the beam weight parameter combination before adjustment is greater than the second downlink average rate, this indicates that the downlink average rate was not improved even though the beam weight parameters of the base station were adjusted, which may be caused by other factors. The operation and maintenance staff therefore need to be notified to perform maintenance, and a prompt message indicating that the adjustment failed is sent to their terminal so that they can keep track of the latest status of the base station in time.
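The comparison and notification logic of steps 105 to 107 reduces to a few lines; send_prompt below is a hypothetical stand-in for the actual notification channel to the operation and maintenance terminal.

```python
def verify_adjustment(rate_before: float, rate_after: float, send_prompt) -> bool:
    # Compare the downlink average rate over the preset window before and
    # after adjustment, then notify the terminal accordingly.
    if rate_before <= rate_after:
        send_prompt("adjustment succeeded: downlink average rate improved")
        return True
    send_prompt("adjustment failed: rate not improved, maintenance required")
    return False

verify_adjustment(182.4, 205.1, print)  # illustrative rates in Mbit/s
```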

Fig. 8 is a schematic structural diagram of a beam weight parameter adjusting apparatus according to an embodiment of the present invention, and as shown in fig. 8, the beam weight parameter adjusting apparatus 200 according to the embodiment includes a determining unit 201, an input unit 202, and an adjusting unit 203.

The determining unit 201 is configured to determine, if a beam weight parameter adjustment instruction is monitored, the number of users on each current floor corresponding to the target building and the residence time of the users on the corresponding floor, where the beam is a beam transmitted by the base station to the user terminal of the target building. The determining unit 201 is further configured to determine current state information of the target building according to the number of users on each current floor corresponding to the target building, the residence time of the users on the corresponding floor, and the base station information corresponding to the target building. The input unit 202 is configured to input the current state information of the target building into the trained reinforcement learning model, so as to obtain an optimal beam weight parameter combination corresponding to the target building. An adjusting unit 203, configured to adjust the beam weight parameter of the base station corresponding to the target building according to the optimal beam weight parameter combination.

Optionally, the determining unit is further configured to obtain signaling data XDR corresponding to the target building, where the XDR includes a user terminal identifier, a floor identifier corresponding to the user terminal identifier, a service start time corresponding to the user terminal identifier, and a corresponding service end time; and determining the number of users on each floor corresponding to the floor identification code corresponding to the target building according to the floor identification code, and determining the residence time of the user corresponding to the user terminal identification code on the floor corresponding to the current floor identification code according to the service start time corresponding to the user terminal identification code and the corresponding service end time.

Optionally, the determining unit is further configured to acquire base station information, where the base station information includes corresponding target building information and a corresponding floor identification code; and associating the number of users on each floor corresponding to the current floor identification code corresponding to the target building and the residence time of the users on the floor corresponding to the current floor identification code with the base station information based on the preset key field to obtain associated information, and determining the associated information as the current state information of the target building.

Optionally, the apparatus for adjusting beam weight parameters further includes a constructing unit.

The building unit is used for building a state information set, an action information set and reward and punishment information in a Markov model of the reinforcement learning model; establishing an experience pool, obtaining a plurality of corresponding experience values based on the state information set, the action information set and the reward and punishment information, and adding the plurality of experience values into the established experience pool; and randomly selecting a plurality of experience values from the experience pool as training samples, and training the neural network model of the reinforcement learning model with the training samples to obtain the trained neural network model of the reinforcement learning model.

Optionally, the building unit is further configured to obtain target building state information, and build a state information set in the Markov model of the reinforcement learning model based on the target building state information; obtain the beam weight parameter combinations of the base station corresponding to the target building state information, construct an action information set in the Markov model of the reinforcement learning model based on the beam weight parameter combinations, and construct reward and punishment information based on the downlink average rate and a preset user change function.

Optionally, the building unit is further configured to select state information and action information from the state information set, and execute an action corresponding to the action information to obtain corresponding next state information and corresponding reward and punishment information; and taking the selected state information, the selected action information, the corresponding next state information and the corresponding reward and punishment information as a group of experience values, adding the experience values into the constructed experience pool, and executing the steps of selecting the state information from the state information set and selecting the action information from the action information set until the number of the experience values in the experience pool reaches a preset number.

Optionally, the constructing unit is further configured to construct a prediction network model and a target network model corresponding to the neural network model, and define a corresponding loss function, where the prediction network model is expressed as: Q(s, a; θ), the target network model is expressed as: Y_target = r + γ max Q(s, a; θ), where s is state information, a is action information, θ is a weight parameter, γ is a discount factor, and r is reward and punishment information, and the loss function is expressed as: loss = (Q - Y_target)²; and to input the training samples into the prediction network model, and update the weight parameters of the prediction network model by gradient back propagation of the neural network so as to minimize the loss function, obtain the optimal weight parameters, and obtain the trained neural network model of the reinforcement learning model.

Optionally, the apparatus for adjusting beam weight parameters further includes a sending unit.

A sending unit, configured to obtain a first downlink average rate corresponding to the preset time before adjustment and a second downlink average rate corresponding to the preset time after adjustment, and compare the first downlink average rate with the second downlink average rate; if the first downlink average rate is less than or equal to the second downlink average rate, sending a prompt message of successful adjustment to the corresponding terminal; and if the first downlink average rate is greater than the second downlink average rate, sending prompt information of failed adjustment to the corresponding terminal.

Fig. 9 is a block diagram of an electronic device for implementing a method for adjusting beam weight parameters according to an embodiment of the present invention, and as shown in fig. 9, the electronic device 300 includes: memory 301, processor 302.

The memory 301 stores computer-executable instructions;

the processor 302 executes computer-executable instructions stored by the memory 301 to cause the processor to perform a method provided by any of the embodiments described above.

In an exemplary embodiment, a computer-readable storage medium is also provided, in which computer-executable instructions are stored, the computer-executable instructions being executed by a processor to perform the method in any one of the above-mentioned embodiments.

In an exemplary embodiment, a computer program product is also provided, comprising a computer program for execution by a processor of the method in any of the above embodiments.

Other embodiments of the invention will be apparent to those skilled in the art from consideration of the specification and practice of the invention disclosed herein. This invention is intended to cover any variations, uses, or adaptations of the invention following, in general, the principles of the invention and including such departures from the present disclosure as come within known or customary practice within the art to which the invention pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the invention being indicated by the following claims.

It will be understood that the invention is not limited to the precise arrangements described above and shown in the drawings and that various modifications and changes may be made without departing from the scope thereof. The scope of the invention is limited only by the appended claims.
