Optimal collaborative operation method of multi-energy system based on intelligent agent

Document No.: 191158    Publication date: 2021-11-02

Reading note: this technology, an agent-based optimal collaborative operation method for a multi-energy system (一种基于智能体的多能源系统最优协同运行方法), was created on 2021-07-16 by 向月, 徐博涵, 刘友波, 刘俊勇, 王天昊, 项添春, 金尧, 吴彬 and 马世乾. Abstract: The invention discloses an optimal collaborative operation method of a multi-energy system based on an agent, relating to the technical field of multi-energy systems. A target network is generated from a π network and a Q network and a cycle period T is set; a group of historical data is input into the target network, actions are determined by the π network, the observation state and return at the next moment are calculated by the multi-energy system physical model, the parameters of the π network and the Q network are updated by the r value, and off-line learning of the DDPG algorithm is completed after T cycles; observation data for the DDPG algorithm are then acquired by observation equipment and input into the DDPG algorithm that has completed off-line learning to obtain decision actions, completing real-time self-optimizing operation of the multi-energy system. The invention overcomes the problem that traditional mathematical algorithms need complex modeling of the coupling relations between physical models, and expands the action space of common machine-learning algorithms so that decisions can be closer to the optimal decision.

1. An optimal collaborative operation method of a multi-energy system based on an agent is characterized by comprising the following steps:

S1, generating a target network through a pi network and a Q network, setting a cycle period T, inputting a group of historical data into the target network, determining actions through the pi network, inputting the actions into a physical model of the multi-energy system to calculate the return of the strategy and the observation state of the historical data at the next moment, correcting parameters of the Q network according to the return to obtain a corrected Q network, obtaining a Q value through the corrected Q network, correcting parameters of the pi network through the Q value to obtain a corrected pi network, generating a new target network through the corrected Q network and the corrected pi network, and completing off-line learning of the DDPG algorithm after T cycles;

and S2, acquiring observation data for the DDPG algorithm in real time through observation equipment, inputting the observation data into the DDPG algorithm that has completed off-line learning to obtain a decision action, and completing real-time self-optimizing operation of the multi-energy system through the decision action.

2. The method for optimal collaborative operation of an intelligent agent-based multi-energy system according to claim 1, wherein the multi-energy system physical model in step S1 includes a cogeneration unit model, a photovoltaic power source, an electricity storage model, a gas boiler model, an electric boiler model, and a user-side thermal compensation model;

a cogeneration unit model:

hCHP,t=δ·pCHP,t

pCHP,t is the electric output of the cogeneration unit at time t, hCHP,t is the heat output of the cogeneration unit at time t, gCHP,t is the gas consumption of the cogeneration unit at time t, δ is the heat-to-power ratio of the cogeneration unit, αCHP is the conversion factor of the cogeneration unit, pCHP^min is the minimum electric power of the cogeneration unit, and pCHP^max is the maximum electric power of the cogeneration unit;

electric boiler model:

hEB,t=pEB,t·αEB

pEB,t is the electric power of the electric boiler at time t, hEB,t is the thermal power of the electric boiler at time t, αEB is the conversion factor of the electric boiler, hEB^min is the minimum thermal power of the electric boiler, and hEB^max is the maximum thermal power of the electric boiler;

the gas boiler model is as follows:

hGB,t is the thermal power of the gas boiler at time t, gGB,t is the gas consumption of the gas boiler at time t, αGB is the conversion factor of the gas boiler, hGB^min is the minimum thermal power of the gas boiler, and hGB^max is the maximum thermal power of the gas boiler;

the power storage model comprises:

Csoc,0=Cini=Csoc,23

pBES,t is the electric power of the electricity storage device at time t, Csoc,t is the state of charge of the electricity storage device at time t, ρBES is the efficiency of the electricity storage device, QBES is the capacity of the electricity storage device, ρch is the charging efficiency of the electricity storage device, ρdis is the discharging efficiency of the electricity storage device, Csoc^min is the minimum state of charge of the electricity storage device, Csoc^max is the maximum state of charge of the electricity storage device, pBES^min is the minimum electric power of the electricity storage device, pBES^max is the maximum electric power of the electricity storage device, Cini is the initial state of charge of the electricity storage device, Csoc,0 is the state of charge of the electricity storage device at hour 0, and Csoc,23 is the state of charge of the electricity storage device at hour 23;

user-side thermal compensation model:

dh,t=hload,t-(hCHP,t+hEB,t+hGB,t)

0≤dh,t≤0.2·hload,t

hload,t is the thermal load at time t, dh,t is the thermal power deficit at time t, μh,t is the compensation price for the thermal power deficit, and θwil and θuwil are the compensation prices for different gradients;

a return function:

rt(st,at)=-(Cp(st,at)+CBES(st,at)+Cu(st,at))/1000   (24)

the return function is used for measuring the quality of the decision and serves as the basis for correcting the neural network parameters, rt is the return at time t, st is the observed state at time t, and at is the action at time t;

the upper-layer power grid interaction model is used for calculating Cp(st,at), CBES(st,at) and Cu(st,at) in the return function:

pgrid,t=pload,t+pEB,t+pPV,t-pBES,t-pCHP,t

pload,t is the electric load power at time t, pPV,t is the photovoltaic output at time t, pgrid,t is the interaction power between the multi-energy system and the upper-layer power grid at time t, pgrid^min is the minimum interaction power, and pgrid^max is the maximum interaction power;

an objective function:

F=min(Cp+CBES+Cu)

Cu=μh,tdh,t

the scheduling objective is for the daily operation cost of the multi-energy system to reach a set minimum value, where Cp is the energy purchase cost, CBES is the depreciation cost of the electricity storage device, Cu is the thermal power compensation cost, and μBES is the depreciation unit price of the electricity storage device.

3. The method for optimal cooperative operation of a multi-energy system based on intelligent agents as claimed in claim 1, wherein the off-line learning process of the DDPG algorithm in step S1 is as follows:

observation space:

S={pload,hload,pPV,Csoc,μe}

S is the set of states that the agent needs to observe, pload is the load electric power, hload is the load thermal power, pPV is the photovoltaic output, Csoc is the state of charge of the electricity storage device, and μe is the time-of-use electricity price;

an action space:

A={pCHP,hEB,hGB,pBES}

A is the set of actions on which the agent can decide, pCHP is the electric output of the cogeneration unit, hEB is the thermal power of the electric boiler, hGB is the thermal power of the gas boiler, and pBES is the charging/discharging power of the electricity storage device;

q function:

the Q value is the sum of the returns over a plurality of time steps, is used for measuring the quality of the strategy, and serves as a basis for correcting the parameters of the neural network; π is the strategy fitted by the neural network and γ is the discount factor;

the pi network is:

at=π(st∣θπ)+vt

vt+1=(1-τv)·vt

the π network is used to fit the mapping from observed states to decision actions, vt is the exploration noise at time t, τv is the update coefficient of the noise, επ is the update coefficient of the π network parameters, θQ is the parameter of the Q network, θπ is the parameter of the π network, and ∂ is the partial derivative symbol;

the Q network is:

L(θQ)=(yt-Q(st,at∣θQ))²

yt=rt+γ·Q′(st+1,π′(st+1∣θπ′)∣θQ′)

θπ′←τθ·θπ+(1-τθ)·θπ′

θQ′←τθ·θQ+(1-τθ)·θQ′

the Q network is used to fit the mapping from decision actions to Q values, π′ is the target network of the π network and Q′ is the target network of the Q network, both used to stabilize the iterative process, εQ is the update coefficient of the Q network parameters, and τθ is the update coefficient of the target networks.

Technical Field

The invention relates to the technical field of multi-energy systems, in particular to an optimal collaborative operation method of a multi-energy system based on an agent.

Background

The multi-energy system integrates multiple energy forms such as heat, electricity and gas, and can realize their mutual conversion and complementary utilization. However, due to uncertainty in load and renewable energy output and the complex energy coupling relationships, the economic operation of multi-energy systems faces significant challenges.

The existing optimal scheduling of multi-energy systems comprises day-ahead scheduling and real-time scheduling. Day-ahead scheduling cannot respond dynamically to new energy output and load fluctuations, so an optimal scheduling effect is difficult to obtain. For real-time scheduling, methods based on model predictive control are generally adopted; although such methods realize dynamic scheduling of the multi-energy system, they still depend on accurate prediction of renewable energy and load and are affected by prediction deviations. With the rapid development of computer performance, many scholars have begun to use machine-learning methods to handle scheduling problems, such as Q-learning and DQN. The Q-learning algorithm requires a large amount of memory to store Q values and suffers from the curse of dimensionality when handling scheduling problems in high-dimensional spaces. The DQN algorithm solves the Q-value storage problem with a neural network, eliminates the dimensionality problem and extends to high-dimensional spaces, but it only outputs discrete actions, discarding a large part of the action space, which raises the scheduling cost and prevents optimal scheduling.

Disclosure of Invention

In view of the technical defects, the invention provides an optimal collaborative operation method of a multi-energy system based on an agent.

In order to achieve the purpose, the technical scheme of the invention is as follows:

an optimal collaborative operation method of a multi-energy system based on an agent comprises the following steps:

S1, generating a target network through a pi network and a Q network, setting a cycle period T, inputting a group of historical data into the target network, determining actions through the pi network, inputting the actions into a physical model of the multi-energy system to calculate the return of the strategy and the observation state of the historical data at the next moment, correcting parameters of the Q network according to the return to obtain a corrected Q network, obtaining a Q value through the corrected Q network, correcting parameters of the pi network through the Q value to obtain a corrected pi network, generating a new target network through the corrected Q network and the corrected pi network, and completing off-line learning of the DDPG algorithm after T cycles;

and S2, acquiring observation data for the DDPG algorithm in real time through observation equipment, inputting the observation data into the DDPG algorithm that has completed off-line learning to obtain a decision action, and completing real-time self-optimizing operation of the multi-energy system through the decision action.

Preferably, the physical models of the multi-energy system in step S1 include a cogeneration unit model, a photovoltaic power supply, an electricity storage model, a gas boiler model, an electric boiler model, and a user-side thermal compensation model;

a cogeneration unit model:

hCHP,t=δ·pCHP,t

pCHP,t is the electric output of the cogeneration unit at time t, hCHP,t is the heat output of the cogeneration unit at time t, gCHP,t is the gas consumption of the cogeneration unit at time t, δ is the heat-to-power ratio of the cogeneration unit, αCHP is the conversion factor of the cogeneration unit, pCHP^min is the minimum electric power of the cogeneration unit, and pCHP^max is the maximum electric power of the cogeneration unit;

electric boiler model:

hEB,t=pEB,t·αEB

pEB,t is the electric power of the electric boiler at time t, hEB,t is the thermal power of the electric boiler at time t, αEB is the conversion factor of the electric boiler, hEB^min is the minimum thermal power of the electric boiler, and hEB^max is the maximum thermal power of the electric boiler;

the gas boiler model is as follows:

hGB,t is the thermal power of the gas boiler at time t, gGB,t is the gas consumption of the gas boiler at time t, αGB is the conversion factor of the gas boiler, hGB^min is the minimum thermal power of the gas boiler, and hGB^max is the maximum thermal power of the gas boiler;

the power storage model comprises:

Csoc,0=Cini=Csoc,23

pBES,t is the electric power of the electricity storage device at time t, Csoc,t is the state of charge of the electricity storage device at time t, ρBES is the efficiency of the electricity storage device, QBES is the capacity of the electricity storage device, ρch is the charging efficiency of the electricity storage device, ρdis is the discharging efficiency of the electricity storage device, Csoc^min is the minimum state of charge of the electricity storage device, Csoc^max is the maximum state of charge of the electricity storage device, pBES^min is the minimum electric power of the electricity storage device, pBES^max is the maximum electric power of the electricity storage device, Cini is the initial state of charge of the electricity storage device, Csoc,0 is the state of charge of the electricity storage device at hour 0, and Csoc,23 is the state of charge of the electricity storage device at hour 23;

user-side thermal compensation model:

dh,t=hload,t-(hCHP,t+hEB,t+hGB,t)

0≤dh,t≤0.2·hload,t

hload,t is the thermal load at time t, dh,t is the thermal power deficit at time t, μh,t is the compensation price for the thermal power deficit, and θwil and θuwil are the compensation prices for different gradients;

a return function:

rt(st,at)=-(Cp(st,at)+CBES(st,at)+Cu(st,at))/1000   (24)

the return function is used for measuring the quality of the decision and serves as the basis for correcting the neural network parameters, rt is the return at time t, st is the observed state at time t, and at is the action at time t;

the upper-layer power grid interaction model is used for calculating Cp(st,at), CBES(st,at) and Cu(st,at) in the return function:

pgrid,t=pload,t+pEB,t+pPV,t-pBES,t-pCHP,t

pload,t is the electric load power at time t, pPV,t is the photovoltaic output at time t, pgrid,t is the interaction power between the multi-energy system and the upper-layer power grid at time t, pgrid^min is the minimum interaction power, and pgrid^max is the maximum interaction power;

an objective function:

F=min(Cp+CBES+Cu)

Cu=μh,tdh,t

the scheduling objective is for the daily operation cost of the multi-energy system to reach a set minimum value, where Cp is the energy purchase cost, CBES is the depreciation cost of the electricity storage device, Cu is the thermal power compensation cost, and μBES is the depreciation unit price of the electricity storage device.

Preferably, the offline learning process of the DDPG algorithm in step S1 is as follows:

observation space:

S={pload,hload,pPV,Csoc,μe}

S is the set of states that the agent needs to observe, pload is the load electric power, hload is the load thermal power, pPV is the photovoltaic output, Csoc is the state of charge of the electricity storage device, and μe is the time-of-use electricity price;

an action space:

A={pCHP,hEB,hGB,pBES}

A is the set of actions on which the agent can decide, pCHP is the electric output of the cogeneration unit, hEB is the thermal power of the electric boiler, hGB is the thermal power of the gas boiler, and pBES is the charging/discharging power of the electricity storage device;

q function:

the Q value is the sum of the returns over a plurality of time steps, is used for measuring the quality of the strategy, and serves as a basis for correcting the parameters of the neural network; π is the strategy fitted by the neural network and γ is the discount factor;

the pi network is:

at=π(st|θπ)+vt

vt+1=(1-τv)·vt

the π network is used to fit the mapping from observed states to decision actions, vt is the exploration noise at time t, τv is the update coefficient of the noise, επ is the update coefficient of the π network parameters, θQ is the parameter of the Q network, θπ is the parameter of the π network, and ∂ is the partial derivative symbol;

the Q network is:

L(θQ)=(yt-Q(st,at|θQ))²

yt=rt+γ·Q′(st+1,π′(st+1|θπ′)|θQ′)

θπ′←τθ·θπ+(1-τθ)·θπ′

θQ′←τθ·θQ+(1-τθ)·θQ′

the Q network is used for fitting the mapping from decision actions to Q values, π′ is the target network of the π network and Q′ is the target network of the Q network, both used to stabilize the iterative process, εQ is the update coefficient of the Q network parameters, and τθ is the update coefficient of the target networks.

The invention has the beneficial effects that:

(1) an optimal collaborative operation method of the multi-energy system based on an agent is provided, which solves the problems that traditional day-ahead scheduling cannot make real-time decisions and traditional intra-day scheduling depends on accurate load prediction;

(2) historical data and physical models are combined, and the DDPG algorithm enables the agent to automatically mine the relation between the current state and the optimal decision, which overcomes the problem that traditional mathematical algorithms require complex modeling of the coupling relations between physical models, and expands the action space of common machine-learning algorithms so that decisions can be closer to the optimal decision.

Drawings

Fig. 1 is a structure diagram of the multi-energy system provided by the present invention;

Fig. 2 is a decision logic diagram of the agent provided by the present invention;

Fig. 3 is an off-line learning flow chart of the DDPG algorithm provided by the present invention.

Detailed Description

The technical solutions in the embodiments of the present invention are clearly and completely described below with reference to the accompanying drawings, and other advantages and effects of the present invention will be readily apparent to those skilled in the art from the disclosure of the present specification. The invention is capable of other and different embodiments and of being practiced or of being carried out in various ways, and its several details are capable of modification in various respects, all without departing from the spirit and scope of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

Examples

The multi-energy system integrates multiple energy forms such as heat, electricity and gas, and realizes their mutual conversion and complementary utilization. However, due to the uncertainty of renewable energy output and load and the complex energy coupling relationships, the real-time economic operation of multi-energy systems faces significant challenges. The method uses data collected by real-time monitoring equipment (such as voltmeters and ammeters) so that a computer can automatically schedule production equipment in real time according to load fluctuations and new energy output conditions without human intervention, realizing the minimum long-term running cost of the multi-energy system. The physical modeling of this patent is general and is applicable to new-energy districts, new-energy industrial parks and the like with different equipment specifications.

As shown in fig. 1 and 2, an optimal cooperative operation method of a multi-energy system based on an agent includes: the method comprises a physical model modeling method of the multi-energy system, a data-driven DDPG offline learning method and an online operation method of the multi-energy system.

(1) The physical model modeling method of the multi-energy system comprises the following steps:

the multi-energy system physical model adopts a mode that the multi-energy system runs in parallel with the main network, and comprises a cogeneration unit model, a photovoltaic power supply, an electricity storage model, a gas boiler model, an electric boiler model and a user side thermal compensation model. The dispatching goal of the physical model of the multi-energy system is that the daily operation cost of the multi-energy system is minimum.

A cogeneration unit model:

hCHP,t=δ·pCHP,t (1)

pCHP,t is the electric output of the cogeneration unit at time t, hCHP,t is the heat output of the cogeneration unit at time t, gCHP,t is the gas consumption of the cogeneration unit at time t, δ is the heat-to-power ratio of the cogeneration unit, αCHP is the conversion factor of the cogeneration unit, pCHP^min is the minimum electric power of the cogeneration unit, and pCHP^max is the maximum electric power of the cogeneration unit;

electric boiler model:

hEB,t=pEB,t·αEB (4)

pEB,t is the electric power of the electric boiler at time t, hEB,t is the thermal power of the electric boiler at time t, αEB is the conversion factor of the electric boiler, hEB^min is the minimum thermal power of the electric boiler, and hEB^max is the maximum thermal power of the electric boiler;

the gas boiler model is as follows:

hGB,t is the thermal power of the gas boiler at time t, gGB,t is the gas consumption of the gas boiler at time t, αGB is the conversion factor of the gas boiler, hGB^min is the minimum thermal power of the gas boiler, and hGB^max is the maximum thermal power of the gas boiler;

the power storage model comprises:

Csoc,0=Cini=Csoc,23 (12)

pBES,t is the electric power of the electricity storage device at time t, Csoc,t is the state of charge of the electricity storage device at time t, ρBES is the efficiency of the electricity storage device, QBES is the capacity of the electricity storage device, ρch is the charging efficiency of the electricity storage device, ρdis is the discharging efficiency of the electricity storage device, Csoc^min is the minimum state of charge of the electricity storage device, Csoc^max is the maximum state of charge of the electricity storage device, pBES^min is the minimum electric power of the electricity storage device, pBES^max is the maximum electric power of the electricity storage device, Cini is the initial state of charge of the electricity storage device, Csoc,0 is the state of charge of the electricity storage device at hour 0, and Csoc,23 is the state of charge of the electricity storage device at hour 23;
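
As a concrete illustration of the storage constraints above, the following Python sketch advances the state of charge by one time step. The charging/discharging equations themselves (equations (8)-(11)) are not reproduced in the text, so the update rule below (charging scaled by ρch, discharging divided by ρdis) is a common assumption rather than the patent's exact formula, and all parameter values are placeholders.

```python
import numpy as np

def update_soc(c_soc, p_bes, dt=1.0, q_bes=500.0, rho_ch=0.95, rho_dis=0.95,
               c_min=0.1, c_max=0.9, p_min=-100.0, p_max=100.0):
    """Advance the storage state of charge by one time step.

    p_bes > 0 is taken as charging, p_bes < 0 as discharging; this convention
    and the update rule are assumptions, not quoted from the patent."""
    p_bes = float(np.clip(p_bes, p_min, p_max))          # power limits
    if p_bes >= 0:                                       # charging
        c_next = c_soc + rho_ch * p_bes * dt / q_bes
    else:                                                # discharging
        c_next = c_soc + p_bes * dt / (rho_dis * q_bes)
    return float(np.clip(c_next, c_min, c_max))          # state-of-charge limits
```

The daily constraint Csoc,0 = Cini = Csoc,23 of (12) would additionally have to be enforced, for example through the return function or by a terminal correction at the last time step.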

user-side thermal compensation model:

dh,t=hload,t-(hCHP,t+hEB,t+hGB,t)   (13)

0≤dh,t≤0.2·hload,t   (15)

hload,t is the thermal load at time t, dh,t is the thermal power deficit at time t, μh,t is the compensation price for the thermal power deficit, and θwil and θuwil are the compensation prices for different gradients.
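
A minimal sketch of the user-side thermal compensation of equations (13) and (15) follows. The gradient rule defining μh,t (equation (14)) is not reproduced in the text, so the simple two-tier pricing below (θwil for small deficits, θuwil above a tier threshold) and all numeric values are assumptions.

```python
def thermal_compensation(h_load, h_chp, h_eb, h_gb,
                         theta_wil=0.05, theta_uwil=0.10, tier=0.1):
    """Heat deficit d_h,t (13), capped at 20 % of the load (15), and its
    compensation price mu_h,t; the two-tier gradient price is an assumption."""
    d_h = h_load - (h_chp + h_eb + h_gb)                 # equation (13)
    d_h = min(max(d_h, 0.0), 0.2 * h_load)               # equation (15)
    mu_h = theta_wil if d_h <= tier * h_load else theta_uwil
    return d_h, mu_h
```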

The upper-layer power grid interaction model is used for calculating Cp(st,at), CBES(st,at) and Cu(st,at) in (24):

pgrid,t=pload,t+pEB,t+pPV,t-pBES,t-pCHP,t   (16)

pload,t, pPV,t and pgrid,t are respectively the electric load power at time t, the photovoltaic output and the interaction power between the multi-energy system and the upper-layer power grid; pgrid^min and pgrid^max are respectively the minimum and the maximum interaction power.
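
The power balance (16) can be evaluated directly. The short sketch below follows the sign convention of (16) exactly as printed and simply clips the result to assumed interaction limits (the numeric bounds are placeholders).

```python
def grid_power(p_load, p_eb, p_pv, p_bes, p_chp, p_min=-1000.0, p_max=1000.0):
    """Interaction power with the upper-layer grid, equation (16)."""
    p_grid = p_load + p_eb + p_pv - p_bes - p_chp
    return min(max(p_grid, p_min), p_max)                # interaction power limits
```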

An objective function:

F=min(Cp+CBES+Cu)   (18)

Cu=μh,t·dh,t   (21)

The scheduling objective is to minimize the daily operation cost of the multi-energy system. Cp, CBES and Cu are respectively the energy purchase cost, the depreciation cost of the electricity storage device and the thermal power compensation cost, and μBES is the depreciation unit price of the electricity storage device. The objective function is realized through the designed return function (24): the return function is set based on the objective function, so an agent that learns according to the return function fulfils the objective of minimizing the daily running cost.
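
A hedged sketch of how the return (24) can be assembled from the three cost terms is given below. The exact expressions for Cp (energy purchase, equations (19)-(20)) and CBES (storage depreciation) are not reproduced in the text; the forms used here (time-of-use price times grid import plus a gas price times gas consumption, and μBES times storage throughput) and all prices are assumptions.

```python
def reward(p_grid, g_chp, g_gb, p_bes, d_h, mu_h,
           mu_e=0.6, mu_gas=0.3, mu_bes=0.05):
    """r_t = -(C_p + C_BES + C_u) / 1000, equation (24).
    C_p and C_BES below are assumed forms; C_u = mu_h,t * d_h,t follows (21)."""
    c_p = mu_e * max(p_grid, 0.0) + mu_gas * (g_chp + g_gb)   # assumed energy purchase cost
    c_bes = mu_bes * abs(p_bes)                               # assumed storage depreciation cost
    c_u = mu_h * d_h                                          # equation (21)
    return -(c_p + c_bes + c_u) / 1000.0
```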

(2) The data-driven DDPG off-line learning method comprises the following steps:

the DDPG algorithm fits the mapping from the observation state to the optimal action through a neural network containing a large number of parameters, corrects the parameters of the neural network according to the calculated income of the physical model, and completes strategy learning under data driving through multiple iterations.

Observation space:

S={pload,hload,pPV,Csoc,μe}   (22)

s is a set of states which need to be observed by the agent;

an action space:

A={pCHP,hEB,hGB,pBES} (23)

a is a set of actions that an agent can make a decision;

a return function:

rt(st,at)=-(Cp(st,at)+CBES(st,at)+Cu(st,at))/1000   (24)

the return function is used for measuring the quality of the decision and serves as the basis for correcting the neural network parameters, rt is the return at time t, st is the observed state at time t, and at is the action at time t;

q function:

The Q value is the sum of the returns over a plurality of time steps, is used for measuring the quality of the strategy, and serves as a basis for correcting the parameters of the neural network; π is the strategy fitted by the neural network and γ is the discount factor. Each cycle corresponds to one time step; the cycle is run T times, giving T time steps in total;
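
Since the Q-function formula itself is not reproduced in the text, the short sketch below only illustrates the standard quantity it describes: the discounted sum of returns over the remaining time steps under the strategy π, with discount factor γ.

```python
def discounted_return(rewards, gamma=0.99):
    """Sum of gamma**k * r_{t+k} over the remaining time steps, i.e. the
    quantity that the Q value measures for a given strategy."""
    q = 0.0
    for k, r in enumerate(rewards):
        q += (gamma ** k) * r
    return q

# Example: three remaining hourly returns of a day
# discounted_return([-1.2, -0.8, -1.0], gamma=0.95)
```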

the pi network is:

at=π(st|θπ)+vt   (26)

vt+1=(1-τv)·vt   (27)

the π network is used to fit the mapping from observed states to decision actions, vt is the exploration noise at time t, τv is the update coefficient of the noise, επ is the update coefficient of the π network parameters, θQ is the parameter of the Q network, θπ is the parameter of the π network, and ∂ is the partial derivative symbol;
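
A sketch of the action selection of equations (26)-(27): the deterministic action from the π network plus exploration noise vt, with the noise scale decayed each step by the coefficient τv. The Gaussian form of the noise and the clipping to action bounds are assumptions; the patent only defines vt and τv.

```python
import numpy as np
import torch

def select_action(pi_net, state, noise_scale, tau_v=0.01, a_low=None, a_high=None):
    """a_t = pi(s_t | theta_pi) + v_t  (26), then v is decayed as in (27)."""
    with torch.no_grad():
        a = pi_net(torch.as_tensor(state, dtype=torch.float32)).numpy()
    a = a + np.random.normal(0.0, noise_scale, size=a.shape)  # exploration noise v_t (assumed Gaussian)
    if a_low is not None and a_high is not None:
        a = np.clip(a, a_low, a_high)                          # keep actions inside device limits
    return a, (1.0 - tau_v) * noise_scale                      # decayed noise scale for the next step
```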

the Q network is:

L(θQ)=(yt-Q(st,at|θQ))²   (30)

yt=rt+γ·Q′(st+1,π′(st+1|θπ′)|θQ′)   (31)

θπ′←τθ·θπ+(1-τθ)·θπ′   (34)

θQ′←τθ·θQ+(1-τθ)·θQ′   (35)

The Q network is used for fitting the mapping from decision actions to Q values; π′ is the target network of the π network and Q′ is the target network of the Q network, both used to stabilize the iterative process; εQ is the update coefficient of the Q network parameters and τθ is the update coefficient of the target networks.
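
A sketch of one critic/actor update following equations (30)-(35): the TD target yt, the squared loss L(θQ), a policy-gradient step on the actor through the corrected critic (the π-network update described by (28)-(29), whose explicit form is not reproduced in the text), and the soft update of both target networks with coefficient τθ. The optimizer type and hyperparameters are assumptions.

```python
import torch
import torch.nn.functional as F

def ddpg_update(pi, q, pi_target, q_target, batch, pi_opt, q_opt, gamma=0.99, tau=0.005):
    """One DDPG parameter update; batch = (s, a, r, s_next) tensors of shape (B, ...)."""
    s, a, r, s_next = batch

    # Critic: y_t = r_t + gamma * Q'(s_{t+1}, pi'(s_{t+1}))   (31), loss (30)
    with torch.no_grad():
        y = r + gamma * q_target(s_next, pi_target(s_next))
    q_loss = F.mse_loss(q(s, a), y)
    q_opt.zero_grad(); q_loss.backward(); q_opt.step()

    # Actor: increase the corrected critic's value of pi(s)    (cf. (28)-(29))
    pi_loss = -q(s, pi(s)).mean()
    pi_opt.zero_grad(); pi_loss.backward(); pi_opt.step()

    # Soft update of both target networks                       (34)-(35)
    for net, target in ((pi, pi_target), (q, q_target)):
        for p, p_t in zip(net.parameters(), target.parameters()):
            p_t.data.copy_(tau * p.data + (1.0 - tau) * p_t.data)
```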

DDPG off-line learning process: firstly, a π network and a Q network are generated with random parameters, and target networks are generated with the same parameters (initially each target network is a copy of its original network; during learning the target network parameters are updated more slowly than the π and Q networks, which stabilizes the learning process; formulas (34) and (35) give the update rule of the target networks). A cycle period T is then set and the first group of historical data is input (during learning, a different group of historical data is supplied at each new time step as the observed value of the agent; the historical data come from past real data. After learning is finished and the agent is used in an actual system, the observed value of the agent is the actual data obtained from monitoring equipment. The historical data are structured data accumulated during the long-term operation of the integrated energy system, such as a year of load data). An action is then taken on the basis of the π network (the π network is essentially a function with a large number of parameters that represent the policy of the agent; an action is a decision made by the agent based on the policy and the observation, a = π(o), i.e., the four variables included in (23)). The return of the strategy and the observation state of the historical data at the next moment (each moment is one hour) are calculated according to the physical model of the multi-energy system and recorded; the observation state during learning comes from the historical data, while the observation state during on-line operation comes from real-time observation data. The Q network parameters are corrected according to the return, and the Q value obtained from the corrected Q network is used to correct the π network parameters. Finally, after T cycles, the off-line learning of the DDPG algorithm is completed.
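
To make the "calculate the return and the next observation from the physical model" step concrete, the sketch below combines the helper functions from the earlier sketches (thermal_compensation, grid_power, reward, update_soc) into one environment transition. The splitting of the action vector, the layout of the historical data, the illustrative device parameters and the gas-consumption conversions are assumptions, not formulas quoted from the patent.

```python
# Illustrative device parameters (placeholders, not taken from the patent)
delta, alpha_chp, alpha_eb, alpha_gb = 1.2, 3.5, 0.95, 0.9

def env_step(t, action, c_soc, history, dt=1.0):
    """One physical-model transition: given action A = (p_CHP, h_EB, h_GB, p_BES),
    return the reward r_t (24) and the next observed state (22).
    history[t] is assumed to hold (p_load, h_load, p_PV, mu_e)."""
    p_chp, h_eb, h_gb, p_bes = action
    p_load, h_load, p_pv, mu_e = history[t]              # historical data at step t (assumed layout)

    h_chp = delta * p_chp                                # CHP heat output, equation (1)
    p_eb = h_eb / alpha_eb                               # electric boiler input, from (4)
    g_chp = p_chp / alpha_chp                            # assumed CHP gas consumption
    g_gb = h_gb / alpha_gb                               # assumed gas boiler gas consumption

    d_h, mu_h = thermal_compensation(h_load, h_chp, h_eb, h_gb)
    p_grid = grid_power(p_load, p_eb, p_pv, p_bes, p_chp)   # equation (16)
    r = reward(p_grid, g_chp, g_gb, p_bes, d_h, mu_h)       # equation (24)

    c_soc_next = update_soc(c_soc, p_bes, dt)
    p_load2, h_load2, p_pv2, mu_e2 = history[t + 1]
    s_next = (p_load2, h_load2, p_pv2, c_soc_next, mu_e2)   # next observation, equation (22)
    return r, s_next
```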

(3) The online operation method of the multi-energy system comprises the following steps: observation data for the DDPG algorithm are acquired in real time through observation equipment that can upload data in real time, such as voltmeters, ammeters and temperature measuring instruments; the observation data are then input into the DDPG algorithm that has completed learning to obtain the decision action, realizing real-time self-optimizing operation of the multi-energy system.
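
A minimal sketch of this online operation is given below, assuming the earlier PiNetwork definition: each period the measured state is fed to the learned π network (without exploration noise) and the resulting action is dispatched. The measurement and dispatch interfaces read_measurements() and dispatch() are hypothetical placeholders for the plant-specific I/O.

```python
import time
import torch

def run_online(pi_net, period_s=3600, a_low=None, a_high=None):
    """Real-time self-optimizing operation: observe, decide with the learned
    pi network, and dispatch the devices once per period (period = 24 h / T)."""
    while True:
        s = read_measurements()                      # hypothetical: (p_load, h_load, p_PV, C_soc, mu_e)
        with torch.no_grad():
            a = pi_net(torch.as_tensor(s, dtype=torch.float32)).numpy()
        if a_low is not None and a_high is not None:
            a = a.clip(a_low, a_high)                # keep setpoints inside device limits
        dispatch(a)                                  # hypothetical: send p_CHP, h_EB, h_GB, p_BES setpoints
        time.sleep(period_s)
```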

As shown in fig. 3, an optimal collaborative operation method for a multi-energy system based on an agent includes two parts, namely a learning process and an online operation:

the learning process comprises the following steps:

and 1, setting a cycle upper limit Episode of learning days as M, then randomly generating a pi network and a Q network containing a large number of parameters, and copying the pi network and the Q network as a target network.

Step 2: start the learning of a new day.

Step 3: set the number of time steps T in one day as the upper cycle limit for that day.

Step 4: start the learning of a new time step. Input the historical data of this time step of the day (load electric power, load thermal power, photovoltaic output power, time-of-use electricity price) and the state of charge of the electricity storage device (0.4 at the first time step) as the observed values into the π network, and calculate the action value from the π network using (26). Then calculate the state of charge of the storage device at the next time step from (8) and the r value from (24) using the action value and the physical model. Then correct the parameters of the Q network and its target network through the r value (30)-(35), and calculate the Q value from the corrected Q network to correct the π network and its target network (28)-(29). Finally, if the time step is not equal to T, return to Step 4; if the time step equals T and Episode is not equal to M, return to Step 2; if the time step equals T and Episode equals M, learning is completed and the loop ends. A sketch of this loop is given below.
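
Putting Steps 1-4 together, the sketch below shows the off-line learning loop over M days (episodes) and T time steps per day, reusing the PiNetwork/QNetwork, select_action, env_step and ddpg_update sketches from earlier. It performs a single-transition update at every step, mirroring the step order in the text rather than a replay-buffer implementation; the optimizer, learning rates, initial noise scale and the assumed layout of the historical data (T + 1 rows per day so the next observation always exists) are assumptions.

```python
import numpy as np
import torch

def offline_learning(history, M=200, T=24, c_ini=0.4):
    """Off-line DDPG learning loop mirroring Steps 1-4 above (sketch).

    history[episode][t] is assumed to hold (p_load, h_load, p_PV, mu_e)."""
    pi, q = PiNetwork(), QNetwork()
    pi_target, q_target = PiNetwork(), QNetwork()
    pi_target.load_state_dict(pi.state_dict())            # Step 1: targets start as copies
    q_target.load_state_dict(q.state_dict())
    pi_opt = torch.optim.Adam(pi.parameters(), lr=1e-4)   # optimizer and learning rates are assumptions
    q_opt = torch.optim.Adam(q.parameters(), lr=1e-3)

    noise = 0.2                                            # initial exploration noise scale (assumed)
    for episode in range(M):                               # Step 2: one episode per learning day
        c_soc = c_ini                                      # state of charge is 0.4 at the first time step
        p_load, h_load, p_pv, mu_e = history[episode][0]
        s = (p_load, h_load, p_pv, c_soc, mu_e)            # observation, eq. (22)
        for t in range(T):                                 # Steps 3-4: T time steps per day
            a, noise = select_action(pi, s, noise)         # action from the pi network, eq. (26)-(27)
            r, s_next = env_step(t, a, s[3], history[episode])
            batch = (torch.as_tensor([s], dtype=torch.float32),
                     torch.as_tensor(np.asarray(a), dtype=torch.float32).unsqueeze(0),
                     torch.tensor([[r]], dtype=torch.float32),
                     torch.as_tensor([s_next], dtype=torch.float32))
            ddpg_update(pi, q, pi_target, q_target, batch, pi_opt, q_opt)
            s = s_next
    return pi
```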

The online operation part comprises the following steps:

(1) Periodically acquire real-time observation data, with a period of (24 hours / T).

(2) Transmit the acquired data to the agent, let the agent automatically set the scheduling scheme for the next time period, and return to step (1).

The foregoing is illustrative of the preferred embodiments of this invention, and it is to be understood that the invention is not limited to the precise form disclosed herein and that various other combinations, modifications, and environments may be resorted to, falling within the scope of the concept as disclosed herein, either as described above or as apparent to those skilled in the relevant art. And that modifications and variations may be effected by those skilled in the art without departing from the spirit and scope of the invention as defined by the appended claims.
