UUV agent behavior learning and evolution model based on chaos immune genetic mechanism

Document No.: 1659014 Publication date: 2019-12-27

Reading note: this technology, "UUV agent behavior learning and evolution model based on chaos immune genetic mechanism" (一种基于混沌免疫遗传机制的uuv智能体行为学习与演化模型), was designed by Liang Hongtao (梁洪涛) and Gao Jie (高洁) on 2018-06-20. Its main content is as follows.

The invention belongs to the technical field of underwater unmanned system modeling and simulation, and specifically relates to a UUV agent behavior learning and evolution model based on a chaos immune genetic mechanism. First, the problem to be solved and its constraints are loaded as the antigen Ag, and an initial antibody population is generated from the vaccine population, the memory population, and a chaos mechanism. Second, based on the antibody fitness results, a vaccination mechanism controls the convergence direction of the learning process, and the antibody memory bank is updated. Finally, a roulette-based selection operator, an adaptively adjusted crossover operator, and a mixed Gaussian and polynomial mutation operator are designed in sequence to maintain the diversity of the antibody population, and premature convergence is suppressed, thereby updating and iterating the antibody population. The model combines the global search capability of the basic genetic algorithm with the local search capability of the immune and chaos mechanisms, and promotes rapid learning and evolution of behavior rules by continually adjusting and optimizing the search space of the problem solution.

1. A UUV agent behavior learning and evolution model based on a chaos immune genetic mechanism, characterized in that:

the method comprises the following steps:

step 1: antigen recognition: the problem to be solved and its constraints are loaded as the antigen Ag;

step 2: vaccine extraction: based on the antigen Ag, expert knowledge, prior knowledge, and problem characteristics are taken as vaccine information $h_j$ and encoded in binary to form a genome, from which a vaccine population AH of size $N_1$ is constructed:

$$AH = [h_1, h_2, \ldots, h_j, \ldots, h_{N_1}]^T, \quad j = 1, 2, \ldots, N_1 \qquad (1)$$

where T denotes matrix transposition;

step 3: antibody population initialization: when $k = 0$, the memory population AM has size $N_2$, where $k$ is the iteration number, and the initial antibody population of size $N$ is $AB = [Ab_1, Ab_2, \ldots, Ab_i, \ldots, Ab_N]^T$, $i = 1, 2, \ldots, N$, with $N_2 \le N$; depending on the state of the memory population AM, the initial antibody population AB consists of the vaccine population AH and a random population AR, or of the vaccine population AH, the memory population AM, and the random population AR, or of the memory population AM and the random population AR; the random population AR is generated by a chaos mechanism, specifically: an initial value $Ab_0$ is set as the starting value of a one-dimensional Logistic map, which is iterated to generate a random population AR of size $N/2$; the chaotic model and the random population AR are, respectively:

$$r_l = 4 \times r_{l-1} \times (1 - r_{l-1}), \quad l = 1, 2, \ldots, N/2 \qquad (2)$$

$$AR = [r_1, r_2, \ldots, r_l, \ldots, r_{N/2}]^T, \quad l = 1, 2, \ldots, N/2 \qquad (3)$$

where $r_l$ denotes the intermediate variable obtained by iterating from the initial value $Ab_0$, the iteration continuing until the random population AR of $N/2$ individuals is formed;

step 4: fitness calculation: when $k > 1$, the fitness $Fitness_k(Ab_i)$, $i = 1, 2, \ldots, N$, of each antibody $Ab_i$ in the antibody population AB is calculated; the antibody with the highest fitness, $\max Fitness_k(Ab_i)$, is added to the antibody memory bank, whose size is updated to $N_2 + 1$;

step 5: vaccination: $H$ antibodies ($H < N$) are adaptively selected from the antibody population AB for vaccination, and the vaccination size changes adaptively with the iteration number $k$ as follows:

where $\alpha$ and $\beta$ are adjustable parameters and $e$ denotes the exponential function;

step 6: termination check: the termination condition is whether the maximum number of iterations $M$ has been reached; if $k \ge M$, stop and output the optimal behavior rule Rule; if $k < M$, go to step 7;

step 7: antibody selection: the affinity between antibodies and the antibody concentration are calculated, and antibodies are selected by roulette wheel according to their expected reproduction rate, the probability of each antibody being selected being proportional to its expected reproduction rate, forming a new population A_N of size $N$;

the affinity between antibodies reflects their similarity, and the antibody concentration is introduced as a measure of that similarity; in essence it is the proportion of similar antibodies in the antibody population:

where $Concentration_k(Ab_i)$ denotes the concentration of the k-th generation antibody $Ab_i$; $S_{i,j}$ denotes the similarity between antibody $Ab_i$ and antibody $Ab_j$; $I(j)$ denotes the number of antibodies other than $Ab_i$; $K_{i,j}$ denotes the number of identical encoding bits in antibody $Ab_i$ and antibody $Ab_j$; $L$ denotes the antibody encoding length; $\tau$ denotes the similarity threshold; if $K_{i,j}/L \ge \tau$, then $S_{i,j} = 1$; if $K_{i,j}/L < \tau$, then $S_{i,j} = 0$;

based on the antibody concentration, the expected reproduction rate of antibody $Ab_i$ is calculated as:

where $Fitness_k(Ab_i)$ denotes the fitness of the k-th generation antibody $Ab_i$ and $\lambda$ is the diversity evaluation coefficient;

selection is performed according to the calculated expected reproduction rates: roulette wheel selection determines the chosen antibody from the probability interval that each antibody occupies on the wheel, until the $N$ antibodies of population A_N have been selected; specifically, a floating-point number $\Pi \in (0, 1]$ is generated at random as the wheel pointer, and if $P_k(Ab_{i-1}) < \Pi \le P_k(Ab_i)$, antibody $Ab_i$ is selected, where $P_k(Ab_i)$ denotes the expected reproduction rate of antibody $Ab_i$;

step 8: antibody crossover: single-point crossover is applied to population A_N with an adaptive crossover probability, forming a new population B_N of size $N$;

the k-th generation crossover probability $P_k^c$ is calculated by an adaptive mechanism:

where the formula uses the maximum fitness value in the antibody population A_N and the mean fitness value of the antibody population; $Fitness_k$ denotes the larger of the fitness values of the two antibodies $Ab_i$ and $Ab_j$ to be crossed, $i, j = 1, 2, \ldots, N$; and $k_1$ and $k_2$ are preset constants;

step 9: antibody mutation: a mixed polynomial and Gaussian mutation operation is applied to population B_N, forming a new population C_N of size $N$;

polynomial mutation is good at escaping local optima; in its formula, $x_i^*$ and $x_i$ are the decoded values of antibody $Ab_i$ after and before mutation, $x_{max}$ and $x_{min}$ are the upper and lower bounds of the decoded values in population B_N, and $\delta_k$, the mutation control parameter of the k-th generation, is calculated as:

where $a_k$ is a random number uniformly distributed on $[0, 1]$ and $\eta$ denotes the mutation distribution index;

Gaussian mutation replaces the original gene value of the antibody with a random number drawn from a normal distribution with mean $\mu$ and variance $\sigma^2$; it has stronger local search capability and takes the form $x_i^* = x_i + \delta_k$, where $x_i^*$ and $x_i$ are the decoded values of antibody $Ab_i$ after and before mutation, and $\delta_k$, the mutation control parameter of the k-th generation, is calculated as:

$$\delta_k = 0.1 \times N(0, 1) \qquad (10)$$

where $N(0, 1)$ denotes a one-dimensional normally distributed random number with mean 0 and standard deviation 1;

to combine the advantages of polynomial mutation and Gaussian mutation, a parameter $\theta_k$ is defined to switch between the two mutation operations, calculated as:

where $b$ is the preset probability of Gaussian mutation, $c$ is a preset parameter, $k$ is the current iteration number, $M$ is the maximum number of iterations, $Fitness_k(Ab_i)$ is the fitness of antibody $Ab_i$ ($i = 1, 2, \ldots, N$) in the k-th generation antibody population B_N, and the formula also uses the maximum and minimum fitness values in the k-th generation antibody population B_N; if $\theta_k < 0.5$, Gaussian mutation is applied, otherwise polynomial mutation is applied, generating a new population C_N of size $N$;

step 10: premature convergence check: the average fitness of population C_N is calculated to check for premature convergence; if the population has converged prematurely, sine chaotic mutation is applied to maintain diversity, generating a new population D_N of size $N$, and the procedure returns to step 7; otherwise population C_N is taken directly as the next-generation antibody population and step 11 is executed;

the premature convergence check is based on the population average fitness $Fitness_{avg}$, specifically:

where the formulas use the mean fitness value of the k-th generation antibody population C_N, and $\omega_1$ and $\omega_2$ are very small preset positive parameters;

if formula (12) and formula (13) are not satisfied, no premature convergence has occurred; if formula (12) and formula (13) are both satisfied, the antibody population has converged prematurely, and a sine-based chaotic mutation operation is applied to maintain population diversity, forming a new population D_N of size $N$, calculated as:

where $x_i$ denotes the decoded value of the binary antibody $Ab_i$; the new antibody $x_{i+1}$ generated by chaotic mutation must be re-encoded in binary;

step 11: update the iteration number, $k \leftarrow k + 1$, and return to step 4.

2. The UUV agent behavior learning and evolution model based on a chaos immune genetic mechanism according to claim 1, characterized in that:

the initial antibody population AB in step 3 is determined as follows:

step 3.1: if the antibody memory population AM is empty, i.e. $N_2 = 0$, the initial antibody population AB consists of two parts: (1) a population of size $N/2$ generated by random combination from the vaccine population AH; (2) a random population AR of size $N/2$ generated by the chaos mechanism; that is, $AB = AH \cup AR$;

step 3.2: if the antibody memory population AM is not empty and its size $N_2 < N/2$, the initial antibody population AB consists of three parts: (1) a random population AR of size $N/2$ generated by the chaos mechanism; (2) the antibody memory population AM of size $N_2$ provided by the memory bank; (3) a vaccine population AH of size $N/2 - N_2$ generated from the vaccine bank; that is, $AB = AH \cup AR \cup AM$;

step 3.3: if the antibody memory population AM is not empty and its size $N_2 \ge N/2$, the initial antibody population AB consists of two parts: (1) a random population AR of size $N/2$ generated by the chaos mechanism; (2) an antibody memory population AM of size $N/2$ provided by the memory bank; that is, $AB = AR \cup AM$.

3. The UUV agent behavior learning and evolution model based on a chaos immune genetic mechanism according to claim 2, characterized in that:

in step 5, the fitness to the antigen of the parent antibody $Ab_i$ before vaccination, $Fitness_k(Ab_i)$, is compared with the fitness to the antigen of the progeny antibody $Ab_i^s$ after vaccination:

step 5.1: if the fitness improves, the vaccinated progeny antibody replaces the parent antibody in the antibody population AB, i.e. $Ab_i$ is replaced by $Ab_i^s$;

step 5.2: if the fitness decreases, indicating that the antibody population is degenerating, the progeny antibody is discarded and the parent antibody is retained, i.e. $Ab_i$ remains unchanged.

Technical Field

The invention belongs to the technical field of underwater unmanned system modeling and simulation, and particularly relates to a UUV intelligent agent behavior learning and evolution model based on a chaos immune genetic mechanism.

Background

A military UUV is a complex underwater unmanned combat system whose operational elements, such as weaponry, underwater environment, and combat missions, are mutually coupled and influence one another. It offers long endurance, good concealment, low risk, and recoverability. To complete combat missions such as search and reconnaissance, remote attack, submarine defense, and networked detection, the unmanned system needs a high degree of intelligence: the UUV must not only perform functions such as detection, identification, and communication, but also react, plan, and learn autonomously, so that underwater combat tasks are completed efficiently and to a high standard.

If a UUV has a high-resolution, complete behavior rule base, it can react, plan, and learn according to environment and task feature information. At present, however, the intelligence and the navigation-control computing capacity of UUVs are limited, and the knowledge bases and rule bases that have been built are small: only partial rules can be obtained from expert knowledge and simulation results, which falls far short of the high-resolution, complete behavior rule base required for UUV intelligent modeling. Behavior learning and evolution methods for building a complete behavior rule base have therefore become a research focus in UUV intelligent modeling.

Behavior learning and evolution methods fall into two main categories: machine learning and intelligent optimization. Machine learning generally uses modeling tools from artificial intelligence, such as neural networks and expert systems, to learn behaviors, but it suffers from high computational complexity and slow convergence and depends heavily on initial parameter assumptions. Intelligent optimization uses evolutionary algorithms such as particle swarm optimization, genetic algorithms, and simulated annealing to optimize behavior rules iteratively. It is easy to implement, precise, and fast to converge, and it supports both rule learning and the adaptive evolution of behavior rules, but these algorithms balance global search and local search poorly.

The immune evolutionary algorithm was proposed as a novel intelligent optimization method for the learning and evolution of parallel, distributed, adaptive systems; it provides antigen extraction, automatic recognition, learning, memory, and similar functions. Many immune algorithms have since been designed on this basis; most either modify the form of a genetic algorithm or combine it with particle swarm optimization or similar algorithms to form a hybrid immune algorithm.

Although these immune algorithms can solve the problem, their search efficiency and accuracy in the global search space still need to be improved. A behavior rule learning and evolution method that is both fast and accurate is therefore needed to realize autonomous behavior reaction, planning, and learning in UUV underwater unmanned systems.

Disclosure of Invention

To solve the problems in the prior art, the invention provides a UUV agent behavior learning and evolution model based on a chaos immune genetic mechanism. The model is functionally strong and has complete knowledge, can match and update rules quickly and accurately, and realizes autonomous behavior reaction, planning, and learning in UUV underwater unmanned systems.

The technical problem is solved by the following technical solution.

The method comprises the following steps:

Step 1: antigen recognition: the problem to be solved and its constraints are loaded as the antigen Ag.

Step 2: vaccine extraction: based on the antigen Ag, expert knowledge, prior knowledge, and problem characteristics are taken as vaccine information $h_j$ and encoded in binary to form a genome, from which a vaccine population AH of size $N_1$ is constructed:

$$AH = [h_1, h_2, \ldots, h_j, \ldots, h_{N_1}]^T, \quad j = 1, 2, \ldots, N_1 \qquad (1)$$

where T denotes matrix transposition.

Step 3: antibody population initialization: when $k = 0$, the memory population AM has size $N_2$, where $k$ is the iteration number, and the initial antibody population of size $N$ is $AB = [Ab_1, Ab_2, \ldots, Ab_i, \ldots, Ab_N]^T$, $i = 1, 2, \ldots, N$, with $N_2 \le N$. Depending on the state of the memory population AM, the initial antibody population AB consists of the vaccine population AH and a random population AR, or of the vaccine population AH, the memory population AM, and the random population AR, or of the memory population AM and the random population AR. The random population AR is generated by a chaos mechanism: an initial value $Ab_0$ is set as the starting value of a one-dimensional Logistic map, which is iterated to generate a random population AR of size $N/2$. The chaotic model and the random population AR are, respectively:

$$r_l = 4 \times r_{l-1} \times (1 - r_{l-1}), \quad l = 1, 2, \ldots, N/2 \qquad (2)$$

$$AR = [r_1, r_2, \ldots, r_l, \ldots, r_{N/2}]^T, \quad l = 1, 2, \ldots, N/2 \qquad (3)$$

where $r_l$ denotes the intermediate variable obtained by iterating from the initial value $Ab_0$, the iteration continuing until the random population AR of $N/2$ individuals is formed.
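The chaotic initialization of step 3 can be sketched as follows. This is a minimal illustration of equations (2) and (3), assuming that the iteration starts from $r_0 = Ab_0$ with $Ab_0$ strictly between 0 and 1; the function name `logistic_population` and the values used are hypothetical, and the mapping from the real-valued $r_l$ to binary antibodies is not specified in the text and is not shown.

```python
def logistic_population(ab0: float, size: int) -> list[float]:
    """Iterate r_l = 4 * r_{l-1} * (1 - r_{l-1}) from r_0 = ab0 (assumed start)."""
    if not 0.0 < ab0 < 1.0:
        raise ValueError("ab0 must lie strictly between 0 and 1")
    population = []
    r = ab0
    for _ in range(size):
        r = 4.0 * r * (1.0 - r)  # equation (2)
        population.append(r)
    return population  # equation (3): AR = [r_1, ..., r_{N/2}]


N = 8  # hypothetical population size
AR = logistic_population(0.3, N // 2)
print(AR)
```

The choice of seed matters in such a sketch: a seed of 0.5, for example, is mapped to 1 and then to 0, where the sequence stays.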

Step 4: fitness calculation: when $k > 1$, the fitness $Fitness_k(Ab_i)$, $i = 1, 2, \ldots, N$, of each antibody $Ab_i$ in the antibody population AB is calculated. The antibody with the highest fitness, $\max Fitness_k(Ab_i)$, is added to the antibody memory bank, whose size is updated to $N_2 + 1$.

Step 5: vaccination: $H$ antibodies ($H < N$) are adaptively selected from the antibody population AB for vaccination, and the vaccination size changes adaptively with the iteration number $k$ as follows:

where $\alpha$ and $\beta$ are adjustable parameters and $e$ denotes the exponential function.

Step 6: termination check: the termination condition is whether the maximum number of iterations $M$ has been reached. If $k \ge M$, stop and output the optimal behavior rule Rule; if $k < M$, go to step 7.

Step 7: antibody selection: the affinity between antibodies and the antibody concentration are calculated, and antibodies are selected by roulette wheel according to their expected reproduction rate, the probability of each antibody being selected being proportional to its expected reproduction rate, forming a new population A_N of size $N$.

The affinity between antibodies reflects their similarity, and the antibody concentration is introduced as a measure of that similarity; in essence it is the proportion of similar antibodies in the antibody population:

where $Concentration_k(Ab_i)$ denotes the concentration of the k-th generation antibody $Ab_i$; $S_{i,j}$ denotes the similarity between antibody $Ab_i$ and antibody $Ab_j$; $I(j)$ denotes the number of antibodies other than $Ab_i$; $K_{i,j}$ denotes the number of identical encoding bits in antibody $Ab_i$ and antibody $Ab_j$; $L$ denotes the antibody encoding length; and $\tau$ denotes the similarity threshold. If $K_{i,j}/L \ge \tau$, then $S_{i,j} = 1$; if $K_{i,j}/L < \tau$, then $S_{i,j} = 0$.
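A small sketch may help make the similarity indicator $S_{i,j}$ and the concentration concrete. The thresholding on $K_{i,j}/L$ follows the definition above; the normalisation, which divides the number of similar antibodies by the number of antibodies other than $Ab_i$, is an assumption, because the concentration formula itself is not reproduced in this text. Antibodies are represented as bit strings, and the function names are hypothetical.

```python
def similarity(ab_i: str, ab_j: str, tau: float) -> int:
    """S_ij = 1 if the share of identical bits K_ij / L is at least tau, else 0."""
    if len(ab_i) != len(ab_j):
        raise ValueError("antibodies must have the same encoding length L")
    k_ij = sum(a == b for a, b in zip(ab_i, ab_j))  # identical bit positions
    return 1 if k_ij / len(ab_i) >= tau else 0


def concentration(population: list[str], i: int, tau: float) -> float:
    """Share of the other antibodies that are similar to antibody i.

    Assumed normalisation: the denominator is the number of antibodies
    other than antibody i.
    """
    others = [ab for j, ab in enumerate(population) if j != i]
    if not others:
        return 0.0
    return sum(similarity(population[i], ab, tau) for ab in others) / len(others)


population = ["1100", "1101", "0011", "1100"]  # hypothetical 4-bit antibodies
print(concentration(population, 0, 0.75))
print(concentration(population, 2, 0.75))
```

In this toy population the first antibody is similar to two of its three neighbours, while the third is similar to none, so a concentration-aware selection would favour the third to preserve diversity.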

Based on the antibody concentration, the expected reproduction rate of antibody $Ab_i$ is calculated as:

where $Fitness_k(Ab_i)$ denotes the fitness of the k-th generation antibody $Ab_i$ and $\lambda$ is the diversity evaluation coefficient.

Selection is performed according to the calculated expected reproduction rates: roulette wheel selection determines the chosen antibody from the probability interval that each antibody occupies on the wheel, until the $N$ antibodies of population A_N have been selected. Specifically, a floating-point number $\Pi \in (0, 1]$ is generated at random as the wheel pointer, and if $P_k(Ab_{i-1}) < \Pi \le P_k(Ab_i)$, antibody $Ab_i$ is selected, where $P_k(Ab_i)$ denotes the expected reproduction rate of antibody $Ab_i$.
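The roulette wheel rule can be illustrated with the sketch below. It assumes that the interval bounds in the condition $P_k(Ab_{i-1}) < \Pi \le P_k(Ab_i)$ are the running sums of the normalised expected reproduction rates, which is the usual reading of roulette selection; the rates themselves are taken as given inputs, and the function name `roulette_select` is hypothetical.

```python
import bisect
import itertools
import random


def roulette_select(rates: list[float], n_select: int, rng: random.Random) -> list[int]:
    """Return the indices of the antibodies picked by n_select spins of the wheel."""
    total = sum(rates)
    if total <= 0:
        raise ValueError("at least one expected reproduction rate must be positive")
    # Upper bound of each antibody's interval on the wheel (assumed cumulative).
    cumulative = list(itertools.accumulate(rate / total for rate in rates))
    cumulative[-1] = 1.0  # guard against floating-point rounding
    chosen = []
    for _ in range(n_select):
        pointer = 1.0 - rng.random()  # uniform on (0, 1]
        # First index whose upper bound is >= pointer: bound[i-1] < pointer <= bound[i].
        chosen.append(bisect.bisect_left(cumulative, pointer))
    return chosen


rng = random.Random(0)
picks = roulette_select([0.1, 0.2, 0.3, 0.4], 1000, rng)
print([picks.count(i) for i in range(4)])
```

Antibodies with a rate of zero occupy an empty interval and are never selected in this sketch.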

Step 8: antibody crossover: single-point crossover is applied to population A_N with an adaptive crossover probability, forming a new population B_N of size $N$.

The k-th generation crossover probability $P_k^c$ is calculated by an adaptive mechanism:

where the formula uses the maximum fitness value in the antibody population A_N and the mean fitness value of the antibody population; $Fitness_k$ denotes the larger of the fitness values of the two antibodies $Ab_i$ and $Ab_j$ to be crossed, $i, j = 1, 2, \ldots, N$; and $k_1$ and $k_2$ are preset constants.
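The sketch below shows one way single-point crossover with an adaptive probability could look. The adaptive formula is not reproduced in this text, so the code substitutes the classic adaptive genetic algorithm form, in which above-average pairs receive a probability that shrinks as their fitness approaches the population maximum and below-average pairs receive a constant. That form, the argument names, and the example values are assumptions, chosen only because they use the same quantities listed above (maximum fitness, mean fitness, the larger fitness of the pair, $k_1$, and $k_2$).

```python
import random


def adaptive_crossover_probability(f_larger: float, f_max: float, f_avg: float,
                                   k1: float, k2: float) -> float:
    """Assumed classic adaptive form, not the source's own formula."""
    if f_larger >= f_avg and f_max > f_avg:
        return k1 * (f_max - f_larger) / (f_max - f_avg)
    return k2  # below-average pair, or a population with identical fitness


def single_point_crossover(parent_a: str, parent_b: str, probability: float,
                           rng: random.Random) -> tuple[str, str]:
    """Swap the tails of two bit strings after a random cut point."""
    if len(parent_a) != len(parent_b) or len(parent_a) < 2:
        raise ValueError("parents must share a length of at least 2 bits")
    if rng.random() >= probability:
        return parent_a, parent_b  # no crossover this time
    point = rng.randint(1, len(parent_a) - 1)  # cut strictly inside the string
    return (parent_a[:point] + parent_b[point:],
            parent_b[:point] + parent_a[point:])


rng = random.Random(0)
p_c = adaptive_crossover_probability(f_larger=0.9, f_max=1.0, f_avg=0.5, k1=0.8, k2=0.9)
child_a, child_b = single_point_crossover("0000", "1111", 1.0, rng)
print(p_c, child_a, child_b)
```

Under this assumed form, a pair that already contains a near-optimal antibody is rarely disrupted, while weak pairs are recombined often.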

Step 9: antibody mutation: a mixed polynomial and Gaussian mutation operation is applied to population B_N, forming a new population C_N of size $N$.

Polynomial mutation is good at escaping local optima. In its formula, $x_i^*$ and $x_i$ are the decoded values of antibody $Ab_i$ after and before mutation, $x_{max}$ and $x_{min}$ are the upper and lower bounds of the decoded values in population B_N, and $\delta_k$, the mutation control parameter of the k-th generation, is calculated as:

where $a_k$ is a random number uniformly distributed on $[0, 1]$ and $\eta$ denotes the mutation distribution index.
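For orientation, the widely used textbook form of polynomial mutation is sketched below. The formula for $\delta_k$ is not reproduced in this text, so the expression in `polynomial_delta` is an assumption; it was chosen because it uses exactly the quantities named above (a uniform random number $a_k$, a distribution index $\eta$, and the bounds $x_{min}$ and $x_{max}$). Clipping the result to the bounds is a further assumption, and the function names are hypothetical.

```python
import random


def polynomial_delta(a: float, eta: float) -> float:
    """Textbook polynomial perturbation in [-1, 1] (assumed form)."""
    exponent = 1.0 / (eta + 1.0)
    if a < 0.5:
        return (2.0 * a) ** exponent - 1.0
    return 1.0 - (2.0 * (1.0 - a)) ** exponent


def polynomial_mutation(x: float, x_min: float, x_max: float, eta: float,
                        rng: random.Random) -> float:
    """Shift x by delta scaled to the range, then clip to the bounds (assumed)."""
    delta = polynomial_delta(rng.random(), eta)
    mutated = x + delta * (x_max - x_min)
    return min(max(mutated, x_min), x_max)


rng = random.Random(0)
samples = [polynomial_mutation(0.5, 0.0, 1.0, 20.0, rng) for _ in range(1000)]
print(min(samples), max(samples))
```

A larger $\eta$ concentrates the perturbation near zero in this form, so most offspring stay close to the parent while occasional larger jumps remain possible.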

Gaussian mutation replaces the original gene value of the antibody with a random number drawn from a normal distribution with mean $\mu$ and variance $\sigma^2$. It has stronger local search capability and takes the form $x_i^* = x_i + \delta_k$, where $x_i^*$ and $x_i$ are the decoded values of antibody $Ab_i$ after and before mutation, and $\delta_k$, the mutation control parameter of the k-th generation, is calculated as:

$$\delta_k = 0.1 \times N(0, 1) \qquad (10)$$

where $N(0, 1)$ denotes a one-dimensional normally distributed random number with mean 0 and standard deviation 1.
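Equation (10) translates almost directly into code. The sketch below applies $x_i^* = x_i + \delta_k$ with $\delta_k = 0.1 \times N(0, 1)$ to a decoded value; decoding from and re-encoding to binary are left out, and the function name `gaussian_mutation` is hypothetical.

```python
import random
import statistics


def gaussian_mutation(x: float, rng: random.Random) -> float:
    """Return x + delta_k with delta_k = 0.1 * N(0, 1), as in equation (10)."""
    delta_k = 0.1 * rng.gauss(0.0, 1.0)
    return x + delta_k


rng = random.Random(42)
deltas = [gaussian_mutation(0.0, rng) for _ in range(20000)]
print(statistics.fmean(deltas), statistics.pstdev(deltas))
```

The perturbations have a standard deviation of about 0.1, which is why this operator mainly refines solutions locally.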

To combine the advantages of polynomial mutation and Gaussian mutation, a parameter $\theta_k$ is defined to switch between the two mutation operations, calculated as:

where $b$ is the preset probability of Gaussian mutation, $c$ is a preset parameter, $k$ is the current iteration number, $M$ is the maximum number of iterations, $Fitness_k(Ab_i)$ is the fitness of antibody $Ab_i$ ($i = 1, 2, \ldots, N$) in the k-th generation antibody population B_N, and the formula also uses the maximum and minimum fitness values in the k-th generation antibody population B_N. If $\theta_k < 0.5$, Gaussian mutation is applied; otherwise polynomial mutation is applied, generating a new population C_N of size $N$.

Step 10: premature convergence check: the average fitness of population C_N is calculated to check for premature convergence. If the population has converged prematurely, sine chaotic mutation is applied to maintain diversity, generating a new population D_N of size $N$, and the procedure returns to step 7; otherwise population C_N is taken directly as the next-generation antibody population and step 11 is executed.

The premature convergence check is based on the population average fitness $Fitness_{avg}$, specifically:

where the formulas use the mean fitness value of the k-th generation antibody population C_N, and $\omega_1$ and $\omega_2$ are very small preset positive parameters.

If formula (12) and formula (13) are not satisfied, no premature convergence has occurred. If formula (12) and formula (13) are both satisfied, the antibody population has converged prematurely, and a sine-based chaotic mutation operation is applied to maintain population diversity, forming a new population D_N of size $N$, calculated as:

where $x_i$ denotes the decoded value of the binary antibody $Ab_i$; the new antibody $x_{i+1}$ generated by chaotic mutation must be re-encoded in binary.

Step 11: update the iteration number, $k \leftarrow k + 1$, and return to step 4.

Further, the initial antibody population AB in step 3 is determined as follows:

Step 3.1: if the antibody memory population AM is empty, i.e. $N_2 = 0$, the initial antibody population AB consists of two parts: (1) a population of size $N/2$ generated by random combination from the vaccine population AH; (2) a random population AR of size $N/2$ generated by the chaos mechanism. That is, $AB = AH \cup AR$.

Step 3.2: if the antibody memory population AM is not empty and its size $N_2 < N/2$, the initial antibody population AB consists of three parts: (1) a random population AR of size $N/2$ generated by the chaos mechanism; (2) the antibody memory population AM of size $N_2$ provided by the memory bank; (3) a vaccine population AH of size $N/2 - N_2$ generated from the vaccine bank. That is, $AB = AH \cup AR \cup AM$.

Step 3.3: if the antibody memory population AM is not empty and its size $N_2 \ge N/2$, the initial antibody population AB consists of two parts: (1) a random population AR of size $N/2$ generated by the chaos mechanism; (2) an antibody memory population AM of size $N/2$ provided by the memory bank. That is, $AB = AR \cup AM$.
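The three cases of steps 3.1 to 3.3 amount to a small allocation rule for how many antibodies each source contributes. The sketch below assumes an even population size $N$ and reads the third case as $N_2 \ge N/2$, the complement of the second; the function name `initial_population_plan` and the dictionary keys are hypothetical.

```python
def initial_population_plan(n: int, n2: int) -> dict[str, int]:
    """Return how many antibodies AH, AR and AM each contribute to AB."""
    if n <= 0 or n % 2 != 0:
        raise ValueError("this sketch assumes a positive, even population size N")
    if n2 < 0:
        raise ValueError("the memory population size N2 cannot be negative")
    half = n // 2
    if n2 == 0:       # step 3.1: memory empty, AB = AH u AR
        return {"AH": half, "AR": half, "AM": 0}
    if n2 < half:     # step 3.2: AB = AH u AR u AM
        return {"AH": half - n2, "AR": half, "AM": n2}
    # step 3.3 (assumed N2 >= N/2): AB = AR u AM, memory truncated to N/2
    return {"AH": 0, "AR": half, "AM": half}


for n2 in (0, 3, 7):
    print(n2, initial_population_plan(10, n2))
```

In every case the chaotic population AR supplies half of AB, and the three parts always sum to $N$.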

Further, in step 5, the fitness to the antigen of the parent antibody $Ab_i$ before vaccination, $Fitness_k(Ab_i)$, is compared with the fitness to the antigen of the progeny antibody $Ab_i^s$ after vaccination:

Step 5.1: if the fitness improves, the vaccinated progeny antibody replaces the parent antibody in the antibody population AB, i.e. $Ab_i$ is replaced by $Ab_i^s$.

Step 5.2: if the fitness decreases, indicating that the antibody population is degenerating, the progeny antibody is discarded and the parent antibody is retained, i.e. $Ab_i$ remains unchanged.
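The accept-or-reject rule of steps 5.1 and 5.2 can be sketched as a greedy replacement. How a vaccine is injected into an antibody is not detailed in the text, so overwriting a gene segment at a given position is a hypothetical rule, as are the function names and the toy fitness (the number of 1 bits). Keeping the parent when the fitness is unchanged is also an assumption.

```python
from typing import Callable


def vaccinate(antibody: str, vaccine: str, position: int) -> str:
    """Overwrite a gene segment with the vaccine bits (hypothetical injection rule)."""
    if position < 0 or position + len(vaccine) > len(antibody):
        raise ValueError("the vaccine segment must fit inside the antibody")
    return antibody[:position] + vaccine + antibody[position + len(vaccine):]


def vaccinate_if_better(antibody: str, vaccine: str, position: int,
                        fitness: Callable[[str], float]) -> str:
    """Keep the vaccinated progeny only if its fitness improves (steps 5.1 and 5.2)."""
    progeny = vaccinate(antibody, vaccine, position)
    return progeny if fitness(progeny) > fitness(antibody) else antibody


def count_ones(bits: str) -> float:
    """Toy fitness used only for this illustration."""
    return float(bits.count("1"))


print(vaccinate_if_better("0000", "11", 1, count_ones))  # progeny accepted
print(vaccinate_if_better("1111", "00", 1, count_ones))  # progeny rejected
```

Because a vaccinated antibody is only kept when it scores higher, vaccination in this sketch can never lower the fitness of an individual.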

Compared with the prior art, the invention has the following beneficial effects:

The model combines the global search capability of the basic genetic algorithm with the local search capability of the immune and chaos mechanisms, and promotes rapid learning and evolution of behavior rules by continually adjusting and optimizing the search space of the problem solution.

Specifically, the antibody memory bank is continuously updated through the antibody fitness calculation. A roulette-based selection operator, an adaptively adjusted crossover operator, and a mixed Gaussian and polynomial mutation operator are designed in sequence to maintain the diversity of the antibody population, and premature convergence is suppressed, thereby updating and iterating the antibody population.

Drawings

FIG. 1 is a schematic diagram of a UUV agent behavior learning and evolution model based on a chaos immune genetic mechanism.

Fig. 2 is a schematic diagram of UUV obstacle avoidance.

Detailed Description

To make the objects, technical solutions, and advantages of the invention clearer, the invention is described below by way of example with reference to the accompanying drawings. The description is illustrative only and does not limit the scope of the invention.
