XGBoost algorithm-based VOD (video on demand) service cache optimization method in an edge network environment

Document No.: 1218997   Publication date: 2020-09-04

Abstract: The technology "XGBoost algorithm-based VOD (video on demand) service cache optimization method in an edge network environment" (边缘网络环境下基于XGBoost算法的VOD业务缓存优化方法) was designed by Zhang Hui (张晖), Sun Yejun (孙叶钧), Zhao Haitao (赵海涛), Sun Yanfei (孙雁飞), Ni Yiyang (倪艺洋) and Zhu Hongbo (朱洪波) on 2020-04-20. The invention discloses a VOD service cache optimization method based on the XGBoost algorithm in an edge network environment, comprising the following steps: collecting video data; taking the average access volume as the prediction target and performing regression modeling with the XGBoost algorithm to obtain a prediction model; predicting the average access volume with the prediction model; establishing a cache optimization model from the prediction results; and solving the optimization model with a knapsack algorithm to obtain the final caching scheme. The invention takes into account that an edge server must process a large amount of video information and that machine learning offers strong data analysis capability in big data processing, so that the edge server reduces service access delay as far as possible and its caching efficiency improves. The scheme is simple, easy to implement, and has good application prospects.

1. A VOD service cache optimization method based on the XGBoost algorithm in an edge network environment, characterized in that the method comprises the following steps:

S1: collecting video data;

S2: taking the average access volume as the prediction target, and performing regression modeling with the XGBoost algorithm to obtain a prediction model;

S3: predicting the average access volume with the prediction model;

S4: establishing a cache optimization model according to the prediction result;

S5: solving the optimization model with a knapsack algorithm to obtain the final caching scheme.

2. The XGBoost algorithm-based VOD service cache optimization method in the edge network environment according to claim 1, wherein obtaining the prediction model in step S2 specifically comprises: performing regression training with the average access volume as the dependent variable and the remaining features as independent variables, dividing the data set, outputting the importance ranking of each feature, deleting features according to the ranking to obtain the final modeling features, and modeling on those features to form the prediction model.

3. The XGBoost algorithm-based VOD service cache optimization method in the edge network environment according to claim 2, wherein, in the process of modeling on the modeling features to form the prediction model, parameters are tuned by combined parameter tuning, and the model with the minimum squared error is taken as the final model.

4. The XGBoost algorithm-based VOD service cache optimization method in the edge network environment according to claim 1, wherein the establishment of the cache optimization model in step S4 specifically comprises:

letting the cache space size of the edge server be $S$, the set of video volumes be $V = \{v_1, v_2, \ldots, v_K\}$, and the set of video access volumes be $PV = \{pv_1, pv_2, \ldots, pv_K\}$, where $K$ is the total number of videos, so that the following cache optimization model is obtained:

$$\max \sum_{k=1}^{K} a_k \frac{pv_k}{v_k}$$

$$\text{s.t.} \quad \sum_{k=1}^{K} a_k v_k \le S$$

wherein $a_k \in \{0, 1\}$ is the caching decision for video $k$.

5. The XGBoost algorithm-based VOD service cache optimization method in the edge network environment according to claim 4, wherein the solving process of the optimization model in step S5 specifically comprises:

letting $c(i, j)$ be the total cost-performance of the best caching choice among the first $i$ videos when the remaining capacity of the edge server is $j$, that is,

$$c(i, j) = \max \sum_{k=1}^{i} a_k \frac{pv_k}{v_k} \quad \text{subject to} \quad \sum_{k=1}^{i} a_k v_k \le j$$

the following recurrence relation is obtained:

$$c(i, j) = \begin{cases} c(i-1, j), & j < v_i \\ \max\left\{ c(i-1, j),\; c(i-1, j - v_i) + \dfrac{pv_i}{v_i} \right\}, & j \ge v_i \end{cases} \tag{3}$$

the second case of formula (3) is explained as follows: when the remaining capacity of the edge server is sufficient to cache the $i$-th video, the $i$-th video is not necessarily part of the optimal cache selection, so two situations may occur; in the first, the $i$-th video is not selected, that is, it is not cached and $a_i = 0$, in which case:

$$c(i, j) = c(i-1, j) \tag{4}$$

in the second, the $i$-th video is selected, that is, it is cached and $a_i = 1$, in which case:

$$c(i, j) = c(i-1, j - v_i) + \frac{pv_i}{v_i} \tag{5}$$

in formula (5), $v_i$ is the volume of the $i$-th video, and $c(i-1, j - v_i)$ is the best total cost-performance obtained by the decisions made before the $i$-th video is processed; adding the cost-performance of the $i$-th video to it gives the total cost-performance after the $i$-th video is cached;

the cost-performance values of the two situations are compared, and the larger one is taken as the total cost-performance when the remaining capacity of the edge server is sufficient to cache the $i$-th video, finally yielding the optimal caching scheme.

Technical Field

The invention belongs to the technical field of edge networks, and particularly relates to a VOD service cache optimization method based on the XGBoost algorithm in an edge network environment.

Background

With the development of science and technology, terminals and devices of various standards, together with a wide variety of services and applications, are connected to the Internet. Service requests in the network have grown explosively, and data traffic has surged with them, with video traffic accounting for most of the growth. The core network plays an important role in both distributing service requests and providing services. One of its main functions is to route requests that enter the network through devices and interfaces of different standards to the appropriate service networks, so that each request receives the service it requires. Its other main function is to act as the service side and process the requests submitted through each interface. The core network itself comprises several different service networks and serves each request as it arrives; as traffic explodes, the volume of service it must provide rises sharply, so the core network bears heavy load in both request processing and service provision.

The edge network is the part of the network closest to the user. On the one hand, it shares the request-processing load of the core network; on the other hand, service provision is moved to the edge, and a request is served on the edge side whenever the edge network is able to handle it. Because the computing capability of the edge network is limited, however, offloading the core network as much as possible depends on improving service efficiency, and edge caching is the key to doing so. Edge caching means that resources with a high usage frequency are cached on an edge server; when a related request arrives again, the resource is served directly from the cache, and requests that the edge server cannot satisfy are served from the core network.

In addition, with the advent of the big data era, efficient knowledge acquisition through machine learning has gradually become one of the main driving forces of technical development in many fields, and edge networking is no exception. As data grows explosively, new kinds of data to be analyzed keep emerging, for example in semantic understanding, image analysis and network data analysis, so machine learning plays an extremely important role in the big data environment.

Existing video caching algorithms usually rely on video popularity. The most widely accepted assumption is that video popularity follows a Zipf distribution, a conclusion drawn from statistics of user behavior. This approach tends to lag considerably and uses a limited set of reference indicators, whereas machine learning is better able to analyze service indicators now that those indicators have diversified. The prediction of video service demand directly affects the caching efficiency of the edge server: when the cache hit rate is high, users obtain data at the edge side and latency falls; otherwise users must fetch the data from the core network and latency rises considerably.

Disclosure of Invention

Purpose of the invention: to overcome the shortcomings of the prior art, an XGBoost algorithm-based VOD service cache optimization method in an edge network environment is provided. The XGBoost algorithm from machine learning is used to perform regression modeling and prediction of VOD service access volume, and a new VOD service cache optimization method is built on that prediction.

Technical scheme: to achieve the above purpose, the invention provides a VOD service cache optimization method based on the XGBoost algorithm in an edge network environment, comprising the following steps:

S1: collecting video data;

S2: taking the average access volume as the prediction target, and performing regression modeling with the XGBoost algorithm to obtain a prediction model;

S3: predicting the average access volume with the prediction model;

S4: establishing a cache optimization model according to the prediction result;

S5: solving the optimization model with a knapsack algorithm to obtain the final caching scheme.

Further, obtaining the prediction model in step S2 specifically comprises: performing regression training with the average access volume as the dependent variable and the remaining features as independent variables, dividing the data set, outputting the importance ranking of each feature, deleting features according to the ranking to obtain the final modeling features, and modeling on those features to form the prediction model.

Further, in the process of modeling on the modeling features to form the prediction model, parameters are tuned by combined parameter tuning, and the model with the minimum squared error is taken as the final model.

Further, the establishment of the cache optimization model in step S4 specifically comprises:

letting the cache space size of the edge server be $S$, the set of video volumes be $V = \{v_1, v_2, \ldots, v_K\}$, and the set of video access volumes be $PV = \{pv_1, pv_2, \ldots, pv_K\}$, where $K$ is the total number of videos, so that the following cache optimization model is obtained:

$$\max \sum_{k=1}^{K} a_k \frac{pv_k}{v_k}$$

$$\text{s.t.} \quad \sum_{k=1}^{K} a_k v_k \le S$$

where $a_k \in \{0, 1\}$ is the caching decision for video $k$: $a_k = 0$ means that video $k$ does not need to be cached, and $a_k = 1$ means that video $k$ needs to be cached. The term

$$a_k \frac{pv_k}{v_k}$$

takes one of two values: when $a_k = 0$ it equals 0 and has no practical meaning; when $a_k = 1$ it is the ratio of the access volume of video $k$ to the volume of video $k$, a value that balances access volume against video volume. The expression

$$\frac{pv_k}{v_k}$$

is therefore defined as the caching cost-performance of video $k$. The constraint

$$\sum_{k=1}^{K} a_k v_k \le S$$

requires that the total volume of the cached videos not exceed the cache space of the edge server.

Further, the solving process of the optimization model in step S5 specifically comprises:

letting $c(i, j)$ be the total cost-performance of the best caching choice among the first $i$ videos when the remaining capacity of the edge server is $j$, that is,

$$c(i, j) = \max \sum_{k=1}^{i} a_k \frac{pv_k}{v_k} \quad \text{subject to} \quad \sum_{k=1}^{i} a_k v_k \le j$$

The following recurrence relation is obtained:

$$c(i, j) = \begin{cases} c(i-1, j), & j < v_i \\ \max\left\{ c(i-1, j),\; c(i-1, j - v_i) + \dfrac{pv_i}{v_i} \right\}, & j \ge v_i \end{cases} \tag{3}$$

The second case of formula (3) is explained as follows: when the remaining capacity of the edge server is sufficient to cache the $i$-th video, the $i$-th video is not necessarily part of the optimal cache selection, so two situations may occur. In the first, the $i$-th video is not selected, that is, it is not cached and $a_i = 0$, in which case:

$$c(i, j) = c(i-1, j) \tag{4}$$

In the second, the $i$-th video is selected, that is, it is cached and $a_i = 1$, in which case:

$$c(i, j) = c(i-1, j - v_i) + \frac{pv_i}{v_i} \tag{5}$$

In formula (5), $v_i$ is the volume of the $i$-th video, and $c(i-1, j - v_i)$ is the best total cost-performance obtained by the decisions made before the $i$-th video is processed; adding the cost-performance of the $i$-th video to it gives the total cost-performance after the $i$-th video is cached.

The cost-performance values of the two situations are compared, and the larger one is taken as the total cost-performance when the remaining capacity of the edge server is sufficient to cache the $i$-th video, finally yielding the optimal caching scheme.

The invention uses the XGBoost algorithm from machine learning to model and predict the access volume of VOD services. On this basis, a cache optimization model is proposed that reduces service delay as far as possible and improves the caching efficiency of the edge server. On the one hand, the XGBoost algorithm used in the scheme has high prediction accuracy and is well suited to distributed settings; on the other hand, the scheme is simple, easy to implement, and has good application prospects.

The invention gives full play to the advantages of machine learning in big data processing and empowers the edge side. The advantages of using a cache are that resources are obtained quickly and that cached content can be replaced at any time, so changes over time in the resource content that services use can be accommodated flexibly.

Beneficial effects: compared with the prior art, the invention takes into account that the edge server must process a large amount of video information and that machine learning has outstanding data analysis capability in big data processing. It first uses the XGBoost algorithm to perform regression modeling and prediction of the weekly average access volume of videos, then proposes a new video cache optimization model on that basis and solves the model with a knapsack algorithm. Machine learning has a great advantage in learning from and analyzing large amounts of data, especially diversified user indicators, so its predictions are highly accurate. The optimization result computed from these predictions is therefore closer to the actual optimum, which greatly improves the cache hit rate of the edge server: when a large number of requests arrive, the probability of obtaining data directly from the edge side rises substantially, and the edge server reduces service access delay as far as possible. The scheme is also simple, easy to implement, and has good application prospects.

Drawings

FIG. 1 is a schematic flow diagram of the method of the present invention;

FIG. 2 is a graph comparing the predicted weekly average access volume of videos with the actual weekly average access volume;

FIG. 3 is a graph comparing the predicted weekly average access volume cost-performance of videos with the actual weekly average access volume cost-performance;

FIG. 4 is a graph of the weekly average access volume prediction accuracy, the weekly average access volume cost-performance prediction accuracy, and the cache hit rate over time.

Detailed Description

The invention is further elucidated with reference to the drawings and the embodiments.

As shown in FIG. 1, the present invention provides a VOD service cache optimization method based on the XGBoost algorithm in an edge network environment, which specifically comprises the following steps:

1) Modeling and prediction of VOD (video on demand) service access volume based on the XGBoost algorithm

1.1) acquisition of sample video data and data preprocessing

Information on 100000 VOD videos is collected at random from a video playing platform, and the relevant fields are extracted: video access volume, online time, movie popularity-list ranking, popularity, number of likes, number of comments, video score, and so on. Decimal places are aligned across the data and the excess digits are rounded off. Because videos went online at different times, the online time is converted into the number of days online. The remaining fields are aligned in time: starting from a fixed point in time and using one week as the interval, the weekly average of each field (video access volume, online time, popularity-list ranking, popularity, number of likes, number of comments, video score, and so on) is computed, and fields that cannot take fractional values (such as online time and popularity-list ranking) are rounded to integers. If a video has been online for less than one week, the missing data is filled with 0.
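The weekly alignment step could be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the patent's implementation: it assumes each indicator is available as a per-day list (the source does not state the sampling granularity), it interprets the zero-filling rule as padding an incomplete week with zeros, and the name `weekly_averages` is hypothetical.

```python
from statistics import mean


def weekly_averages(daily_values, days_per_week=7):
    """Average a per-day series over consecutive weeks.

    An incomplete final week is padded with zeros before averaging,
    which is one reading of the zero-filling rule (an assumption).
    """
    averages = []
    for start in range(0, len(daily_values), days_per_week):
        week = list(daily_values[start:start + days_per_week])
        week += [0] * (days_per_week - len(week))  # zero-fill missing days
        averages.append(mean(week))
    return averages


# One full week of 7 views per day, then two days of 14 views.
example = weekly_averages([7] * 7 + [14, 14])
```

Integer-only fields such as days online could then be passed through `round` after averaging.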

1.2) Modeling and prediction with the XGBoost algorithm

Before modeling with the XGBoost algorithm, null values in the data are handled first, because incomplete video information causes missing fields that would affect model training. The data is preprocessed with the Imputer class in the preprocessing package of sklearn, filling null values with the median. 60% of the data set is used as the training set and 40% as the test set. Online time, movie popularity-list ranking, popularity, number of comments and score are used as independent variables, and the access volume is used as the dependent variable for regression training. The data set is divided using 10-fold cross-validation, the importance ranking of each feature is output, and features whose importance is too low are removed to reduce model complexity, giving the final modeling features.
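The null-value filling and the 60/40 split can be illustrated with standard-library Python. The source uses sklearn's Imputer class; the helpers `impute_median` and `split_train_test` below are hypothetical stand-ins written only to show the idea. Missing values are assumed to be encoded as `None`, and shuffling before the split is an assumption (the source does not say how rows are assigned).

```python
import random
from statistics import median


def impute_median(rows):
    """Replace None entries in each column with that column's median.

    Raises statistics.StatisticsError if a column has no observed value.
    """
    columns = list(zip(*rows))
    medians = [median(v for v in col if v is not None) for col in columns]
    return [[m if v is None else v for v, m in zip(row, medians)]
            for row in rows]


def split_train_test(rows, train_fraction=0.6, seed=0):
    """Shuffle a copy of the rows and split it into train and test parts."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = int(round(len(shuffled) * train_fraction))
    return shuffled[:cut], shuffled[cut:]


# Columns here are illustrative: [days online, number of comments].
filled = impute_median([[1, None], [3, 4], [None, 8]])
train, test = split_train_test(list(range(10)))
```

In practice the imputed training rows would be handed to an XGBoost regressor; that step is omitted here to keep the sketch dependency-free.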

During modeling, parameters are tuned by combined parameter tuning, and the model with the minimum squared error is taken as the final model. This model is then used to predict the weekly average access volume of the videos for the following week.
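One common way to tune parameters in combination is an exhaustive grid search, sketched below. Whether the patent's "combined parameter tuning" is exactly a grid search is an assumption. The function `grid_search` and the callback `squared_error` are hypothetical; in a real setup the callback would train a model with the given parameters and return its validation squared error. `max_depth` and `learning_rate` are used only as familiar XGBoost parameter names, with a toy error surface standing in for training.

```python
from itertools import product


def grid_search(param_grid, squared_error):
    """Evaluate every parameter combination and keep the lowest error.

    param_grid: dict mapping parameter name to a list of candidate values.
    squared_error: callable taking a params dict and returning a float.
    """
    names = list(param_grid)
    best_params, best_error = None, float("inf")
    for values in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        error = squared_error(params)
        if error < best_error:
            best_params, best_error = params, error
    return best_params, best_error


def toy_error(params):
    # Toy stand-in for "train the model and measure squared error".
    return (params["max_depth"] - 4) ** 2 + (params["learning_rate"] - 0.1) ** 2


best_params, best_error = grid_search(
    {"max_depth": [2, 4, 6], "learning_rate": [0.05, 0.1, 0.3]},
    toy_error,
)
```

The cost grows with the product of the candidate-list lengths, so grids are usually kept small or searched in stages.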

2) Establishing a cache optimization model

Assume that the cache space of the edge server is $S$, the set of video volumes is $V = \{v_1, v_2, \ldots, v_K\}$, the set of video access volumes is $PV = \{pv_1, pv_2, \ldots, pv_K\}$, and $K$ is the total number of videos. The following cache optimization model can then be obtained:

$$\max \sum_{k=1}^{K} a_k \frac{pv_k}{v_k}$$

$$\text{s.t.} \quad \sum_{k=1}^{K} a_k v_k \le S$$

where $a_k \in \{0, 1\}$ is the caching decision for video $k$: $a_k = 0$ means that video $k$ does not need to be cached, and $a_k = 1$ means that video $k$ needs to be cached. The term $a_k \, pv_k / v_k$ takes one of two values: when $a_k = 0$ it equals 0 and has no practical meaning; when $a_k = 1$ it is the ratio of the access volume of video $k$ to the volume of video $k$, a value that balances access volume against video volume. Suppose the predicted access volume of video $k$ is high but its volume is also very large: caching it occupies a large share of the edge server's cache space, and if there are many such videos, the number of videos that can be cached on the edge server drops sharply and the caching effect cannot be guaranteed. The expression

$$\frac{pv_k}{v_k}$$

is therefore defined as the caching cost-performance of video $k$, and the optimization objective is to maximize the total caching cost-performance of the cached videos. In addition, the constraint $\sum_{k=1}^{K} a_k v_k \le S$ requires that the total volume of the cached videos not exceed the cache space of the edge server.
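To make the model concrete, the following sketch evaluates the cost-performance $pv_k / v_k$, the objective, and the capacity constraint for a given selection vector. The function names and the three-video example are illustrative assumptions, not data from the source.

```python
def cost_performance(pv, v):
    """Per-video caching cost-performance pv_k / v_k."""
    return [p / size for p, size in zip(pv, v)]


def objective(a, pv, v):
    """Total cost-performance of the videos selected by a (a_k in {0, 1})."""
    return sum(ak * pk / vk for ak, pk, vk in zip(a, pv, v))


def is_feasible(a, v, S):
    """Check that the selected videos fit into cache space S."""
    return sum(ak * vk for ak, vk in zip(a, v)) <= S


# Illustrative numbers: predicted weekly accesses and volumes of three videos.
pv = [900, 400, 300]
v = [3, 1, 2]
S = 4

cp = cost_performance(pv, v)
value = objective([1, 1, 0], pv, v)
```

In this example the first video has the most accesses, but the second has the highest cost-performance because it is small, which is the trade-off the ratio is meant to capture.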

3) Solving optimization models using knapsack algorithm

The problem expressed by the above optimization model is in essence a 0-1 knapsack problem: there are $K$ items (the videos), each with its own value given by $pv_k / v_k$; the edge server is the knapsack, whose capacity is fixed; and the task is to choose the items that maximize the total value packed into the knapsack. The knapsack problem is solved by dynamic programming, whose approach resembles divide and conquer: a large problem is decomposed into smaller subproblems, and the solution to the large problem is obtained from the relation between it and the subproblems. The solution proceeds in three steps: build the model, identify the constraint, and derive the recurrence relation. The recurrence is derived as follows:

When a video is considered for caching in the edge server, there are two possibilities. In the first, the remaining capacity of the edge server is smaller than the volume of the $i$-th video, so the video cannot be cached, and the best total value for the first $i$ videos equals that for the first $i-1$ videos. In the second, the remaining capacity is sufficient for the $i$-th video, but caching it does not necessarily give the optimal result, so a choice must be made between caching it and not caching it.

Following this reasoning, let $c(i, j)$ be the total cost-performance of the best caching choice among the first $i$ videos when the remaining capacity of the edge server is $j$, that is,

$$c(i, j) = \max \sum_{k=1}^{i} a_k \frac{pv_k}{v_k} \quad \text{subject to} \quad \sum_{k=1}^{i} a_k v_k \le j$$

The following recurrence relation can be obtained:

$$c(i, j) = \begin{cases} c(i-1, j), & j < v_i \\ \max\left\{ c(i-1, j),\; c(i-1, j - v_i) + \dfrac{pv_i}{v_i} \right\}, & j \ge v_i \end{cases} \tag{3}$$

The second case of formula (3) is explained as follows: when the remaining capacity of the edge server is sufficient to cache the $i$-th video, the $i$-th video is not necessarily part of the optimal cache selection, so two situations may occur. In the first, the $i$-th video is not selected, that is, it is not cached and $a_i = 0$, in which case:

$$c(i, j) = c(i-1, j) \tag{4}$$

In the second, the $i$-th video is selected, that is, it is cached and $a_i = 1$, in which case:

$$c(i, j) = c(i-1, j - v_i) + \frac{pv_i}{v_i} \tag{5}$$

In formula (5), $v_i$ is the volume of the $i$-th video, and $c(i-1, j - v_i)$ is the best total cost-performance obtained by the decisions made before the $i$-th video is processed; adding the cost-performance of the $i$-th video to it gives the total cost-performance after the $i$-th video is cached.

The cost-performance values of the two situations are compared, and the larger one is taken as the total cost-performance when the remaining capacity of the edge server is sufficient to cache the $i$-th video, finally yielding the optimal cached video set

Figure BDA0002457891770000071
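The recurrence in formulas (3) to (5) can be turned into a short dynamic-programming routine. The sketch below is one possible rendering, not the patent's code: it assumes video volumes and the cache size are non-negative integers in a common unit (for example MB) with every volume at least 1, and the function name `solve_cache_knapsack` is hypothetical. After filling the table it walks backwards to recover the selection vector.

```python
def solve_cache_knapsack(pv, v, S):
    """Solve the 0-1 knapsack model with value pv_k / v_k and weight v_k.

    Returns (best_total_cost_performance, a), where a[k] is 1 if video k
    should be cached and 0 otherwise. Volumes and S must be integers.
    """
    K = len(v)
    # c[i][j]: best total cost-performance using the first i videos
    # when the remaining capacity is j.
    c = [[0.0] * (S + 1) for _ in range(K + 1)]
    for i in range(1, K + 1):
        vi = v[i - 1]
        gain = pv[i - 1] / vi
        for j in range(S + 1):
            c[i][j] = c[i - 1][j]  # formula (4): video i is not cached
            if j >= vi:
                # formula (5): video i is cached; keep the larger value
                c[i][j] = max(c[i][j], c[i - 1][j - vi] + gain)

    # Backtrack: video i was cached exactly when it changed the table value.
    a = [0] * K
    j = S
    for i in range(K, 0, -1):
        if c[i][j] != c[i - 1][j]:
            a[i - 1] = 1
            j -= v[i - 1]
    return c[K][S], a


best_value, selection = solve_cache_knapsack([900, 400, 300], [3, 1, 2], 4)
```

The table has $(K+1)(S+1)$ entries, so the running time is proportional to $K \cdot S$; for large caches a coarser volume unit keeps the table manageable.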

In summary, the scheme of the invention comprises three main parts: first, modeling and predicting the access volume of the VOD service on a weekly basis with the XGBoost algorithm; second, establishing a cache optimization model under the condition that the cache space of the edge server is limited; third, solving the optimization model with a knapsack algorithm according to the prediction result.

This embodiment uses simulation results on existing data to illustrate the optimization effect of the invention. Let the test video set be $c = \{c_1, c_2, \ldots, c_L\}$, let the predicted weekly average access volumes of the videos in the test set be $pv = \{pv_1, pv_2, \ldots, pv_L\}$, and let the actual weekly average access volumes be $pv' = \{pv'_1, pv'_2, \ldots, pv'_L\}$. The weekly average access volume prediction accuracy is defined as:

Figure BDA0002457891770000072

The simulation result in FIG. 2 compares the predicted weekly average access volume of the videos with the actual weekly average access volume. Calculation gives $P_{pv} = 93.8\%$.

Let the predicted weekly average access volume cost-performance of the videos in the test set be $cp = \{cp_1, cp_2, \ldots, cp_L\}$, and let the actual weekly average access volume cost-performance be $cp' = \{cp'_1, cp'_2, \ldots, cp'_L\}$. The weekly average access volume cost-performance prediction accuracy is defined as:

Figure BDA0002457891770000073

FIG. 3 compares the predicted weekly average access volume cost-performance of the videos with the actual weekly average access volume cost-performance. Calculation gives $P_{cp} = 93.3\%$.

Let the set of videos cached in the edge server be $c_A$, and let $c_B$ be the video set obtained by sorting the videos in descending order of actual weekly average access volume cost-performance, where $c_A$ and $c_B$ have equal length. The cache hit rate is defined as:

Figure BDA0002457891770000074

Calculation gives $P_c = 94.9\%$.
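The exact hit-rate formula is not recoverable from the text. As a rough illustration only, the sketch below assumes the hit rate is the fraction of cached videos in $c_A$ that also appear in $c_B$, taking $c_B$ to be the top $|c_A|$ videos by actual cost-performance; both this definition and the function names are assumptions.

```python
def top_by_cost_performance(cp_actual, n):
    """Ids of the n videos with the highest actual cost-performance."""
    ranked = sorted(cp_actual, key=cp_actual.get, reverse=True)
    return set(ranked[:n])


def overlap_hit_rate(cached_ids, cp_actual):
    """Assumed hit rate: share of cached videos that are in the actual top set."""
    c_a = set(cached_ids)
    c_b = top_by_cost_performance(cp_actual, len(c_a))
    return len(c_a & c_b) / len(c_a)


# Illustrative values: actual cost-performance per video id.
actual = {"a": 5.0, "b": 4.0, "c": 3.0, "d": 1.0}
rate = overlap_hit_rate(["a", "c"], actual)
```

Here the cache holds videos `a` and `c` while the actual top two are `a` and `b`, so half of the cached set matches.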

The simulation results show that the weekly average access volume prediction accuracy, the weekly average access volume cost-performance prediction accuracy and the cache hit rate are all high, which indicates that the invention clearly improves the caching efficiency of the edge server. The high cache hit rate also means that requests reaching the edge server obtain the required resources with high probability, which reduces service delay.

FIG. 4 shows how the weekly average access volume prediction accuracy, the weekly average access volume cost-performance prediction accuracy and the cache hit rate change over time. It illustrates that the optimal cache set obtained by the invention does not fluctuate greatly over time, so the update cost of the prediction algorithm is low.
