Video super-resolution processing method, device, and storage medium for mobile devices

Document No.: 1908502    Publication date: 2021-11-30

Note: This technology, "Video super-resolution processing method, device, and storage medium for mobile devices", was designed and created by Yu Shuchang and Zhang Li on 2021-11-04. Its main content is as follows: The invention relates to the technical field of video super-resolution processing, and discloses a mobile-device-oriented video super-resolution processing method, device and storage medium, wherein the method comprises: acquiring a video frame image of each frame, segmenting the video frame image, performing edge detection on the sub-images obtained by segmentation using the canny edge detection algorithm, and ranking the sub-images by priority according to the complexity of the edge detection; transmitting the first K sub-images with the highest priority to a cloud server, which performs super-resolution processing on them based on a deep neural network after receiving them; coding the super-resolution-processed sub-images using the Huffman coding method, compressing the image size, and transmitting them to the mobile device; and the mobile device decoding the received coded sub-images and stitching the super-resolution-processed sub-images according to their original image positions to form the super-resolution-processed video. The invention realizes super-resolution processing of video.

1. A video super-resolution processing method facing a mobile device, the method comprising:

acquiring a video frame image of each frame, segmenting the video frame image to form subgraphs with equal sizes, carrying out edge detection on the segmented subgraphs by using a canny edge detection algorithm, and carrying out priority ranking on the subgraphs according to the complexity of the edge detection, wherein the higher the complexity of the edge detection is, the higher the priority of the subgraphs is;

transmitting the first K sub-graphs with the highest priority to a cloud server, wherein the cloud server, after receiving the sub-graphs, performs super-resolution processing on them based on a deep neural network;

coding the sub-graphs after super-resolution processing by using the Huffman coding method to compress the image size, and transmitting them to the mobile device;

and the mobile device receiving the coded sub-graphs and decoding them, stitching the super-resolution-processed sub-graphs according to their original image positions to form super-resolution-processed video frame images, and stitching the video frame images into a video according to the order of the video frames.

2. The method for processing the video super resolution for the mobile device according to claim 1, wherein the obtaining the video frame image of each frame and dividing the video frame image to form sub-images with equal size comprises:

acquiring a video to be subjected to super-resolution processing, acquiring video frame images from the video frame by frame, and segmenting the video frame images to form sub-images of equal size, wherein the size of each sub-image is M×N pixels.

3. The method as claimed in claim 2, wherein the performing edge detection on the segmented sub-graph by using a canny edge detection algorithm comprises:

carrying out edge detection on the sub-image obtained by segmentation by utilizing a canny edge detection algorithm, wherein the edge detection flow based on the canny edge detection algorithm comprises the following steps:

1) carrying out gray level extraction on the subgraph, wherein the gray level extraction process comprises the following steps:

acquiring an RGB color pixel value of each pixel point in the subgraph;

converting the RGB color pixel value of each pixel point (x, y) into a gray value Gray(x, y) computed from its red, green and blue color components, wherein:

Gray(x, y) is the gray value of the pixel point (x, y);

R(x, y) is the red component of the pixel point (x, y), G(x, y) is its green component, and B(x, y) is its blue component;

constructing an M×N gray matrix Q, and filling the gray value of each pixel point into the gray matrix according to the position of the pixel point;

2) filtering and denoising the gray matrix Q by using a Gaussian filtering method, wherein the filtering and denoising process flow comprises the following steps:

setting a Gaussian kernel of fixed size with a standard deviation of 1;

convolving the neighborhood of each gray value in the gray matrix Q with the Gaussian kernel, and taking the convolution result as the gray value after filtering and noise reduction;

3) for the gray matrix centered on the gray value of an arbitrary pixel u, using the Sobel operators Sx and Sy to compute the gradient matrices Gx(u) and Gy(u) of the pixel;

wherein:

Gx(u) denotes the gradient matrix of the pixel u in the x-axis direction;

Gy(u) denotes the gradient matrix of the pixel u in the y-axis direction;

4) the position of the gray value of the pixel u in the matrix is denoted (a, b), indicating that the gray value of pixel u lies in row a and column b of the matrix Q; taking this gray value as the center, the adjacent gray values are connected and the gray matrix centered on the gray value of pixel u is divided into 8 regions, and the region to which the gradient direction of u belongs is judged from the signs and magnitudes of Gx(u) and Gy(u);

computing the gradient magnitudes of the two gray values adjacent to u along its gradient direction; if the gradient magnitude of u is greater than both of them, the pixel u is an edge pixel and its gray value is retained, otherwise it is set to 0; the comparison quantities are calculated as follows:

wherein:

G(a', b') denotes the gradient matrix of the pixel corresponding to the gray value in row a' and column b' of the matrix Q;

the direction quantity is the gradient matrix Gx(u) multiplied by the inverse of the gradient matrix Gy(u);

5) performing steps 1) -4) on each gray value in the gray matrix of each subgraph, and calculating the number of edge pixels reserved in each subgraph; the subgraph with the larger number of edge pixels has higher edge detection complexity, and the subgraph with the higher edge detection complexity has higher priority.

4. The method for super-resolution processing of the video facing the mobile device according to claim 3, wherein the super-resolution processing of the sub-graph based on the deep neural network after the sub-graph is received by the cloud server comprises:

after receiving the subgraphs, the cloud server performs super-resolution processing on them using the deep neural network, wherein the super-resolution processing flow of a subgraph is as follows:

1) receiving a low-resolution subgraph, and extracting shallow features of the low-resolution subgraph by using a shallow feature extraction module, wherein the shallow feature extraction module consists of two convolution layers; the formula for shallow feature extraction is as follows:

f0 = F_s(x_t^i)

wherein:

x_t^i denotes the low-resolution subgraph, t represents the time of the low-resolution subgraph, and i represents the ith block subgraph segmented from the video frame image;

F_s denotes the shallow feature extraction module;

f0 denotes the extracted shallow features;

2) extracting multi-resolution scale features of the shallow features by using a multi-resolution module:

wherein:

M_j denotes the network module that extracts hierarchical features at resolution 1/2^j, j = 1, 2, 3; each network module M_j comprises a convolution kernel with step size 2, used for down-sampling the input features by 1/2, and a link module D composed of n basic convolution units C_1, ..., C_n; the rear end of the link module is a convolution layer composed of convolution kernels of fixed pixel size; the basic convolution units are connected to extract multi-level features, and the output features of all preceding basic convolution units are accumulated and input to the next basic convolution unit; the output form of the link module is as follows:

wherein:

[·] denotes feature fusion;

H_n denotes the last convolution layer in the link module;

f_in denotes the input features of the link module;

f1, f2 and f3 denote the scale features at resolutions 1/2, 1/4 and 1/8, respectively;

3) using a link module D to perform semantic extraction on the scale features at resolution 1/8 to obtain the semantic features of the subgraph, wherein the semantic extraction formula is as follows:

f_sem = D(f3)

wherein:

f3 denotes the scale feature at resolution 1/8;

D(·) denotes processing the input features with the link module D, and f_sem denotes the semantic features of the subgraph;

4) carrying out feature extraction processing on semantic features and scale features by using a full-link module, wherein the full-link module comprises 5 link modules D, and the feature extraction processing formula is as follows:

wherein:

f0 denotes the extracted shallow features;

D_i denotes the ith link module D in the full-link module;

f_m denotes the extracted multi-scale and semantic features;

5) taking the features f_m output by the full-link module together with the shallow features f0 as the hierarchical features G of the final low-resolution subgraph: G = [f_m, f0].

6) convolving the hierarchical features G into a super-resolution subgraph y_t^i using a sub-pixel convolution layer, wherein y_t^i denotes the super-resolution subgraph of the ith block low-resolution subgraph x_t^i of the video frame image at time t;

7) performing super-resolution refinement on the convolved super-resolution subgraph y_t^i using the global low-rank regularized video super-resolution processing method, wherein the objective function of the global low-rank regularized video super-resolution processing is as follows:

wherein:

X_t^i is the final super-resolution subgraph after super-resolution processing;

y_t^i is the super-resolution subgraph obtained by convolving the hierarchical features;

x_t^i denotes the ith block low-resolution sub-picture in the video frame picture at time t, and x_{t-1}^i and x_{t+1}^i are respectively its previous frame and next frame;

λ1 and λ2 are control coefficients;

the objective function is optimized and solved by using the L-BFGS algorithm:

the objective function is converted into an unconstrained form, and an approximation H_k of the inverse of its second-derivative matrix is obtained by iteration;

wherein:

I is the identity matrix;

the initial approximation H_0 is taken as the identity matrix;

T represents transposition;

g_k is the derivative of the transformed function;

H_k approximates the inverse of the second derivative of the transformed function;

taking H_k g_k as the update direction, the next iterate is calculated, and the final super-resolution subgraph after super-resolution processing is obtained by iterative solution.

Finally, the super-resolution sub-image sequence {X_t^i} of the ith block of the video frame image at different times is obtained.

5. The method for processing video super resolution for mobile devices according to claim 4, wherein said encoding the super resolution processed sub-graph by using huffman coding method comprises:

1) obtaining the binary intensity value of each pixel of the super-resolution-processed sub-image X_t^i in the RGB color channels through matlab scanning, and taking the binary intensity values as the Huffman coding source, wherein X_t^i denotes the super-resolution sub-image of the ith block low-resolution sub-image of the video frame image at different times;

2) scanning and counting the input Huffman coding source, and determining the occurrence frequency and probability of each symbol so as to determine the weight of each source symbol;

3) assigning the code element 0 and the code element 1 respectively to the two source symbols with the lowest occurrence probability, adding the probabilities of these two source symbols to form a new symbol probability, and re-entering it into the ordering together with the remaining source symbols, wherein the larger the weight of a source symbol, the higher it is placed in the ordering;

4) repeating the operation of the step 3) on the new sequencing result;

5) repeating the above process until all the source symbols are distributed to obtain corresponding code elements;

6) recording, step by step from the end of the ordering back to the front, the code elements assigned to each source symbol, finally obtaining the codeword of each source symbol;

7) concatenating the codewords of the source symbols, the concatenated result being the Huffman coding result of the super-resolution-processed sub-graph.

6. The method for super-resolution processing of video for mobile devices according to claim 5, wherein said stitching the super-resolution processed sub-images according to the original image positions to form super-resolution processed video frame images, and stitching the video frame images into video according to the video frame sequence comprises:

after receiving the coded sub-graphs, the mobile device decodes them, the decoding operation being the reverse of the coding operation, obtains the binary RGB color intensity value of each pixel in the super-resolution-processed sub-graph, and recombines the pixels into the super-resolution-processed sub-graph using matlab according to the color intensity value of each pixel;

and splicing the sub-images after the super-resolution processing according to the positions of the original images to form video frame images after the super-resolution processing, and splicing the video frame images into a video according to the sequence of the video frames.

7. A video super-resolution processing apparatus, characterized in that the apparatus comprises:

the video frame image acquisition device is used for acquiring a video frame image of each frame and dividing the video frame image to form sub-images with equal size;

the image processor is used for carrying out edge detection on the sub-images obtained by segmentation by using a canny edge detection algorithm and carrying out priority ranking on the sub-images according to the complexity of the edge detection;

the video super-resolution processing device is used for transmitting the first K sub-graphs with the highest priority to the cloud server, wherein the cloud server performs super-resolution processing on the sub-graphs based on the deep neural network after receiving them; the super-resolution-processed sub-graphs are coded using the Huffman coding method to compress the image size and transmitted to the mobile device; after receiving the coded sub-graphs, the mobile device decodes them, stitches the super-resolution-processed sub-graphs according to their original image positions to form super-resolution-processed video frame images, and stitches the video frame images into a video according to the order of the video frames.

8. A computer-readable storage medium having stored thereon video super-resolution processing program instructions executable by one or more processors to implement the mobile device-oriented video super-resolution processing method of any one of claims 1-6.

Technical Field

The present invention relates to the technical field of video super-resolution processing, and in particular, to a method, an apparatus, and a storage medium for video super-resolution processing for a mobile device.

Background

Existing video super-resolution processing depends heavily on computing power and places high demands on computing hardware, which ordinary mobile devices cannot meet. If the super-resolution of the video is offloaded to cloud processing, the super-resolution problem itself can be solved, but the video becomes large after super-resolution, which means it occupies a large bandwidth and degrades the video playing experience.

In view of this, how to implement super-resolution processing of videos for mobile devices becomes a problem to be solved by those skilled in the art.

Disclosure of Invention

The invention provides a video super-resolution processing method, a video super-resolution processing device and a storage medium for a mobile device, which aim to realize the super-resolution processing of videos for the mobile device.

In order to achieve the purpose, the invention adopts the following technical scheme:

in one aspect, a method for processing video super-resolution for a mobile device is provided, which includes:

acquiring a video frame image of each frame, segmenting the video frame image to form subgraphs with equal sizes, carrying out edge detection on the segmented subgraphs by using a canny edge detection algorithm, and carrying out priority ranking on the subgraphs according to the complexity of the edge detection, wherein the higher the complexity of the edge detection is, the higher the priority of the subgraphs is;

the first K sub-graphs with the highest priority are transmitted to a cloud server, and the cloud server receives the sub-graphs and then carries out super-resolution processing on the sub-graphs based on a deep neural network;

coding the sub-graph after super-resolution processing by using a huffman coding method, compressing the size of the image and transmitting the image to mobile equipment;

and the mobile equipment receives the coded subgraph, decodes the coded subgraph, splices the subgraph subjected to super-resolution processing according to the position of the original image to form a video frame image subjected to super-resolution processing, and splices the video frame image into a video according to the sequence of the video frames.

Optionally, the acquiring a video frame image of each frame and segmenting the video frame image to form equal-sized subgraphs includes:

acquiring a video to be subjected to super-resolution processing, acquiring video frame images from the video frame by frame, and segmenting the video frame images to form sub-images of equal size, wherein the size of each sub-image is M×N pixels; in one embodiment of the present invention, M has a value of 112 and N has a value of 56.

Optionally, the performing edge detection on the segmented sub-graph by using a canny edge detection algorithm includes:

carrying out edge detection on the sub-image obtained by segmentation by utilizing a canny edge detection algorithm, wherein the edge detection flow based on the canny edge detection algorithm comprises the following steps:

1) carrying out gray level extraction on the subgraph, wherein the gray level extraction process comprises the following steps:

acquiring an RGB color pixel value of each pixel point in the subgraph;

converting the RGB color pixel value of each pixel point (x, y) into a gray value Gray(x, y) computed from its red, green and blue color components, wherein:

Gray(x, y) is the gray value of the pixel point (x, y);

R(x, y) is the red component of the pixel point (x, y), G(x, y) is its green component, and B(x, y) is its blue component;

constructing an M×N gray matrix Q, and filling the gray value of each pixel point into the gray matrix according to the position of the pixel point;

2) filtering and denoising the gray matrix Q by using a Gaussian filtering method, wherein the filtering and denoising process flow comprises the following steps:

setting a Gaussian kernel of fixed size with a standard deviation of 1;

convolving the neighborhood of each gray value in the gray matrix Q with the Gaussian kernel, and taking the convolution result as the gray value after filtering and noise reduction; in one embodiment of the invention, the neighborhood gray matrix of a pixel i is weighted element-wise by the Gaussian kernel and summed, giving the filtering and noise-reduction result of the gray value i.

3) for the gray matrix centered on the gray value of an arbitrary pixel u, using the Sobel operators Sx and Sy to compute the gradient matrices Gx(u) and Gy(u) of the pixel;

wherein:

Gx(u) denotes the gradient matrix of the pixel u in the x-axis direction;

Gy(u) denotes the gradient matrix of the pixel u in the y-axis direction;

4) the position of the gray value of the pixel u in the matrix is denoted (a, b), indicating that the gray value of pixel u lies in row a and column b of the matrix Q; taking this gray value as the center, the adjacent gray values are connected and the gray matrix centered on the gray value of pixel u is divided into 8 regions, and the region to which the gradient direction of u belongs is judged from the signs and magnitudes of Gx(u) and Gy(u); in one embodiment of the invention, if Gx(u) and Gy(u) are both positive, the gradient direction lies in the 0-90 degree region, and if in addition |Gy(u)|/|Gx(u)| is less than 1, it lies in the 0-45 degree region;

computing the gradient magnitudes of the two gray values adjacent to u along its gradient direction; if the gradient magnitude of u is greater than both of them, the pixel u is an edge pixel and its gray value is retained, otherwise it is set to 0; the comparison quantities are calculated as follows:

wherein:

G(a', b') denotes the gradient matrix of the pixel corresponding to the gray value in row a' and column b' of the matrix Q;

the direction quantity is the gradient matrix Gx(u) multiplied by the inverse of the gradient matrix Gy(u);

5) performing steps 1) -4) on each gray value in the gray matrix of each subgraph, and calculating the number of edge pixels reserved in each subgraph; the subgraph with the larger number of edge pixels has higher edge detection complexity, and the subgraph with the higher edge detection complexity has higher priority.
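As an illustration of steps 3)-4) above, the sketch below computes the Sobel gradients and the 8-region (45-degree) direction sector from their signs and magnitudes; the kernels shown are the standard Sobel operators, assumed here because the patent text does not reproduce its exact kernels, and all names are illustrative.

```python
import numpy as np
from scipy.ndimage import convolve

# Standard 3x3 Sobel operators Sx and Sy (assumed form).
SX = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)
SY = np.array([[-1, -2, -1], [0, 0, 0], [1, 2, 1]], dtype=float)

def gradient_and_sector(gray):
    """Per-pixel gradients Gx, Gy and the 8-region direction sector.

    The signs of Gx and Gy fix the quadrant (e.g. both positive -> 0-90
    degrees), and the magnitude ratio splits each quadrant into 45-degree
    sectors, matching the 8-region division described above.
    """
    gx = convolve(gray.astype(float), SX)
    gy = convolve(gray.astype(float), SY)
    angle = np.degrees(np.arctan2(gy, gx)) % 360
    sector = (angle // 45).astype(int)  # 0: 0-45 deg, 1: 45-90 deg, ..., 7: 315-360 deg
    return gx, gy, sector
```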

Optionally, after receiving the subgraph, the cloud server performs super-resolution processing on the subgraph based on the deep neural network, including:

after receiving the subgraphs, the cloud server performs super-resolution processing on them using the deep neural network, wherein the super-resolution processing flow of a subgraph is as follows:

1) receiving a low-resolution subgraph, and extracting shallow features of the low-resolution subgraph by using a shallow feature extraction module, wherein the shallow feature extraction module is composed of two convolution layers; in one specific embodiment of the invention, each convolution layer is composed of convolution kernels of a fixed pixel size, and the step size of the convolution layers is 1; the formula for shallow feature extraction is as follows:

f0 = F_s(x_t^i)

wherein:

x_t^i denotes the low-resolution subgraph, t represents the time of the low-resolution subgraph, and i represents the ith block subgraph segmented from the video frame image;

F_s denotes the shallow feature extraction module;

f0 denotes the extracted shallow features;

2) extracting multi-resolution scale features of the shallow features by using a multi-resolution module:

wherein:

M_j denotes the network module that extracts hierarchical features at resolution 1/2^j, j = 1, 2, 3; each network module M_j comprises a convolution kernel with step size 2, used for down-sampling the input features by 1/2, and a link module D composed of n basic convolution units C_1, ..., C_n; the rear end of the link module is a convolution layer composed of convolution kernels of fixed pixel size; the basic convolution units are connected to extract multi-level features, and the output features of all preceding basic convolution units are accumulated and input to the next basic convolution unit; the output form of the link module is as follows:

wherein:

H_n denotes the last convolution layer in the link module;

[·] denotes feature fusion;

f_in denotes the input features of the link module;

f1, f2 and f3 denote the scale features at resolutions 1/2, 1/4 and 1/8, respectively;

3) performing semantic extraction on the scale feature with the resolution of 1/8 by using a link module D to obtain the semantic feature of the subgraph, wherein the semantic extraction formula is as follows:

f_sem = D(f3)

wherein:

f3 denotes the scale feature at resolution 1/8;

D(·) denotes processing the input features with the link module D, and f_sem denotes the semantic features of the subgraph;

4) carrying out feature extraction processing on semantic features and scale features by using a full-link module, wherein the full-link module comprises 5 link modules D, and the feature extraction processing formula is as follows:

wherein:

f0 denotes the extracted shallow features;

D_i denotes the ith link module D in the full-link module;

f_m denotes the extracted multi-scale and semantic features;

5) taking the features f_m output by the full-link module together with the shallow features f0 as the hierarchical features G of the final low-resolution subgraph: G = [f_m, f0].

6) convolving the hierarchical features G into a super-resolution subgraph y_t^i using a sub-pixel convolution layer, wherein y_t^i denotes the super-resolution subgraph of the ith block low-resolution subgraph x_t^i of the video frame image at time t; the sub-pixel convolution layer divides each pixel feature in the hierarchical features into r×r smaller pixel features so as to process the low-resolution sub-image into r-times super-resolution features; in one embodiment of the invention, if r equals 3, each pixel feature is expanded into a 3×3 pixel matrix whose center is the pixel feature value and whose remaining entries are filled with 0;

7) performing super-resolution refinement on the convolved super-resolution subgraph y_t^i using the global low-rank regularized video super-resolution processing method, wherein the objective function of the global low-rank regularized video super-resolution processing is as follows:

wherein:

X_t^i is the final super-resolution subgraph after super-resolution processing;

y_t^i is the super-resolution subgraph obtained by convolving the hierarchical features;

x_t^i denotes the ith block low-resolution sub-picture in the video frame picture at time t, and x_{t-1}^i and x_{t+1}^i are respectively its previous frame and next frame;

λ1 and λ2 are control coefficients;

the objective function is optimized and solved by using the L-BFGS algorithm:

the objective function is converted into an unconstrained form, and an approximation H_k of the inverse of its second-derivative matrix is obtained by iteration;

wherein:

I is the identity matrix;

the initial approximation H_0 is taken as the identity matrix;

T represents transposition;

g_k is the derivative of the transformed function;

H_k approximates the inverse of the second derivative of the transformed function;

taking H_k g_k as the update direction, the next iterate is calculated, and the final super-resolution subgraph after super-resolution processing is obtained by iterative solution.

Finally, the super-resolution sub-image sequence {X_t^i} of the ith block of the video frame image at different times is obtained.

Optionally, the encoding the sub-graph after the super-resolution processing by using the huffman coding method includes:

1) obtaining the binary intensity value of each pixel of the super-resolution-processed sub-image X_t^i in the RGB color channels through matlab scanning, and taking the binary intensity values as the Huffman coding source, wherein X_t^i denotes the super-resolution sub-image of the ith block low-resolution sub-image of the video frame image at different times;

2) scanning and counting the input Huffman coding source, and determining the occurrence frequency and probability of each symbol so as to determine the weight of each source symbol; in a specific embodiment of the invention, the 12-bit binary sequence 001101101001, scanned and read in groups of 2 bits, gives 00, 11, 01, 10, 10 and 01; the resulting source symbols are 00, 11, 01 and 10, with probabilities of 0.17, 0.17, 0.33 and 0.33 respectively; if it is scanned and read in groups of 3 bits, the results are 001, 101, 101 and 001, giving the two source symbols 001 and 101, each with probability 0.5;

3) assigning the code element 0 and the code element 1 respectively to the two source symbols with the lowest occurrence probability, adding the probabilities of these two source symbols to form a new symbol probability, and re-entering it into the ordering together with the remaining source symbols, wherein the larger the weight of a source symbol, the higher it is placed in the ordering;

4) repeating the operation of the step 3) on the new sequencing result;

5) repeating the above process until all the source symbols are distributed to obtain corresponding code elements;

6) recording, step by step from the end of the ordering back to the front, the code elements assigned to each source symbol, finally obtaining the codeword of each source symbol;

7) concatenating the codewords of the source symbols, the concatenated result being the Huffman coding result of the super-resolution-processed sub-graph.

Optionally, the splicing the sub-images after the super resolution processing according to the original image position to form a video frame image after the super resolution processing, and splicing the video frame image into a video according to the video frame sequence includes:

after receiving the coded sub-graphs, the mobile device decodes them, the decoding operation being the reverse of the coding operation, obtains the binary RGB color intensity value of each pixel in the super-resolution-processed sub-graph, and recombines the pixels into the super-resolution-processed sub-graph using matlab according to the color intensity value of each pixel;

and splicing the sub-images after the super-resolution processing according to the positions of the original images to form video frame images after the super-resolution processing, and splicing the video frame images into a video according to the sequence of the video frames.

Further, to achieve the above object, the present invention also provides a video super-resolution processing apparatus comprising:

the video frame image acquisition device is used for acquiring a video frame image of each frame and dividing the video frame image to form sub-images with equal size;

the image processor is used for carrying out edge detection on the sub-images obtained by segmentation by using a canny edge detection algorithm and carrying out priority ranking on the sub-images according to the complexity of the edge detection;

the video super-resolution processing device is used for transmitting the first K sub-graphs with the highest priority to the cloud server, wherein the cloud server performs super-resolution processing on the sub-graphs based on the deep neural network after receiving them; the super-resolution-processed sub-graphs are coded using the Huffman coding method to compress the image size and transmitted to the mobile device; after receiving the coded sub-graphs, the mobile device decodes them, stitches the super-resolution-processed sub-graphs according to their original image positions to form super-resolution-processed video frame images, and stitches the video frame images into a video according to the order of the video frames.

Further, to achieve the above object, the present invention also provides a computer-readable storage medium having stored thereon video super-resolution processing program instructions executable by one or more processors to implement the mobile device-oriented video super-resolution processing method as described above.

Compared with the prior art, the invention provides a video super-resolution processing method for mobile devices, which has the following advantages:

Firstly, the scheme performs the super-resolution processing of the video at the cloud server, which reduces the need for high-performance video-processing hardware, and sends the super-resolution-processed video images to the mobile terminal; this realizes super-resolution at the mobile terminal, reduces the cost of video super-resolution processing, and improves the user experience of video playing. Meanwhile, the scheme provides a super-resolution processing scheme based on a deep neural network: a low-resolution subgraph is received, and its shallow features are extracted by a shallow feature extraction module composed of two convolution layers, each convolution layer being composed of convolution kernels of a fixed pixel size with a step size of 1; the formula for shallow feature extraction is as follows:

f0 = F_s(x_t^i), wherein: x_t^i denotes the low-resolution subgraph, t represents the time of the low-resolution subgraph, and i represents the ith block subgraph segmented from the video frame image; F_s denotes the shallow feature extraction module; and f0 denotes the extracted shallow features. The multi-resolution scale features of the shallow features are then extracted with a multi-resolution module:

wherein: M_j denotes the network module that extracts hierarchical features at resolution 1/2^j, j = 1, 2, 3; each network module M_j comprises a convolution kernel with step size 2, used for down-sampling the input features by 1/2, and a link module D composed of n basic convolution units C_1, ..., C_n; the rear end of the link module is a convolution layer composed of convolution kernels of fixed pixel size; the basic convolution units are connected to extract multi-level features, and the output features of all preceding basic convolution units are accumulated and input to the next basic convolution unit; the output form of the link module is as follows:

wherein: H_n denotes the last convolution layer in the link module; f_in denotes the input features of the link module; and f1, f2 and f3 denote the scale features at resolutions 1/2, 1/4 and 1/8, respectively. Compared with the traditional scheme, this scheme extracts multi-resolution features, and features at different resolutions carry different receptive fields, so richer context features are extracted and a more accurate super-resolution image is reconstructed. Semantic extraction is then performed on the scale feature at resolution 1/8 by a link module D to obtain the semantic features of the sub-image; the semantic features obtain, without changing the image resolution, a semantic-information feature representation of the scale features through convolution, and the semantic extraction formula is as follows:

f_sem = D(f3), wherein: f3 denotes the scale feature at resolution 1/8, and D(·) denotes processing the input features with the link module D;

The full-link module is used to perform feature extraction processing on the semantic features and scale features: although the above features contain large receptive fields, their resolution is too low, so this scheme provides a full-link module comprising 5 link modules D, which fuses the low-resolution features and the resolution-preserving convolution features in parallel to form multi-level features; the feature extraction processing formula is:

wherein: f0 denotes the extracted shallow features; D_i denotes the ith link module D in the full-link module; and f_m denotes the extracted multi-scale and semantic features. The features f_m output by the full-link module, together with the shallow features, are taken as the hierarchical features G of the final low-resolution sub-image; the hierarchical features contain the multi-resolution deep features of the low-resolution image, the shallow detail features, and the image semantic features, realizing better super-resolution image processing:

The hierarchical features are convolved into the super-resolution subgraph y_t^i using a sub-pixel convolution layer, wherein y_t^i denotes the super-resolution subgraph of the ith block low-resolution subgraph x_t^i of the video frame image at time t; the sub-pixel convolution layer divides each pixel feature in the hierarchical features into r×r smaller pixel features so as to process the low-resolution sub-image into r-times super-resolution features; if r equals 3, each pixel feature is expanded into a 3×3 pixel matrix whose center is the pixel feature value and whose remaining entries are filled with 0, so the scheme can realize video super-resolution processing at different magnifications according to the resolution of the video.
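The r-times sub-pixel expansion can be seen concretely with PyTorch's PixelShuffle, which implements the standard sub-pixel convolution rearrangement; this is shown only to illustrate the shape bookkeeping, and the zero-filled centering is reproduced by hand since PixelShuffle itself has no such option.

```python
import torch
import torch.nn as nn

# Each input pixel carries r*r channel values; PixelShuffle rearranges an
# (N, C*r*r, H, W) tensor into (N, C, r*H, r*W), i.e. every pixel becomes
# an r x r spatial block.
r = 3
shuffle = nn.PixelShuffle(r)
features = torch.randn(1, r * r, 4, 4)  # one output channel, 4x4 feature map
out = shuffle(features)
print(out.shape)  # torch.Size([1, 1, 12, 12])

# The zero-filled variant described above corresponds to a channel vector
# whose center-position channel holds the pixel feature value and whose
# other eight channels are zero:
centered = torch.zeros(1, r * r, 4, 4)
centered[:, (r * r) // 2] = torch.randn(1, 4, 4)
sparse_up = shuffle(centered)  # each 3x3 output block has its value at the center
```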

In order to comprehensively consider the similarity of video frames along the image time dimension, the scheme decouples the image space dimension and the time dimension and introduces a norm-based global low-rank regularization; the convolved super-resolution subgraph y_t^i is refined with the global low-rank regularized video super-resolution processing method, wherein the objective function of the global low-rank regularized video super-resolution processing is as follows:

wherein: X_t^i is the final super-resolution subgraph after super-resolution processing; y_t^i is the super-resolution subgraph obtained by convolving the hierarchical features; x_t^i denotes the ith block low-resolution sub-picture in the video frame picture at time t, and x_{t-1}^i and x_{t+1}^i are respectively its previous frame and next frame; and λ1 and λ2 are control coefficients. The objective function is optimized and solved with the L-BFGS algorithm to obtain a super-resolution sub-graph based on the decoupling of the image space dimension and time dimension.
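To make the optimization step concrete, the sketch below minimizes a simplified stand-in objective (data fidelity plus temporal-consistency terms, without the low-rank regularizer) with SciPy's L-BFGS implementation; the coefficient values, array shapes and the quadratic form of the stand-in objective are assumptions.

```python
import numpy as np
from scipy.optimize import minimize

n = 56 * 112                      # flattened subgraph size (M=112, N=56 embodiment)
y = np.random.rand(n)             # network-produced super-resolved subgraph (stand-in)
x_prev = np.random.rand(n)        # temporally adjacent subgraphs (stand-ins)
x_next = np.random.rand(n)
lam1, lam2 = 0.1, 0.1             # control coefficients (values assumed)

def objective(X):
    # Keep the refined subgraph close to the network output and its neighbors;
    # the patent's full objective also carries a global low-rank regularization term.
    return (np.sum((X - y) ** 2)
            + lam1 * np.sum((X - x_prev) ** 2)
            + lam2 * np.sum((X - x_next) ** 2))

def gradient(X):
    return 2 * (X - y) + 2 * lam1 * (X - x_prev) + 2 * lam2 * (X - x_next)

res = minimize(objective, y.copy(), jac=gradient, method='L-BFGS-B')
X_final = res.x.reshape(56, 112)  # final refined subgraph
```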

Meanwhile, the super-resolution-processed sub-graphs are coded with the Huffman coding method, whose flow is as follows: the binary intensity value of each pixel of the super-resolution-processed sub-image in the RGB color channels is obtained by matlab scanning and taken as the Huffman coding source; the input Huffman coding source is scanned and counted, and the occurrence frequency and probability of each symbol are determined so as to determine the weight of each source symbol; the code elements 0 and 1 are assigned respectively to the two source symbols with the lowest occurrence probability, their probabilities are added to form a new symbol probability, which re-enters the ordering together with the remaining source symbols, with larger-weight symbols placed higher in the ordering; this process is repeated until all source symbols have been assigned corresponding code elements; the code elements assigned to each source symbol are recorded step by step from the end of the ordering back to the front, finally giving the codeword of each source symbol; and the codewords of the source symbols are concatenated, the result being the Huffman coding result of the super-resolution-processed sub-graph. Finally, the Huffman coding result of the super-resolution-processed sub-graph is transmitted to the mobile device. Compared with the traditional scheme, this scheme uses Huffman coding to compress the video data, improves the transmission efficiency of the video from the cloud to the mobile device, displays the super-resolution-processed sub-graphs at the mobile terminal, and reduces the requirements on the user's mobile device and network environment.

Drawings

Fig. 1 is a flowchart illustrating a method for processing video super-resolution for a mobile device according to an embodiment of the present invention;

fig. 2 is a schematic structural diagram of a video super-resolution processing apparatus according to an embodiment of the present invention;

FIG. 3 is an original video frame image of a mobile device according to an embodiment of the present invention;

fig. 4 is a super-resolved video frame image of a mobile device according to an embodiment of the present invention;

the implementation, functional features and advantages of the objects of the present invention will be further explained with reference to the accompanying drawings.

Detailed Description

It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.

According to the method, a video frame image of each frame is obtained and is segmented to form sub-images with equal sizes, edge detection is carried out on the segmented sub-images by using a canny edge detection algorithm, priority ranking is carried out on the sub-images according to the complexity of the edge detection, the higher the complexity of the edge detection is, the higher the priority of the sub-images is, the first K sub-images with the highest priority are transmitted to a cloud server, and the super-resolution processing is carried out on the sub-images based on a deep neural network after the sub-images are received by the cloud server; and coding the sub-images after super-resolution processing by using a huffman coding method, compressing the size of the image and transmitting the image to the mobile equipment, decoding the image after the mobile equipment receives the coded sub-images, splicing the sub-images after super-resolution processing according to the position of an original image to form video frame images after super-resolution processing, and performing the processing on each frame of video image to realize the video super-resolution processing of the mobile equipment. Referring to fig. 1, a schematic diagram of a mobile device-oriented video super-resolution processing method according to an embodiment of the present invention is provided.

In this embodiment, the video super-resolution processing method for a mobile device includes:

s1, obtaining a video frame image of each frame, segmenting the video frame image to form subgraphs with equal size, carrying out edge detection on the segmented subgraphs by using a canny edge detection algorithm, carrying out priority ranking on the subgraphs according to the complexity of the edge detection, wherein the higher the complexity of the edge detection is, the higher the subgraph priority is.

Firstly, the invention acquires the video to be super-resolution processed, acquires video frame images from the video frame by frame, and segments the video frame images to form sub-images of equal size, wherein the size of each sub-image is M×N pixels; in one embodiment of the present invention, M has a value of 112 and N has a value of 56.
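A minimal sketch of this acquisition-and-tiling step, assuming OpenCV; the tile size follows the M=112, N=56 embodiment (taken here as width by height), and the function names are illustrative.

```python
import cv2

def frames_of(video_path):
    """Yield the frames of a video one by one, in order."""
    cap = cv2.VideoCapture(video_path)
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        yield frame
    cap.release()

def split_frame(frame, tile_w=112, tile_h=56):
    """Split a frame into equal-sized sub-images, keeping each tile's origin
    so the super-resolved tiles can later be stitched back into place."""
    h, w = frame.shape[:2]
    tiles = []
    for y in range(0, h - h % tile_h, tile_h):
        for x in range(0, w - w % tile_w, tile_w):
            tiles.append((y, x, frame[y:y + tile_h, x:x + tile_w]))
    return tiles
```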

further, the invention utilizes a canny edge detection algorithm to carry out edge detection on the sub-images obtained by segmentation, and the edge detection flow based on the canny edge detection algorithm is as follows:

1) carrying out gray level extraction on the subgraph, wherein the gray level extraction process comprises the following steps:

acquiring an RGB color pixel value of each pixel point in the subgraph;

converting the RGB color pixel value of each pixel point (x, y) into a gray value Gray(x, y) computed from its red, green and blue color components, wherein:

Gray(x, y) is the gray value of the pixel point (x, y);

R(x, y) is the red component of the pixel point (x, y), G(x, y) is its green component, and B(x, y) is its blue component;

constructing an M×N gray matrix Q, and filling the gray value of each pixel point into the gray matrix according to the position of the pixel point;

2) filtering and denoising the gray matrix Q by using a Gaussian filtering method, wherein the filtering and denoising process flow comprises the following steps:

setting a Gaussian kernel of fixed size with a standard deviation of 1;

convolving the neighborhood of each gray value in the gray matrix Q with the Gaussian kernel, and taking the convolution result as the gray value after filtering and noise reduction; in one embodiment of the invention, the neighborhood gray matrix of a pixel i is weighted element-wise by the Gaussian kernel and summed, giving the filtering and noise-reduction result of the gray value i.

3) for the gray matrix centered on the gray value of an arbitrary pixel u, using the Sobel operators Sx and Sy to compute the gradient matrices Gx(u) and Gy(u) of the pixel;

wherein:

Gx(u) denotes the gradient matrix of the pixel u in the x-axis direction;

Gy(u) denotes the gradient matrix of the pixel u in the y-axis direction;

4) the position of the gray value of the pixel u in the matrix is denoted (a, b), indicating that the gray value of pixel u lies in row a and column b of the matrix Q; taking this gray value as the center, the adjacent gray values are connected and the gray matrix centered on the gray value of pixel u is divided into 8 regions, and the region to which the gradient direction of u belongs is judged from the signs and magnitudes of Gx(u) and Gy(u); in one embodiment of the invention, if Gx(u) and Gy(u) are both positive, the gradient direction lies in the 0-90 degree region, and if in addition |Gy(u)|/|Gx(u)| is less than 1, it lies in the 0-45 degree region;

computing the gradient magnitudes of the two gray values adjacent to u along its gradient direction; if the gradient magnitude of u is greater than both of them, the pixel u is an edge pixel and its gray value is retained, otherwise it is set to 0; the comparison quantities are calculated as follows:

wherein:

G(a', b') denotes the gradient matrix of the pixel corresponding to the gray value in row a' and column b' of the matrix Q;

the direction quantity is the gradient matrix Gx(u) multiplied by the inverse of the gradient matrix Gy(u);

5) performing steps 1) -4) on each gray value in the gray matrix of each subgraph, and calculating the number of edge pixels reserved in each subgraph; the subgraph with the larger number of edge pixels has higher edge detection complexity, and the subgraph with the higher edge detection complexity has higher priority.
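Putting steps 1)-5) together, the sketch below ranks tiles by edge complexity; OpenCV's built-in Canny is used as a stand-in for the custom pipeline above (it bundles the grayscale gradients and non-maximum suppression), and the smoothing kernel and thresholds are illustrative.

```python
import cv2
import numpy as np

def edge_complexity(tile):
    """Count retained edge pixels as the tile's edge-detection complexity."""
    gray = cv2.cvtColor(tile, cv2.COLOR_BGR2GRAY)  # step 1: gray extraction
    blurred = cv2.GaussianBlur(gray, (3, 3), 1.0)  # step 2: Gaussian denoise, sigma = 1
    edges = cv2.Canny(blurred, 50, 150)            # steps 3)-4); thresholds assumed
    return int(np.count_nonzero(edges))            # step 5: edge-pixel count

def top_k_tiles(tiles, k):
    """Rank (y, x, tile) triples by complexity; the top K go to the cloud server."""
    return sorted(tiles, key=lambda t: edge_complexity(t[2]), reverse=True)[:k]
```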

And S2, transmitting the first K sub-graphs with the highest priority to a cloud server, and after receiving the sub-graphs, the cloud server performs super-resolution processing on the sub-graphs based on the deep neural network.

Further, the invention transmits the first K sub-graphs with the highest priority to a cloud server; after receiving the sub-graphs, the cloud server performs super-resolution processing on them using the deep neural network, and the super-resolution processing flow of a subgraph is as follows:

1) receiving a low-resolution subgraph, and extracting shallow features of the low-resolution subgraph by using a shallow feature extraction module, wherein the shallow feature extraction module is composed of two convolution layers; in one embodiment of the invention, each convolution layer is composed of convolution kernels of a fixed pixel size, and the step size of the convolution layers is 1; the formula for shallow feature extraction is as follows:

f0 = F_s(x_t^i)

wherein:

x_t^i denotes the low-resolution subgraph, t represents the time of the low-resolution subgraph, and i represents the ith block subgraph segmented from the video frame image;

F_s denotes the shallow feature extraction module;

f0 denotes the extracted shallow features;

2) extracting multi-resolution scale features of the shallow features by using a multi-resolution module:

wherein:

M_j denotes the network module that extracts hierarchical features at resolution 1/2^j, j = 1, 2, 3; each network module M_j comprises a convolution kernel with step size 2, used for down-sampling the input features by 1/2, and a link module D composed of n basic convolution units C_1, ..., C_n; the rear end of the link module is a convolution layer composed of convolution kernels of fixed pixel size; the basic convolution units are connected to extract multi-level features, and the output features of all preceding basic convolution units are accumulated and input to the next basic convolution unit; the output form of the link module is as follows:

wherein:

[·] denotes feature fusion;

H_n denotes the last convolution layer in the link module;

f_in denotes the input features of the link module;

f1, f2 and f3 denote the scale features at resolutions 1/2, 1/4 and 1/8, respectively;

3) performing semantic extraction on the scale feature with the resolution of 1/8 by using a link module D to obtain the semantic feature of the subgraph, wherein the semantic extraction formula is as follows:

f_sem = D(f3)

wherein:

f3 denotes the scale feature at resolution 1/8;

D(·) denotes processing the input features with the link module D, and f_sem denotes the semantic features of the subgraph;

4) carrying out feature extraction processing on semantic features and scale features by using a full-link module, wherein the full-link module comprises 5 link modules D, and the feature extraction processing formula is as follows:

wherein:

f0 denotes the extracted shallow features;

D_i denotes the ith link module D in the full-link module;

f_m denotes the extracted multi-scale and semantic features;

5) taking the features f_m output by the full-link module together with the shallow features f0 as the hierarchical features G of the final low-resolution subgraph: G = [f_m, f0].

6) convolving the hierarchical features G into a super-resolution subgraph y_t^i using a sub-pixel convolution layer, wherein y_t^i denotes the super-resolution subgraph of the ith block low-resolution subgraph x_t^i of the video frame image at time t; the sub-pixel convolution layer divides each pixel feature in the hierarchical features into r×r smaller pixel features so as to process the low-resolution sub-image into r-times super-resolution features; in one embodiment of the invention, if r equals 3, each pixel feature is expanded into a 3×3 pixel matrix whose center is the pixel feature value and whose remaining entries are filled with 0;

7) performing super-resolution refinement on the convolved super-resolution subgraph y_t^i using the global low-rank regularized video super-resolution processing method, wherein the objective function of the global low-rank regularized video super-resolution processing is as follows:

wherein:

X_t^i is the final super-resolution subgraph after super-resolution processing;

y_t^i is the super-resolution subgraph obtained by convolving the hierarchical features;

x_t^i denotes the ith block low-resolution sub-picture in the video frame picture at time t, and x_{t-1}^i and x_{t+1}^i are respectively its previous frame and next frame;

λ1 and λ2 are control coefficients;

the objective function is optimized and solved by using the L-BFGS algorithm:

the objective function is converted into an unconstrained form, and an approximation H_k of the inverse of its second-derivative matrix is obtained by iteration;

wherein:

I is the identity matrix;

the initial approximation H_0 is taken as the identity matrix;

T represents transposition;

g_k is the derivative of the transformed function;

H_k approximates the inverse of the second derivative of the transformed function;

taking H_k g_k as the update direction, the next iterate is calculated, and the final super-resolution subgraph after super-resolution processing is obtained by iterative solution.

Finally, the super-resolution sub-image sequence {X_t^i} of the ith block of the video frame image at different times is obtained.
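A structural sketch of the network described in steps 1)-6), in PyTorch: the channel width, unit counts, kernel sizes and the exact fusion of scale, semantic and shallow features are assumptions filled in for runnability, and the global low-rank refinement of step 7) is omitted (see the L-BFGS sketch in the disclosure above).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LinkModule(nn.Module):
    """Link module D: n basic convolution units whose outputs are accumulated
    into the next unit's input, followed by a rear-end fusion convolution."""
    def __init__(self, channels, n_units=4):
        super().__init__()
        self.units = nn.ModuleList(
            nn.Sequential(nn.Conv2d(channels, channels, 3, padding=1),
                          nn.ReLU(inplace=True))
            for _ in range(n_units))
        self.fuse = nn.Conv2d(channels, channels, 3, padding=1)

    def forward(self, x):
        acc = x
        for unit in self.units:
            acc = acc + unit(acc)  # accumulate all preceding unit outputs
        return self.fuse(acc)

class SuperResNet(nn.Module):
    """Shallow extraction -> 1/2, 1/4, 1/8 multi-resolution modules ->
    semantic link module -> full-link fusion -> sub-pixel convolution."""
    def __init__(self, channels=64, r=3):
        super().__init__()
        self.shallow = nn.Sequential(  # two stride-1 convolution layers
            nn.Conv2d(3, channels, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1))
        self.down = nn.ModuleList(     # network modules M_j: stride-2 conv + link module D
            nn.Sequential(nn.Conv2d(channels, channels, 3, stride=2, padding=1),
                          LinkModule(channels))
            for _ in range(3))
        self.semantic = LinkModule(channels)  # semantic extraction at 1/8 resolution
        self.full_link = nn.Sequential(*(LinkModule(channels) for _ in range(5)))
        self.to_pixels = nn.Conv2d(channels, 3 * r * r, 3, padding=1)
        self.shuffle = nn.PixelShuffle(r)     # sub-pixel convolution layer

    def forward(self, x):
        f0 = self.shallow(x)  # shallow features
        f = f0
        for m in self.down:   # scale features at 1/2, 1/4, 1/8 resolution
            f = m(f)
        sem = self.semantic(f)  # semantic features of the 1/8-scale features
        up = F.interpolate(sem, size=f0.shape[-2:], mode='bilinear',
                           align_corners=False)  # assumed path back to full size
        g = self.full_link(f0 + up)  # hierarchical features G (fusion assumed additive)
        return self.shuffle(self.to_pixels(g + f0))  # r-times super-resolved tile
```

Under these assumptions, model = SuperResNet() maps a (1, 3, 56, 112) low-resolution tile to a (1, 3, 168, 336) super-resolved tile.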

And S3, coding the sub-image after the super-resolution processing by using a huffman coding method, compressing the size of the image and transmitting the image to the mobile equipment.

Further, the invention codes the super-resolution-processed sub-graphs using the Huffman coding method, whose flow is as follows:

1) obtaining the binary intensity value of each pixel of the super-resolution-processed sub-image X_t^i in the RGB color channels through matlab scanning, and taking the binary intensity values as the Huffman coding source, wherein X_t^i denotes the super-resolution sub-image of the ith block low-resolution sub-image of the video frame image at different times;

2) scanning and counting the input Huffman coding source, and determining the occurrence frequency and probability of each symbol so as to determine the weight of each source symbol; in a specific embodiment of the invention, the 12-bit binary sequence 001101101001, scanned and read in groups of 2 bits, gives 00, 11, 01, 10, 10 and 01; the resulting source symbols are 00, 11, 01 and 10, with probabilities of 0.17, 0.17, 0.33 and 0.33 respectively; if it is scanned and read in groups of 3 bits, the results are 001, 101, 101 and 001, giving the two source symbols 001 and 101, each with probability 0.5;

3) assigning the code element 0 and the code element 1 respectively to the two source symbols with the lowest occurrence probability, adding the probabilities of these two source symbols to form a new symbol probability, and re-entering it into the ordering together with the remaining source symbols, wherein the larger the weight of a source symbol, the higher it is placed in the ordering;

4) repeating the operation of the step 3) on the new sequencing result;

5) repeating the above process until all the source symbols are distributed to obtain corresponding code elements;

6) recording, step by step from the end of the ordering back to the front, the code elements assigned to each source symbol, finally obtaining the codeword of each source symbol;

7) concatenating the codewords of the source symbols, the concatenated result being the Huffman coding result of the super-resolution-processed sub-graph.
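A runnable sketch of the coding flow in steps 2)-7), assuming the source symbols have already been read out as fixed-width bit groups as in the embodiment above; the helper names are illustrative.

```python
import heapq
from collections import Counter

def huffman_code(symbols):
    """Build a Huffman code table from a sequence of source symbols."""
    freq = Counter(symbols)                  # step 2: symbol weights
    if len(freq) == 1:                       # degenerate single-symbol source
        return {next(iter(freq)): "0"}
    # heap entries: (weight, tie_breaker, [symbols in this subtree])
    heap = [(w, i, [s]) for i, (s, w) in enumerate(freq.items())]
    heapq.heapify(heap)
    codes = {s: "" for s in freq}
    tie = len(heap)
    while len(heap) > 1:
        w0, _, group0 = heapq.heappop(heap)  # step 3: two lowest-probability symbols
        w1, _, group1 = heapq.heappop(heap)
        for s in group0:
            codes[s] = "0" + codes[s]        # steps 4)-6): codewords grow back-to-front
        for s in group1:
            codes[s] = "1" + codes[s]
        heapq.heappush(heap, (w0 + w1, tie, group0 + group1))  # merged probability re-enters
        tie += 1
    return codes

# With the 2-bit grouping from the embodiment above:
source = ["00", "11", "01", "10", "10", "01"]
table = huffman_code(source)
encoded = "".join(table[s] for s in source)  # step 7: concatenated codewords
```

The concatenated bitstream is what gets transmitted to the mobile device, together with (in this sketch) the code table needed for decoding.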

Further, the Huffman coding result of the super-resolution-processed sub-graph is sent to the mobile device.

And S4, the mobile device decodes the coded sub-images after receiving the coded sub-images, splices the sub-images after super-resolution processing according to the positions of the original images to form video frame images after super-resolution processing, and splices the video frame images into a video according to the sequence of the video frames.

Further, after receiving the coded sub-graphs, the mobile device decodes them; the decoding operation is the reverse of the coding operation and yields the binary RGB color intensity value of each pixel in the super-resolution-processed sub-graph, and the pixels are recombined into the super-resolution-processed sub-graph using matlab according to the color intensity value of each pixel.
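Decoding reverses the table lookup; a minimal sketch, assuming the code table built by the huffman_code sketch above is transmitted alongside the bitstream (the patent text does not specify how the table reaches the mobile device):

```python
def huffman_decode(bits, table):
    """Walk the bitstring and emit a symbol whenever the buffer matches a
    codeword; Huffman codes are prefix-free, so the first match is correct."""
    inverse = {code: sym for sym, code in table.items()}
    out, buf = [], ""
    for b in bits:
        buf += b
        if buf in inverse:
            out.append(inverse[buf])
            buf = ""
    return out
```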

The sub-images after super-resolution processing are then stitched according to their original image positions to form the super-resolution-processed video frame images, and the video frame images are stitched into a video according to the order of the video frames.
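A minimal stitching sketch, assuming each decoded tile arrives with the origin recorded at splitting time and that every tile was super-resolved by the same factor; the names and the scale parameter are illustrative.

```python
import numpy as np

def stitch(tiles, frame_h, frame_w, scale=3):
    """Place each super-resolved tile at its original position, scaled by the
    super-resolution factor, to rebuild the full video frame."""
    canvas = np.zeros((frame_h * scale, frame_w * scale, 3), dtype=np.uint8)
    for y, x, tile in tiles:
        th, tw = tile.shape[:2]
        canvas[y * scale:y * scale + th, x * scale:x * scale + tw] = tile
    return canvas
```

The rebuilt frames are then written out in their original order, e.g. with cv2.VideoWriter, to form the super-resolved video.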

The following describes embodiments of the present invention through an algorithmic experiment and tests of the inventive processing method. The hardware test environment of the algorithm of the invention is an Intel(R) Core(TM) i7-6700K CPU, with Matlab 2018b as the software; the comparison methods are a wavelet-transform-based video super-resolution processing method and a CNN-based video super-resolution processing method.

In the algorithm experiment of the invention, the data set is 10 GB of low-resolution video. In the experiment, the low-resolution video is input into the algorithm model, and the effectiveness of the video super-resolution processing is used as the evaluation index of algorithm feasibility: the higher the effectiveness of the video super-resolution processing, the higher the effectiveness and feasibility of the algorithm, where the effectiveness of super-resolution processing combines the proportion of video frame images for which super-resolution processing is achieved and the processing efficiency of the super-resolution pipeline at the mobile terminal.

According to the experimental result, the video super-resolution processing effectiveness of the wavelet transform-based video super-resolution processing method is 77.62, the video super-resolution processing effectiveness of the CNN-based video super-resolution processing method is 84.12, the video super-resolution processing effectiveness of the method is 89.26, and compared with a comparison algorithm, the mobile device-oriented video super-resolution processing method provided by the invention can realize more effective video super-resolution processing.

The invention also provides a video super-resolution processing device. Referring to fig. 2, there is shown an internal structure diagram of the video super-resolution processing apparatus according to the present embodiment.

In the present embodiment, the video super-resolution processing apparatus 1 includes at least a video frame image acquisition apparatus 11, an image processor 12, a video super-resolution processing apparatus 13, a communication bus 14, and a network interface 15.

The video frame image acquiring apparatus 11 may be a PC (Personal Computer), a terminal device such as a smart phone, a tablet Computer, a portable Computer, a camera, or the like, or may be a server or the like.

Image processor 12 includes at least one type of readable storage medium including flash memory, a hard disk, a multi-media card, a card-type memory (e.g., SD or DX memory, etc.), a magnetic memory, a magnetic disk, an optical disk, and the like. The image processor 12 may in some embodiments be an internal storage unit of the video super resolution processing apparatus 1, for example a hard disk of the video super resolution processing apparatus 1. The image processor 12 may also be an external storage device of the super-resolution processing apparatus 1 in other embodiments, such as a plug-in hard disk provided on the super-resolution processing apparatus 1, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash memory Card (Flash Card), and the like. Further, the image processor 12 may also include both an internal storage unit and an external storage device of the video super-resolution processing apparatus 1. The image processor 12 can be used not only to store application software installed in the video super-resolution processing apparatus 1 and various types of data, but also to temporarily store data that has been output or is to be output.

The video super-resolution processing device 13 may be, in some embodiments, a Central Processing Unit (CPU), controller, microcontroller, microprocessor or other data processing chip, including a monitoring unit, for running program code stored in the image processor 12 or processing data, such as the video super-resolution processing program instructions 16.

The communication bus 14 is used to enable connection communication between these components.

The network interface 15 may include a standard wired interface, a wireless interface (such as a WI-FI interface), and is generally used for establishing a communication connection between the video super-resolution processing apparatus 1 and other electronic devices.

Fig. 2 shows only the video super-resolution processing apparatus 1 with the components 11-15; those skilled in the art will understand that the structure shown in fig. 2 does not constitute a limitation of the video super-resolution processing apparatus 1, which may include more components than those shown, or combine some components.

In the embodiment of the video super resolution processing apparatus 1 shown in fig. 2, video super resolution processing program instructions 16 are stored in the image processor 12; the steps of the video super-resolution processing apparatus 13 executing the video super-resolution processing program instructions 16 stored in the image processor 12 are the same as the implementation method of the video super-resolution processing method for mobile devices, and are not described herein again.

Furthermore, an embodiment of the present invention also provides a computer-readable storage medium having stored thereon video super-resolution processing program instructions executable by one or more processors to implement the following operations:

acquiring a video frame image of each frame, segmenting the video frame image to form subgraphs with equal sizes, carrying out edge detection on the segmented subgraphs by using a canny edge detection algorithm, and carrying out priority ranking on the subgraphs according to the complexity of the edge detection, wherein the higher the complexity of the edge detection is, the higher the priority of the subgraphs is;

the first K sub-graphs with the highest priority are transmitted to a cloud server, and the cloud server receives the sub-graphs and then carries out super-resolution processing on the sub-graphs based on a deep neural network;

coding the sub-graph after super-resolution processing by using a huffman coding method, compressing the size of the image and transmitting the image to mobile equipment;

and the mobile equipment receives the coded subgraph, decodes the coded subgraph, splices the subgraph subjected to super-resolution processing according to the position of the original image to form a video frame image subjected to super-resolution processing, and splices the video frame image into a video according to the sequence of the video frames.

It should be noted that the above-mentioned numbers of the embodiments of the present invention are merely for description, and do not represent the merits of the embodiments. And the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, apparatus, article, or method that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, apparatus, article, or method. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other like elements in a process, apparatus, article, or method that includes the element.

Through the above description of the embodiments, those skilled in the art will clearly understand that the method of the above embodiments can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but in many cases, the former is a better implementation manner. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium (e.g., ROM/RAM, magnetic disk, optical disk) as described above and includes instructions for enabling a terminal device (e.g., a mobile phone, a computer, a server, or a network device) to execute the method according to the embodiments of the present invention.

The above description is only a preferred embodiment of the present invention, and not intended to limit the scope of the present invention, and all modifications of equivalent structures and equivalent processes, which are made by using the contents of the present specification and the accompanying drawings, or directly or indirectly applied to other related technical fields, are included in the scope of the present invention.
