Video coding method and device, electronic equipment and storage medium

Document No.: 89995  Published: 2021-10-08

Reading note: The technology "Video coding method and device, electronic equipment and storage medium" was designed by Liu Xueqing, Dong Lei, and Liu Zongzong on 2021-06-30. Its main content is as follows: The embodiments of the present application provide a video coding method and device, electronic equipment, and a storage medium. A single composite video stream containing both original frames and privacy protection frames is encoded. Each original frame is encoded with reference to its privacy protection frame, and original frames are never used as reference frames during encoding, so they do not need to be stored. The video stream is composited and temporally layered during preprocessing, and the interleaved temporal-layer reference structure reduces the memory needed to store reference frames. Because each original frame is encoded with reference to its corresponding privacy protection frame, most reference regions are identical, so the correlation between the two frames is used effectively; this substantially reduces the size of the coded bitstream and lowers storage and transmission costs.

1. A method of video encoding, the method comprising:

acquiring a video stream to be processed;

processing and configuring each video frame in the video stream to be processed into a privacy protection frame and an original frame to obtain a composite video stream;

for each video frame to be coded in the composite video stream, if the video frame is a privacy protection frame, performing privacy protection on the video frame by using a preset privacy protection rule and setting its inter-frame coding reference mode to referable, and if the video frame is an original frame, setting its inter-frame coding reference mode to non-referable;

coding each video frame based on its inter-frame coding reference mode in the composite video stream to obtain a coded video data stream, wherein a video frame whose inter-frame coding reference mode is referable is allowed to serve as a reference frame for coding other video frames, and a video frame whose inter-frame coding reference mode is non-referable is not allowed to serve as a reference frame for coding other video frames.

2. The method of claim 1, wherein processing and configuring each video frame in the video stream to be processed into a privacy protection frame and an original frame to obtain a composite video stream comprises:

for each video frame in the video stream to be processed, obtaining, based on the video frame, a first video frame and a second video frame with the same content as the video frame, setting the frame type attribute of the first video frame to the privacy protection frame type to obtain a privacy protection frame, and setting the frame type attribute of the second video frame to the original frame type to obtain an original frame;

and interleaving the privacy protection frames and original frames corresponding to all video frames in the video stream to be processed into a composite video stream.

3. The method of claim 2, further comprising:

setting the time domain reference model used for coding the composite video stream to a time domain scalable reference model that supports bitstream frame extraction;

wherein, if the video frame is a privacy protection frame, performing privacy protection on the video frame by using a preset privacy protection rule and setting its inter-frame coding reference mode to referable, and if the video frame is an original frame, setting its inter-frame coding reference mode to non-referable, includes:

if the video frame is a privacy protection frame, performing privacy protection on the video frame by using a preset privacy protection rule and setting the video frame as a time domain base layer; and if the video frame is an original frame, setting the video frame as a time domain enhancement layer.

4. The method of claim 1, wherein processing and configuring each video frame in the video stream to be processed into a privacy protection frame and an original frame to obtain a composite video stream comprises:

for each video frame in the video stream to be processed, obtaining, based on the video frame, M+1 video frames with the same content as the video frame, setting the frame type attribute of M of these video frames to the privacy protection frame type to obtain M privacy protection frames, and setting the frame type attribute of the remaining video frame to the original frame type to obtain an original frame, where M is the number of privacy protection types;

and interleaving the privacy protection frames and original frames corresponding to all video frames in the video stream to be processed into a composite video stream.

5. The method of claim 4, further comprising:

setting the time domain reference model used for coding the composite video stream to a time domain scalable reference model that supports bitstream frame extraction;

if the video frame is the Mth privacy protection frame obtained from a given video frame in the video stream to be processed, performing privacy protection on it by using a preset Mth privacy protection rule and setting it as a time domain base layer; if the video frame is the ith privacy protection frame obtained from that video frame, performing privacy protection on it by using a preset ith privacy protection rule and setting it as a time domain base layer or an (M-i)th-level time domain enhancement layer; and if the video frame is an original frame, setting it as the highest-level time domain enhancement layer; where i is a positive integer smaller than M, a higher-level time domain enhancement layer cannot serve as an inter-frame coding reference frame for a lower-level time domain enhancement layer, a lower-level time domain enhancement layer can serve as an inter-frame coding reference frame for a higher-level time domain enhancement layer, and the time domain base layer can serve as an inter-frame coding reference frame for time domain enhancement layers of every level.

6. The method of claim 1, further comprising:

reading data of a preset unit data amount from the coded video data stream each time; for the currently read data, if it is data of an original frame, encrypting it and encapsulating the encrypted data, and if it is data of a privacy protection frame, encapsulating it directly, so as to obtain a data stream to be sent, wherein the data stream to be sent comprises encapsulated data of the original frames and encapsulated data of the privacy protection frames;

and sending the data stream to be sent to both a common authority user and a high-level authority user.

7. The method of claim 1, further comprising:

reading data of a preset unit data amount from the coded video data stream each time, and obtaining, based on the currently read data, a first code stream and a second code stream with the same content as the currently read data;

if the current first code stream is data of an original frame, encrypting it and encapsulating the encrypted data, and if the current first code stream is data of a privacy protection frame, encapsulating it directly, so as to obtain a data stream to be sent, wherein the data stream to be sent comprises encapsulated data of the original frames and encapsulated data of the privacy protection frames;

sending the data stream to be sent to a high-level authority user;

if the current second code stream is data of an original frame, discarding it, and if the current second code stream is data of a privacy protection frame, encapsulating it to obtain a privacy protection data stream, wherein the privacy protection data stream comprises encapsulated data of the privacy protection frames;

and sending the privacy protection data stream to a common authority user.

8. A video encoding apparatus, characterized in that the apparatus comprises:

a video stream acquisition module, configured to acquire a video stream to be processed;

a video frame configuration module, configured to process and configure each video frame in the video stream to be processed into a privacy protection frame and an original frame to obtain a composite video stream;

a reference mode setting module, configured to, for each video frame to be coded in the composite video stream, perform privacy protection on the video frame by using a preset privacy protection rule and set its inter-frame coding reference mode to referable if the video frame is a privacy protection frame, and set its inter-frame coding reference mode to non-referable if the video frame is an original frame;

a video frame coding module, configured to code each video frame based on its inter-frame coding reference mode in the composite video stream to obtain a coded video data stream, wherein a video frame whose inter-frame coding reference mode is referable is allowed to serve as a reference frame for coding other video frames, and a video frame whose inter-frame coding reference mode is non-referable is not allowed to serve as a reference frame for coding other video frames.

9. The apparatus of claim 8, wherein the video frame configuration module is specifically configured to:

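for each video frame in the video stream to be processed, obtaining, based on the video frame, a first video frame and a second video frame with the same content as the video frame, setting the frame type attribute of the first video frame to the privacy protection frame type to obtain a privacy protection frame, and setting the frame type attribute of the second video frame to the original frame type to obtain an original frame;

and interleaving the privacy protection frames and original frames corresponding to all video frames in the video stream to be processed into a composite video stream.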

10. The apparatus according to claim 9, wherein the reference mode setting module is specifically configured to:

setting the time domain reference model used for coding the composite video stream to a time domain scalable reference model that supports bitstream frame extraction;

for each video frame to be coded in the composite video stream, if the video frame is a privacy protection frame, performing privacy protection on the video frame by using a preset privacy protection rule and setting the video frame as a time domain base layer; and if the video frame is an original frame, setting the video frame as a time domain enhancement layer.

11. The apparatus of claim 8, wherein the video frame configuration module is specifically configured to:

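for each video frame in the video stream to be processed, obtaining, based on the video frame, M+1 video frames with the same content as the video frame, setting the frame type attribute of M of these video frames to the privacy protection frame type to obtain M privacy protection frames, and setting the frame type attribute of the remaining video frame to the original frame type to obtain an original frame, where M is the number of privacy protection types;

and interleaving the privacy protection frames and original frames corresponding to all video frames in the video stream to be processed into a composite video stream.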

12. The apparatus according to claim 11, wherein the reference mode setting module is specifically configured to:

setting the time domain reference model used for coding the composite video stream to a time domain scalable reference model that supports bitstream frame extraction;

for each video frame to be coded in the composite video stream: if the video frame is the Mth privacy protection frame obtained from a given video frame in the video stream to be processed, performing privacy protection on it by using a preset Mth privacy protection rule and setting it as a time domain base layer; if the video frame is the ith privacy protection frame obtained from that video frame, performing privacy protection on it by using a preset ith privacy protection rule and setting it as a time domain base layer or an (M-i)th-level time domain enhancement layer; and if the video frame is an original frame, setting it as the highest-level time domain enhancement layer; where i is a positive integer smaller than M, a higher-level time domain enhancement layer cannot serve as an inter-frame coding reference frame for a lower-level time domain enhancement layer, a lower-level time domain enhancement layer can serve as an inter-frame coding reference frame for a higher-level time domain enhancement layer, and the time domain base layer can serve as an inter-frame coding reference frame for time domain enhancement layers of every level.

13. The apparatus of claim 8, further comprising a video data transmission module configured to:

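reading data of a preset unit data amount from the coded video data stream each time; for the currently read data, if it is data of an original frame, encrypting it and encapsulating the encrypted data, and if it is data of a privacy protection frame, encapsulating it directly, so as to obtain a data stream to be sent, wherein the data stream to be sent comprises encapsulated data of the original frames and encapsulated data of the privacy protection frames;

and sending the data stream to be sent to both a common authority user and a high-level authority user.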

14. The apparatus of claim 8, further comprising a video data transmission module configured to:

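reading data of a preset unit data amount from the coded video data stream each time, and obtaining, based on the currently read data, a first code stream and a second code stream with the same content as the currently read data;

if the current first code stream is data of an original frame, encrypting it and encapsulating the encrypted data, and if the current first code stream is data of a privacy protection frame, encapsulating it directly, so as to obtain a data stream to be sent, wherein the data stream to be sent comprises encapsulated data of the original frames and encapsulated data of the privacy protection frames;

sending the data stream to be sent to a high-level authority user;

if the current second code stream is data of an original frame, discarding it, and if the current second code stream is data of a privacy protection frame, encapsulating it to obtain a privacy protection data stream, wherein the privacy protection data stream comprises encapsulated data of the privacy protection frames;

and sending the privacy protection data stream to a common authority user.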

15. A method of video decoding, the method comprising:

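obtaining a video stream to be decoded, wherein the video stream to be decoded comprises encapsulated data of an original frame and encapsulated data of a privacy protection frame;

in a case where a decoding key for the original frame is acquired, decoding the encapsulated data of the original frame in the video stream to be decoded by using the decoding key to obtain the original frame, and decoding the encapsulated data of the privacy protection frame in the video stream to be decoded to obtain the privacy protection frame; and in a case where the decoding key for the original frame is not acquired, decoding the encapsulated data of the privacy protection frame in the video stream to be decoded to obtain the privacy protection frame.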

16. An electronic device comprising a processor and a memory;

the memory is used for storing computer programs and data generated in the encoding process;

the processor, when executing the program stored in the memory, implements the method of any one of claims 1 to 7 and 15.

17. A computer-readable storage medium storing a computer program which, when executed by a processor, carries out the method of any one of claims 1 to 7 and 15.

Technical Field

The present application relates to the field of video coding technologies, and in particular, to a video coding method and apparatus, an electronic device, and a storage medium.

Background

With the development of science and technology and people's growing safety awareness, more and more monitoring equipment is installed in public places, and intelligent detection technology is increasingly applied in monitoring scenarios. People also pay increasing attention to personal privacy under monitoring equipment, so the demand for video privacy protection technology keeps growing. Video privacy protection makes the image of a local privacy region in a video image unrecognizable by processing that region, for example by mosaicking, blocking, or scrambling, while other regions keep their original content and remain normally visible. However, image processing for video privacy protection has the characteristic that the protected content cannot be recovered; when a public safety incident occurs and the original content cannot be recovered from the surveillance video, evidence collection is greatly hindered.

To address both privacy protection and evidence collection, the related art duplicates the captured video stream into two identical video streams: one stream receives no privacy processing and its original video frames are encoded as-is, while the other undergoes privacy protection processing and its privacy-protected video frames are encoded. However, because the two video streams are encoded separately, this approach consumes considerable resources during encoding and a large amount of network bandwidth during data transmission.

Disclosure of Invention

An object of the embodiments of the present application is to provide a video encoding method, an apparatus, an electronic device, and a storage medium, so as to reduce resource consumption in an encoding process. The specific technical scheme is as follows:

in a first aspect, an embodiment of the present application provides a video encoding method, where the method includes:

acquiring a video stream to be processed;

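processing and configuring each video frame in the video stream to be processed into a privacy protection frame and an original frame to obtain a composite video stream;

for each video frame to be coded in the composite video stream, if the video frame is a privacy protection frame, performing privacy protection on the video frame by using a preset privacy protection rule and setting its inter-frame coding reference mode to referable, and if the video frame is an original frame, setting its inter-frame coding reference mode to non-referable;

coding each video frame based on its inter-frame coding reference mode in the composite video stream to obtain a coded video data stream, wherein a video frame whose inter-frame coding reference mode is referable is allowed to serve as a reference frame for coding other video frames, and a video frame whose inter-frame coding reference mode is non-referable is not allowed to serve as a reference frame for coding other video frames.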

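In a possible embodiment, processing and configuring each video frame in the video stream to be processed into a privacy protection frame and an original frame to obtain a composite video stream includes:

for each video frame in the video stream to be processed, obtaining, based on the video frame, a first video frame and a second video frame with the same content as the video frame, setting the frame type attribute of the first video frame to the privacy protection frame type to obtain a privacy protection frame, and setting the frame type attribute of the second video frame to the original frame type to obtain an original frame;

and interleaving the privacy protection frames and original frames corresponding to all video frames in the video stream to be processed into a composite video stream.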

In one possible embodiment, the method further comprises:

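setting the time domain reference model used for coding the composite video stream to a time domain scalable reference model that supports bitstream frame extraction;

wherein, if the video frame is a privacy protection frame, performing privacy protection on the video frame by using a preset privacy protection rule and setting its inter-frame coding reference mode to referable, and if the video frame is an original frame, setting its inter-frame coding reference mode to non-referable, includes: if the video frame is a privacy protection frame, performing privacy protection on the video frame by using a preset privacy protection rule and setting the video frame as a time domain base layer; and if the video frame is an original frame, setting the video frame as a time domain enhancement layer.

In one possible embodiment, processing and configuring each video frame in the video stream to be processed into a privacy protection frame and an original frame to obtain a composite video stream includes:

for each video frame in the video stream to be processed, obtaining, based on the video frame, M+1 video frames with the same content as the video frame, setting the frame type attribute of M of these video frames to the privacy protection frame type to obtain M privacy protection frames, and setting the frame type attribute of the remaining video frame to the original frame type to obtain an original frame, where M is the number of privacy protection types;

and interleaving the privacy protection frames and original frames corresponding to all video frames in the video stream to be processed into a composite video stream.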

In one possible embodiment, the method further comprises:

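setting the time domain reference model used for coding the composite video stream to a time domain scalable reference model that supports bitstream frame extraction;

if the video frame is the Mth privacy protection frame obtained from a given video frame in the video stream to be processed, performing privacy protection on it by using a preset Mth privacy protection rule and setting it as a time domain base layer; if the video frame is the ith privacy protection frame obtained from that video frame, performing privacy protection on it by using a preset ith privacy protection rule and setting it as a time domain base layer or an (M-i)th-level time domain enhancement layer; and if the video frame is an original frame, setting it as the highest-level time domain enhancement layer; where i is a positive integer smaller than M, a higher-level time domain enhancement layer cannot serve as an inter-frame coding reference frame for a lower-level time domain enhancement layer, a lower-level time domain enhancement layer can serve as an inter-frame coding reference frame for a higher-level time domain enhancement layer, and the time domain base layer can serve as an inter-frame coding reference frame for time domain enhancement layers of every level.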

In one possible embodiment, the method further comprises:

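reading data of a preset unit data amount from the coded video data stream each time; for the currently read data, if it is data of an original frame, encrypting it and encapsulating the encrypted data, and if it is data of a privacy protection frame, encapsulating it directly, so as to obtain a data stream to be sent, wherein the data stream to be sent comprises encapsulated data of the original frames and encapsulated data of the privacy protection frames;

and sending the data stream to be sent to both a common authority user and a high-level authority user.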

In one possible embodiment, the method further comprises:

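reading data of a preset unit data amount from the coded video data stream each time, and obtaining, based on the currently read data, a first code stream and a second code stream with the same content as the currently read data;

if the current first code stream is data of an original frame, encrypting it and encapsulating the encrypted data, and if the current first code stream is data of a privacy protection frame, encapsulating it directly, so as to obtain a data stream to be sent, wherein the data stream to be sent comprises encapsulated data of the original frames and encapsulated data of the privacy protection frames;

sending the data stream to be sent to a high-level authority user;

if the current second code stream is data of an original frame, discarding it, and if the current second code stream is data of a privacy protection frame, encapsulating it to obtain a privacy protection data stream, wherein the privacy protection data stream comprises encapsulated data of the privacy protection frames;

and sending the privacy protection data stream to a common authority user.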
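The two transmission embodiments (one stream in which only original-frame data is encrypted, or a full stream for high-level authority users plus a privacy-only stream for common authority users) can be sketched as follows. This is a minimal illustration under stated assumptions, not the application's implementation: the "preset unit" is assumed to be one NAL unit, and the packet layout, the frame-kind codes, and the helpers `encapsulate`, `toy_cipher`, `build_full_stream`, and `build_privacy_stream` are all hypothetical. `toy_cipher` is an insecure XOR placeholder standing in for a real cipher.

```python
import struct
from dataclasses import dataclass
from typing import Callable, Iterable, List

PRIVACY, ORIGINAL = 0, 1  # hypothetical frame-kind codes carried in each packet


@dataclass(frozen=True)
class CodedUnit:
    kind: int       # PRIVACY or ORIGINAL
    payload: bytes  # one "preset unit" of coded data (assumed here: one NAL unit)


def encapsulate(kind: int, encrypted: bool, payload: bytes) -> bytes:
    # Hypothetical packet layout: kind (1 byte), encrypted flag (1 byte),
    # payload length (4 bytes, big-endian), then the payload itself.
    return struct.pack(">BBI", kind, int(encrypted), len(payload)) + payload


def toy_cipher(key: bytes) -> Callable[[bytes], bytes]:
    # Placeholder XOR keystream: NOT secure, it only stands in for a real cipher.
    # The key must be non-empty. XOR is its own inverse, so the same callable decrypts.
    def apply(data: bytes) -> bytes:
        return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))
    return apply


def build_full_stream(units: Iterable[CodedUnit],
                      encrypt: Callable[[bytes], bytes]) -> List[bytes]:
    """Original-frame data is encrypted; privacy-protection-frame data stays in the clear."""
    packets = []
    for unit in units:
        if unit.kind == ORIGINAL:
            packets.append(encapsulate(unit.kind, True, encrypt(unit.payload)))
        else:
            packets.append(encapsulate(unit.kind, False, unit.payload))
    return packets


def build_privacy_stream(units: Iterable[CodedUnit]) -> List[bytes]:
    """Copy for common authority users: original-frame data is discarded."""
    return [encapsulate(u.kind, False, u.payload) for u in units if u.kind == PRIVACY]
```

In the first embodiment, the output of `build_full_stream` is sent to every user and only key holders can recover the original frames. In the second, common authority users receive only `build_privacy_stream` output, so the encrypted original-frame data never reaches them at all, at the cost of producing two streams.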

In a second aspect, an embodiment of the present application provides a video encoding apparatus, including:

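a video stream acquisition module, configured to acquire a video stream to be processed;

a video frame configuration module, configured to process and configure each video frame in the video stream to be processed into a privacy protection frame and an original frame to obtain a composite video stream;

a reference mode setting module, configured to, for each video frame to be coded in the composite video stream, perform privacy protection on the video frame by using a preset privacy protection rule and set its inter-frame coding reference mode to referable if the video frame is a privacy protection frame, and set its inter-frame coding reference mode to non-referable if the video frame is an original frame;

a video frame coding module, configured to code each video frame based on its inter-frame coding reference mode in the composite video stream to obtain a coded video data stream, wherein a video frame whose inter-frame coding reference mode is referable is allowed to serve as a reference frame for coding other video frames, and a video frame whose inter-frame coding reference mode is non-referable is not allowed to serve as a reference frame for coding other video frames.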

In a possible implementation manner, the video frame configuration module is specifically configured to:

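for each video frame in the video stream to be processed, obtaining, based on the video frame, a first video frame and a second video frame with the same content as the video frame, setting the frame type attribute of the first video frame to the privacy protection frame type to obtain a privacy protection frame, and setting the frame type attribute of the second video frame to the original frame type to obtain an original frame;

and interleaving the privacy protection frames and original frames corresponding to all video frames in the video stream to be processed into a composite video stream.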

In a possible implementation manner, the reference mode setting module is specifically configured to:

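setting the time domain reference model used for coding the composite video stream to a time domain scalable reference model that supports bitstream frame extraction;

for each video frame to be coded in the composite video stream, if the video frame is a privacy protection frame, performing privacy protection on the video frame by using a preset privacy protection rule and setting the video frame as a time domain base layer; and if the video frame is an original frame, setting the video frame as a time domain enhancement layer.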

In a possible implementation manner, the video frame configuration module is specifically configured to:

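for each video frame in the video stream to be processed, obtaining, based on the video frame, M+1 video frames with the same content as the video frame, setting the frame type attribute of M of these video frames to the privacy protection frame type to obtain M privacy protection frames, and setting the frame type attribute of the remaining video frame to the original frame type to obtain an original frame, where M is the number of privacy protection types;

and interleaving the privacy protection frames and original frames corresponding to all video frames in the video stream to be processed into a composite video stream.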

In a possible implementation manner, the reference mode setting module is specifically configured to:

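setting the time domain reference model used for coding the composite video stream to a time domain scalable reference model that supports bitstream frame extraction;

for each video frame to be coded in the composite video stream: if the video frame is the Mth privacy protection frame obtained from a given video frame in the video stream to be processed, performing privacy protection on it by using a preset Mth privacy protection rule and setting it as a time domain base layer; if the video frame is the ith privacy protection frame obtained from that video frame, performing privacy protection on it by using a preset ith privacy protection rule and setting it as a time domain base layer or an (M-i)th-level time domain enhancement layer; and if the video frame is an original frame, setting it as the highest-level time domain enhancement layer; where i is a positive integer smaller than M, a higher-level time domain enhancement layer cannot serve as an inter-frame coding reference frame for a lower-level time domain enhancement layer, a lower-level time domain enhancement layer can serve as an inter-frame coding reference frame for a higher-level time domain enhancement layer, and the time domain base layer can serve as an inter-frame coding reference frame for time domain enhancement layers of every level.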

In a possible implementation, the apparatus further includes a video data sending module configured to:

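reading data of a preset unit data amount from the coded video data stream each time; for the currently read data, if it is data of an original frame, encrypting it and encapsulating the encrypted data, and if it is data of a privacy protection frame, encapsulating it directly, so as to obtain a data stream to be sent, wherein the data stream to be sent comprises encapsulated data of the original frames and encapsulated data of the privacy protection frames;

and sending the data stream to be sent to both a common authority user and a high-level authority user.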

In a possible implementation, the apparatus further includes a video data sending module configured to:

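reading data of a preset unit data amount from the coded video data stream each time, and obtaining, based on the currently read data, a first code stream and a second code stream with the same content as the currently read data;

if the current first code stream is data of an original frame, encrypting it and encapsulating the encrypted data, and if the current first code stream is data of a privacy protection frame, encapsulating it directly, so as to obtain a data stream to be sent, wherein the data stream to be sent comprises encapsulated data of the original frames and encapsulated data of the privacy protection frames;

sending the data stream to be sent to a high-level authority user;

if the current second code stream is data of an original frame, discarding it, and if the current second code stream is data of a privacy protection frame, encapsulating it to obtain a privacy protection data stream, wherein the privacy protection data stream comprises encapsulated data of the privacy protection frames;

and sending the privacy protection data stream to a common authority user.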

In a third aspect, an embodiment of the present application provides a video decoding method, where the method includes:

obtaining a video stream to be decoded, wherein the video stream to be decoded comprises encapsulated data of an original frame and encapsulated data of a privacy protection frame;

in a case where a decoding key for the original frame is acquired, decoding the encapsulated data of the original frame in the video stream to be decoded by using the decoding key to obtain the original frame, and decoding the encapsulated data of the privacy protection frame in the video stream to be decoded to obtain the privacy protection frame; and in a case where the decoding key for the original frame is not acquired, decoding the encapsulated data of the privacy protection frame in the video stream to be decoded to obtain the privacy protection frame.
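The key-dependent selection in the decoding method can be sketched as follows. This is a minimal illustration that assumes a hypothetical packet layout (frame kind, encrypted flag, 4-byte big-endian payload length, payload) and a caller-supplied `decrypt` callable. The names `HEADER`, `unpack_packets`, and `select_for_decoding` are illustrative, and the actual video decoding of the returned data is out of scope.

```python
import struct
from typing import Callable, List, Optional, Tuple

HEADER = struct.Struct(">BBI")  # hypothetical header: kind, encrypted flag, payload length
PRIVACY, ORIGINAL = 0, 1        # hypothetical frame-kind codes


def unpack_packets(stream: bytes) -> List[Tuple[int, bool, bytes]]:
    """Split a received byte stream into (kind, encrypted, payload) packets."""
    packets, offset = [], 0
    while offset < len(stream):
        kind, encrypted, length = HEADER.unpack_from(stream, offset)
        offset += HEADER.size
        packets.append((kind, bool(encrypted), stream[offset:offset + length]))
        offset += length
    return packets


def select_for_decoding(stream: bytes,
                        decrypt: Optional[Callable[[bytes], bytes]]) -> List[Tuple[int, bytes]]:
    """Return (kind, coded data) pairs to hand to the video decoder.

    Without a decoding key (decrypt is None), encrypted original-frame data is
    skipped and only privacy protection frames remain. With a key, original
    frames are decrypted and kept alongside the privacy protection frames.
    """
    selected = []
    for kind, encrypted, payload in unpack_packets(stream):
        if encrypted:
            if decrypt is None:
                continue
            payload = decrypt(payload)
        selected.append((kind, payload))
    return selected
```

Privacy protection frames are always kept. This matters because each original frame is coded with reference to a privacy protection frame, so the original frames cannot be decoded without them.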

In a fourth aspect, an embodiment of the present application provides an electronic device, including a processor and a memory;

the memory is used for storing computer programs and data generated in the encoding process;

the processor is configured to implement the video encoding method or the video decoding method according to any embodiment of the present application when executing the program stored in the memory.

In a fifth aspect, an embodiment of the present application provides a computer-readable storage medium storing a computer program that, when executed by a processor, implements the video encoding method or the video decoding method according to any embodiment of the present application.

The embodiment of the application has the following beneficial effects:

The video coding method and apparatus, electronic device, and storage medium provided by the embodiments of the present application acquire a video stream to be processed; process and configure each video frame in the video stream to be processed into a privacy protection frame and an original frame to obtain a composite video stream; for each video frame to be coded in the composite video stream, if it is a privacy protection frame, perform privacy protection on it by using a preset privacy protection rule and set its inter-frame coding reference mode to referable, and if it is an original frame, set its inter-frame coding reference mode to non-referable; and code each video frame based on its inter-frame coding reference mode to obtain a coded video data stream, where a video frame whose inter-frame coding reference mode is referable may serve as a reference frame for coding other video frames, and a video frame whose inter-frame coding reference mode is non-referable may not.

A single composite video stream containing both original frames and privacy protection frames is encoded: each original frame is encoded with reference to its privacy protection frame, and original frames do not serve as reference frames during encoding, so original frames do not need to be stored. Of course, implementing any product or method of the present application does not necessarily require achieving all of the advantages described above at the same time.

Drawings

To describe the technical solutions in the embodiments of the present application or in the prior art more clearly, the drawings used in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present application, and a person of ordinary skill in the art may derive other drawings from them.

FIG. 1 is a schematic diagram of a video encoding method according to an embodiment of the present application;

FIG. 2 is a diagram illustrating the structure of a bitstream in the related art;

FIG. 3 is a diagram of the inter-frame coding temporal reference relationships of the SVC-TX2 model and the SVC-TX4 model in the related art;

FIG. 4 is a first schematic diagram of a video stream transmission process in a video encoding method according to an embodiment of the present application;

FIG. 5 is a second schematic diagram of a video stream transmission process in a video encoding method according to an embodiment of the present application;

FIG. 6a is a schematic diagram of an electronic device according to an embodiment of the present application;

FIG. 6b is a schematic structural diagram of an encoding end and a decoding end according to an embodiment of the present application;

FIG. 6c is a schematic structural diagram of an encoding end and a decoding end according to an embodiment of the present application;

FIG. 7 is a flowchart of a pre-encoding processing unit according to an embodiment of the present application;

FIG. 8 is a first workflow diagram of a post-encoding processing unit according to an embodiment of the present application;

FIG. 9 is a second workflow diagram of a post-encoding processing unit according to an embodiment of the present application;

FIG. 10 is another schematic diagram of an electronic device according to an embodiment of the present application.

Detailed Description

The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments that can be derived by one of ordinary skill in the art from the description herein are intended to be within the scope of the present disclosure.

To reduce resource consumption in the encoding process, an embodiment of the present application provides a video encoding method. As shown in FIG. 1, the method includes:

S101, obtain a video stream to be processed.

The video encoding method in the embodiments of the present application may be implemented by an electronic device with an encoding function, such as a video camera or a hard disk video recorder. The video stream to be processed may be any video stream that needs to be transmitted.

S102, for each video frame in the video stream to be processed, process and configure the video frame into a privacy protection frame and an original frame to obtain a composite video stream.

At this stage, the privacy protection frame and the original frame both have the same image content as the video frame; they differ in their type identifiers.

In one example, each video frame in the video stream to be processed may be processed and configured twice and the results interleaved into a composite video stream: for example, the first copy is configured as a privacy protection frame and the second as an original frame, and the two copies are also interleaved with each other in the reference structure when the coding information is configured.

In other possible embodiments, each video frame in the video stream to be processed may be processed and configured multiple times and the results interleaved into a composite video stream: for example, the first copy is processed into a privacy protection frame protecting a first object, the second copy into a privacy protection frame protecting a second object, ..., and the last copy into the original frame; these copies are likewise interleaved with each other in the reference relationship. In one example, the first object and the second object are different types of objects; for example, the first object is a human face and the second object is a license plate number.

In an example, the frame types of the privacy protection frame and the original frame may be recorded in the header of the video frame. In a possible implementation, processing and configuring each video frame in the video stream to be processed into a privacy protection frame and an original frame to obtain a composite video stream includes:

Step one, for each video frame in the video stream to be processed, obtain, based on the video frame, a first video frame and a second video frame with the same content as the video frame; set the frame type attribute of the first video frame to the privacy protection frame type to obtain a privacy protection frame, and set the frame type attribute of the second video frame to the original frame type to obtain an original frame;

Step two, interleave the privacy protection frames and original frames corresponding to all video frames in the video stream to be processed into a composite video stream.

Frame type attributes include, but are not limited to, the NALU type, the temporal priority, and other customizable fields specified by the protocol. Taking an H.265 bitstream as an example, the bitstream structure may be as shown in FIG. 2. In the first layer, each Frame represents the encoded bitstream of one frame. In the second layer, a frame may be divided into one or more slices, and the bitstream of each slice is a NAL unit, so the bitstream of one frame may consist of one or more NAL units. In the third layer, a NAL unit consists of a NALU header and an RBSP: the NALU header carries the content characteristics of the current NAL unit, and the RBSP is the NALU payload, which contains the main compressed video data (slice header + slice data). In the fourth layer, the NALU header consists of a forbidden bit (F), the NALU type (NALU_type), the layer level (NLI), and the temporal level (NTI). Generally, the frame type of the current bitstream segment can be determined directly and quickly from the NALU type in the NALU header, which enables operations such as frame extraction by temporal level. The lowest layer lists the specific values of the NALU type and their meanings. For an H.265 bitstream, the privacy protection frame type or the original frame type may be recorded in the NALU type.
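To make the fourth layer of this structure concrete, the following sketch packs and unpacks the 2-byte H.265 NAL unit header. The field names follow the H.265 specification (forbidden_zero_bit, nal_unit_type, nuh_layer_id, nuh_temporal_id_plus1), which correspond to F, NALU_type, NLI, and NTI above. The text does not say which nal_unit_type values would carry the privacy protection frame type or the original frame type, so the sketch handles only the generic header. The helper names are illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HevcNalHeader:
    forbidden_zero_bit: int  # F: 1 bit, must be 0 in a conforming stream
    nal_unit_type: int       # NALU_type: 6 bits
    nuh_layer_id: int        # layer level: 6 bits
    temporal_id: int         # temporal level: nuh_temporal_id_plus1 - 1


def parse_hevc_nal_header(data: bytes) -> HevcNalHeader:
    """Decode the first two bytes of an H.265 NAL unit."""
    if len(data) < 2:
        raise ValueError("an H.265 NAL unit header is 2 bytes long")
    b0, b1 = data[0], data[1]
    tid_plus1 = b1 & 0x07
    if tid_plus1 == 0:
        raise ValueError("nuh_temporal_id_plus1 must not be 0")
    return HevcNalHeader(
        forbidden_zero_bit=(b0 >> 7) & 0x01,
        nal_unit_type=(b0 >> 1) & 0x3F,
        nuh_layer_id=((b0 & 0x01) << 5) | (b1 >> 3),
        temporal_id=tid_plus1 - 1,
    )


def build_hevc_nal_header(nal_unit_type: int, nuh_layer_id: int, temporal_id: int) -> bytes:
    """Encode an H.265 NAL unit header with forbidden_zero_bit = 0."""
    if not (0 <= nal_unit_type < 64 and 0 <= nuh_layer_id < 64 and 0 <= temporal_id < 7):
        raise ValueError("header field out of range")
    b0 = (nal_unit_type << 1) | (nuh_layer_id >> 5)
    b1 = ((nuh_layer_id & 0x1F) << 3) | (temporal_id + 1)
    return bytes([b0, b1])
```

For example, the common header bytes `40 01` decode to nal_unit_type 32 (a VPS) in layer 0 at temporal level 0. A frame-extraction tool can drop NAL units whose `temporal_id` exceeds a target without parsing the RBSP.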

In one example, multiple types of privacy protection are required. For example, user A requires privacy protection of face information but not of license plate information, while user B requires privacy protection of both face information and license plate information, so two types of privacy protection frames need to be generated. In this case, in a possible implementation, processing and configuring each video frame in the video stream to be processed into privacy protection frames and an original frame to obtain a composite video stream includes:

Step one, for each video frame in the video stream to be processed, obtain, based on the video frame, M+1 video frames with the same content as the video frame; set the frame type attribute of M of these video frames to the privacy protection frame type to obtain M privacy protection frames, and set the frame type attribute of the remaining video frame to the original frame type to obtain an original frame, where M is the number of privacy protection types;

Step two, interleave the privacy protection frames and original frames corresponding to all video frames in the video stream to be processed into a composite video stream.
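The two steps above can be sketched as follows. This is a minimal illustration, not the application's implementation: the frame type values `"privacy"`/`"original"`, the `CompositeFrame` record, and the copy order (privacy rule 1 first, original frame last, following the example in which the original frame is processed last) are assumptions. With M = 1, it produces the two-copy case of privacy frame then original frame.

```python
from dataclasses import dataclass
from typing import List, Sequence

PRIVACY_FRAME = "privacy"    # hypothetical values of the frame type attribute
ORIGINAL_FRAME = "original"


@dataclass(frozen=True)
class CompositeFrame:
    source_index: int  # index of the video frame in the video stream to be processed
    frame_type: str    # PRIVACY_FRAME or ORIGINAL_FRAME
    privacy_rule: int  # 1..M for privacy protection frames, 0 for the original frame
    content: object    # identical image content; privacy processing happens later (S103)


def build_composite_stream(frames: Sequence[object], m: int) -> List[CompositeFrame]:
    """Make M+1 copies of every frame and interleave them frame by frame."""
    if m < 1:
        raise ValueError("at least one privacy protection type is required")
    composite: List[CompositeFrame] = []
    for idx, content in enumerate(frames):
        # Step one: M copies typed as privacy protection frames (rule 1 first)...
        for rule in range(1, m + 1):
            composite.append(CompositeFrame(idx, PRIVACY_FRAME, rule, content))
        # ...and the remaining copy typed as the original frame, placed last.
        composite.append(CompositeFrame(idx, ORIGINAL_FRAME, 0, content))
    # Step two: appending the copies per source frame yields the interleaved order.
    return composite
```

Keeping all copies of one source frame adjacent means each original frame sits right after the privacy protection frames it will reference, so the encoder only needs to keep those recent referable frames.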

S103, for each video frame to be coded in the composite video stream, if the video frame is a privacy protection frame, perform privacy protection on it by using a preset privacy protection rule and set its inter-frame coding reference mode to referable; if the video frame is an original frame, set its inter-frame coding reference mode to non-referable.

The preset privacy protection rule can be customized according to actual conditions and is used to apply privacy protection to the privacy protection frame. Sources of the privacy protection region coordinates include, but are not limited to, direct configuration by the user and automatic detection by an intelligent algorithm, and a single privacy protection frame may contain more than one privacy protection region. Privacy processing methods for a privacy protection region include, but are not limited to, scrambling and blocking.
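As an illustration of a preset privacy protection rule applied to several regions, the sketch below implements blocking (a constant fill) and, as an extra assumed example, mosaicking (averaging cells), on a single luma plane stored as a list of rows. The image representation, the `(x, y, width, height)` region tuple, the fill value, and the cell size are all assumptions. Scrambling is not shown. Both functions return a new image and leave the input, which would be the original frame's content, unchanged.

```python
from typing import List, Sequence, Tuple

Image = List[List[int]]             # luma plane as rows of 8-bit samples (assumption)
Region = Tuple[int, int, int, int]  # (x, y, width, height) of a privacy protection region


def _clip(region: Region, img: Image) -> Tuple[int, int, int, int]:
    """Clip a region to the image and return (x0, y0, x1, y1) with exclusive ends."""
    x, y, w, h = region
    return max(0, x), max(0, y), min(len(img[0]), x + w), min(len(img), y + h)


def block_regions(img: Image, regions: Sequence[Region], fill: int = 0) -> Image:
    """Blocking: overwrite every pixel inside each region with a constant value."""
    out = [row[:] for row in img]
    for region in regions:
        x0, y0, x1, y1 = _clip(region, out)
        for yy in range(y0, y1):
            for xx in range(x0, x1):
                out[yy][xx] = fill
    return out


def mosaic_regions(img: Image, regions: Sequence[Region], cell: int = 8) -> Image:
    """Mosaic: replace each cell x cell block inside a region by its integer mean."""
    out = [row[:] for row in img]
    for region in regions:
        x0, y0, x1, y1 = _clip(region, out)
        for by in range(y0, y1, cell):
            for bx in range(x0, x1, cell):
                ys = range(by, min(by + cell, y1))
                xs = range(bx, min(bx + cell, x1))
                mean = sum(out[yy][xx] for yy in ys for xx in xs) // (len(ys) * len(xs))
                for yy in ys:
                    for xx in xs:
                        out[yy][xx] = mean
    return out
```

Pixels outside the regions are untouched in both cases. This is why an original frame that references its privacy protection frame finds most reference areas identical.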

For the case where there is only one type of privacy-preserving video stream, in one possible embodiment, the method further comprises:

setting the time domain reference model of the composite video stream coding as a time domain scalable reference model supporting code stream framing;

The temporal base layer can serve as an inter-coding reference frame for the temporal enhancement layer, but the temporal enhancement layer cannot serve as an inter-coding reference frame for the temporal base layer.

For the case where there are multiple types of privacy protection video streams, in one possible embodiment, the method further comprises: setting the temporal reference model used to encode the composite video stream to a temporal scalable reference model that supports bitstream frame extraction;

In this embodiment of the application, for each video frame in the video stream to be processed, M+1 video frames with the same content as that video frame are obtained, yielding a composite video stream. Every video frame in the composite video stream (privacy protection frames and original frames alike) is encoded with a temporal scalable reference model that supports bitstream frame extraction. In the temporal scaling, among the frames obtained from the same video frame of the video stream to be processed, the M-th privacy protection frame is the temporal base layer and can be referenced by higher-layer frames; the (M-1)-th privacy protection frame is the first-level temporal enhancement layer and can be referenced by higher-layer frames; the (M-2)-th privacy protection frame is the second-level temporal enhancement layer and can be referenced by higher-layer frames; and so on. The original frame is the highest-level temporal enhancement layer and cannot be referenced by any other frame. In addition, a temporal enhancement layer cannot serve as an inter-coding reference frame for the temporal base layer. Temporal reference models usable in this embodiment include, but are not limited to, SVC-TX2 and SVC-TX4.
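The layer assignment above can be expressed as a small rule, sketched here under assumptions: the function names are hypothetical, layer 0 is the temporal base layer, the i-th privacy protection frame is placed at level M-i (one of the two options the text allows), and, as an assumption borrowed from common temporal-scalable designs, a referable frame may also be referenced by frames of the same layer (the text confirms this only for the base layer).

```python
def temporal_layer(frame_type: str, privacy_type: int, m: int) -> int:
    """Temporal layer of a composite-stream frame (0 = temporal base layer).

    The M-th privacy protection frame is the base layer, the i-th privacy
    protection frame is the (M-i)-level enhancement layer, and the original
    frame is the highest level, M.
    """
    if frame_type == "original":
        return m
    if not 1 <= privacy_type <= m:
        raise ValueError("privacy_type must be in 1..M")
    return m - privacy_type


def is_referable(frame_type: str) -> bool:
    # Original frames are never used as inter-coding reference frames.
    return frame_type == "privacy"


def reference_allowed(cur_layer: int, ref_layer: int, ref_frame_type: str) -> bool:
    """A frame may reference only referable frames whose layer is not above its own."""
    return is_referable(ref_frame_type) and ref_layer <= cur_layer
```

Because no frame may reference the highest layer, dropping every frame at layer M removes exactly the original frames while leaving every privacy protection frame decodable.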

In one example, the inter-frame coding reference mode of a video frame can be set by configuring basic register parameters and reference relation register parameters. For a privacy protection frame, the basic register parameters are configured in the usual way; when the reference relation register parameters are configured, a temporal scalable reference model supporting bitstream frame extraction, such as the SVC-TX2 or SVC-TX4 model, is selected, and the currently processed privacy protection frame is set as a referable temporal base layer or intermediate enhancement layer. Schematic diagrams of the SVC-TX2 and SVC-TX4 models are shown in fig. 3: in both, an enhancement layer refers to the base layer for encoding and decoding, and references may point forward or backward. For an original frame, the image content receives no privacy processing; the basic registers are configured in the usual way, the same temporal reference model as for the privacy protection frames is selected when the reference relation register parameters are configured, and the original frame is then set as a non-referable temporal enhancement layer.

S104: encode each video frame in the composite video stream according to its inter-frame coding reference mode to obtain an encoded video data stream. A video frame whose inter-frame coding reference mode is referable may be used as a reference frame when encoding other video frames; a video frame whose mode is non-referable may not.

A video coding standard is applied to each video frame of the composite video stream in turn to generate a compressed bitstream; supported standards include, but are not limited to, H.264, H.265 and AVS2. The inter-frame coding reference mode of an original frame is non-referable, so an original frame may not serve as a reference frame for encoding other video frames; that is, the original frame is an enhancement layer. The inter-frame coding reference mode of a privacy protection frame is referable, so it may serve as a reference frame for encoding other video frames; that is, a privacy protection frame may be a base layer or an enhancement layer.

In one example, as shown in fig. 6b, steps S101-S103 may be implemented by a pre-encoding processing unit, and step S104 may be implemented by an encoder. Specifically, the pre-encoding processing unit may be implemented by a processor running software logic, and the encoder may be implemented by existing encoder hardware.

In this embodiment of the application, a composite video stream that contains both original frames and privacy protection frames is encoded. An original frame may refer to a privacy protection frame coded as an I frame, or to one coded as a P frame or B frame. This exploits the correlation between the privacy protection frame and the original frame, which reduces resource consumption during encoding and the data volume of the encoded video data stream, and therefore reduces network bandwidth consumption.

After the encoded video data stream is obtained, it still needs to be encapsulated and sent. In one possible implementation, referring to fig. 4, the method further comprises:

S201: read a preset unit data amount from the encoded video data stream at a time. If the currently read data belongs to an original frame, encrypt it and encapsulate the encrypted data; if it belongs to a privacy protection frame, encapsulate it directly. This yields a data stream to be sent that contains the encapsulated data of the original frames and of the privacy protection frames;

S202: send the data stream to be sent to both the ordinary authority users and the advanced authority users.
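Steps S201-S202 can be sketched as a loop over already classified read units. This is a hedged illustration, not the patented implementation: `toy_encrypt` is a hypothetical SHA-256 keystream XOR that merely stands in for AES and is not secure, and `encapsulate` uses an invented packet layout (1-byte encrypted flag, 4-byte length, payload) purely for illustration.

```python
import hashlib
from typing import Iterable, List, Tuple


def toy_encrypt(key: bytes, data: bytes) -> bytes:
    """Placeholder cipher (SHA-256 keystream XOR); NOT a substitute for AES.

    XOR is symmetric, so applying it twice with the same key decrypts.
    """
    stream = bytearray()
    counter = 0
    while len(stream) < len(data):
        stream += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(a ^ b for a, b in zip(data, stream))


def encapsulate(payload: bytes, encrypted: bool) -> bytes:
    """Hypothetical application-layer packet: flag byte + 4-byte length + payload."""
    return bytes([1 if encrypted else 0]) + len(payload).to_bytes(4, "big") + payload


def build_stream_to_send(units: Iterable[Tuple[bool, bytes]], key: bytes) -> List[bytes]:
    """units yields (is_original, data) for each preset-size read of the encoded stream."""
    packets: List[bytes] = []
    for is_original, data in units:
        if is_original:
            # Original-frame data is encrypted before encapsulation.
            packets.append(encapsulate(toy_encrypt(key, data), encrypted=True))
        else:
            # Privacy-protection-frame data is encapsulated directly.
            packets.append(encapsulate(data, encrypted=False))
    return packets
```

The same packet list is then sent to both user groups; only a holder of the key can recover the original-frame payloads.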

The client of an ordinary authority user has restricted authority: it can decode and display only the privacy protection frames and cannot display the original frames.

In one example, as shown in FIG. 6b, steps S201-S202 may be implemented by a post-encoding processing unit. In particular, the post-encoding processing unit may be implemented by a processor based on software logic.

The post-encoding processing unit reads the encoded video data stream output by the video encoder, a preset unit data amount at a time; the granularity can be chosen according to the actual situation and includes, but is not limited to, byte level, NALU level or frame level. In one example, the header information of the current data is parsed to obtain the type of frame it belongs to; for instance, under the H.264 or H.265 standard, whether the current data is original-frame data or privacy-protection-frame data is determined from the NALU type or the temporal level in the NALU header. Original-frame data is encrypted separately, using for example, but not limited to, AES, and the encrypted data is encapsulated at the application layer. Privacy-protection-frame data is encapsulated at the application layer directly. In one example, global encryption is optionally applied as well, and the result is then encapsulated and output to the decoding end.
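For the H.265 case, classification by temporal level can be sketched by reading the 2-byte HEVC NAL unit header (forbidden_zero_bit, 6-bit nal_unit_type, 6-bit nuh_layer_id, 3-bit nuh_temporal_id_plus1) of each NAL unit in an Annex B byte stream. The helper names are hypothetical, and treating a VCL NAL unit (types 0-31) in the top temporal layer as original-frame data is an assumption that matches the layering described above; parameter sets and other non-VCL units are left unencrypted here.

```python
from typing import Iterator, Tuple


def split_annexb(stream: bytes) -> Iterator[bytes]:
    """Split an Annex B byte stream into NAL units on 0x000001 start codes."""
    starts = []
    i = stream.find(b"\x00\x00\x01")
    while i != -1:
        starts.append(i + 3)
        i = stream.find(b"\x00\x00\x01", i + 3)
    for n, s in enumerate(starts):
        end = starts[n + 1] - 3 if n + 1 < len(starts) else len(stream)
        # A NAL unit never ends in 0x00, so trailing zeros belong to the next
        # 4-byte start code (or are trailing_zero_8bits) and can be dropped.
        nal = stream[s:end].rstrip(b"\x00")
        if nal:
            yield nal


def hevc_nal_header(nal: bytes) -> Tuple[int, int, int]:
    """Return (nal_unit_type, nuh_layer_id, TemporalId) from an HEVC NAL header."""
    if len(nal) < 2:
        raise ValueError("the HEVC NAL unit header is 2 bytes long")
    nal_unit_type = (nal[0] >> 1) & 0x3F
    nuh_layer_id = ((nal[0] & 0x01) << 5) | (nal[1] >> 3)
    temporal_id = (nal[1] & 0x07) - 1  # nuh_temporal_id_plus1 - 1
    return nal_unit_type, nuh_layer_id, temporal_id


def is_original_frame_data(nal: bytes, top_layer: int) -> bool:
    """Assumed rule: VCL NAL units in the highest temporal layer carry original frames."""
    nal_unit_type, _, temporal_id = hevc_nal_header(nal)
    return nal_unit_type < 32 and temporal_id == top_layer
```

With M = 1 the top layer is 1, so a slice NAL unit whose header bytes are `0x02 0x02` (TRAIL_R, TemporalId 1) would be routed to encryption, while `0x02 0x01` (TemporalId 0) would be encapsulated in the clear.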

In one possible embodiment, referring to fig. 5, the method further comprises:

S301: read a preset unit data amount from the encoded video data stream at a time, and from the currently read data obtain a first bitstream and a second bitstream with the same content as that data;

S302: if the current first bitstream is original-frame data, encrypt it and encapsulate the encrypted data; if it is privacy-protection-frame data, encapsulate it directly. This yields a data stream to be sent that contains the encapsulated data of the original frames and of the privacy protection frames;

S303: send the data stream to be sent to the advanced authority users;

S304: if the current second bitstream is original-frame data, discard it; if it is privacy-protection-frame data, encapsulate it to obtain a privacy protection data stream, which contains the encapsulated data of the privacy protection frames;

S305: send the privacy protection data stream to the ordinary authority users.
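Steps S301-S305 can be sketched as a fork of each read unit into two paths. This is a minimal sketch under assumptions: `split_two_paths` and `xor_placeholder` are hypothetical names, the repeating-key XOR only stands in for a real cipher such as AES and is insecure, and application-layer encapsulation is left out so that only the routing logic is shown.

```python
from typing import Iterable, List, Tuple


def xor_placeholder(key: bytes, data: bytes) -> bytes:
    """Stand-in for AES: repeating-key XOR, for illustration only (insecure)."""
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))


def split_two_paths(
    units: Iterable[Tuple[bool, bytes]], key: bytes
) -> Tuple[List[Tuple[bool, bytes]], List[bytes]]:
    """Fork each read unit into an advanced path and a privacy-only path.

    units yields (is_original, data). Returns (advanced, ordinary):
    advanced holds (encrypted_flag, payload) pairs for every unit;
    ordinary holds only privacy-protection-frame data.
    """
    advanced: List[Tuple[bool, bytes]] = []
    ordinary: List[bytes] = []
    for is_original, data in units:
        first, second = data, data  # two copies with identical content (S301)
        if is_original:
            advanced.append((True, xor_placeholder(key, first)))  # S302
            # S304: the second copy of original-frame data is discarded.
        else:
            advanced.append((False, first))  # S302
            ordinary.append(second)  # S304
    return advanced, ordinary
```

Because the ordinary path never carries original-frame data, ordinary authority users both lose access to the originals and receive a smaller stream.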

In one example, as shown in FIG. 6c, steps S301-S305 may be implemented by a post-encoding processing unit. In particular, the post-encoding processing unit may be implemented by a processor based on software logic.

In the related art, frame extraction is used to adapt to different bandwidth conditions. For example, when the network is uncongested, no frames are extracted and a 60 FPS (frames per second) video stream is used; when the network is congested, only 1/2 of the video frames are kept, i.e. a 30 FPS stream is used; when the network is severely congested, only 1/4 of the video frames are kept, i.e. a 15 FPS stream is used. Frame rate is thus sacrificed to keep video playback normal.
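The frame-rate arithmetic above can be checked with a short sketch. It assumes a hypothetical three-layer dyadic pattern (temporal IDs 0, 2, 1, 2 repeating), which is one common layout but not necessarily the one used by a given encoder; dropping every frame above a chosen temporal ID yields the 60/30/15 FPS steps.

```python
def dyadic_temporal_id(n: int) -> int:
    """Temporal layer of frame n in an assumed 3-layer dyadic pattern (0, 2, 1, 2, ...)."""
    if n % 4 == 0:
        return 0
    if n % 2 == 0:
        return 1
    return 2


def extracted_fps(full_fps: int, max_tid: int, window: int = 60) -> float:
    """Frame rate left after dropping every frame whose temporal ID exceeds max_tid."""
    kept = sum(1 for n in range(window) if dyadic_temporal_id(n) <= max_tid)
    return full_fps * kept / window


for max_tid in (2, 1, 0):
    print(max_tid, extracted_fps(60, max_tid))  # 60.0, 30.0, 15.0
```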

In this embodiment of the application, frame extraction is used to pick out the privacy-protection-frame data and encapsulate it into the privacy protection data stream. Because the privacy protection data stream received by ordinary authority users contains only privacy-protection-frame data, ordinary authority users cannot obtain the original frames, which safeguards privacy while also saving their bandwidth.

An embodiment of the present application further provides a video decoding method, where the method includes:

Step one: obtain a video stream to be decoded, which comprises the encapsulated data of the original frames and the encapsulated data of the privacy protection frames.

The video stream to be decoded may be the data stream to be sent that the encoding end transmits in the above embodiment.

Step two: if the decoding key for the original frames has been obtained, use it to decode the encapsulated data of the original frames in the video stream to be decoded, obtaining the original frames.

An advanced authority user inputs the decoding key for the original frames at the decoding end; having obtained the key, the decoding end uses it to decode the encapsulated original-frame data in the video stream to be decoded and obtain the original frames.

Step three: if the decoding key for the original frames has not been obtained, decode the encapsulated data of the privacy protection frames in the video stream to be decoded, obtaining the privacy protection frames.

An ordinary authority user does not know the decoding key and can therefore decode only the privacy protection frames.
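The decoding-side choice can be sketched as follows, under assumptions: `select_for_display` and `xor_placeholder` are hypothetical, the XOR only stands in for the real cipher, "decoding" is reduced to passing payloads through, and the key is not validated. With a key, every unit is kept because privacy protection frames are still needed as references, but only original frames are shown (as in the fig. 6b description); without a key, original-frame data is skipped.

```python
from typing import List, Optional, Tuple


def xor_placeholder(key: bytes, data: bytes) -> bytes:
    # Stand-in for the real cipher (e.g. AES); insecure, illustration only.
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))


def select_for_display(stream: List[Tuple[bool, bytes]], key: Optional[bytes]) -> List[bytes]:
    """stream holds (is_original, payload); original-frame payloads are encrypted."""
    decoded: List[Tuple[bool, bytes]] = []
    for is_original, payload in stream:
        if is_original:
            if key is None:
                continue  # no key: original-frame data cannot be decrypted
            payload = xor_placeholder(key, payload)
        decoded.append((is_original, payload))  # stand-in for real decoding
    show_originals = key is not None
    return [p for is_orig, p in decoded if is_orig == show_originals]
```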

An embodiment of the application further provides an electronic device that implements privacy protection in video coding, while still supporting recovery of the original video, through the pipeline "pre-encoding processor interleaves frames into a composite sequence -> encoder encodes this single sequence -> post-encoding processor performs bitstream frame extraction and encryption". For example, as shown in fig. 6a, the electronic device includes a pre-encoding processing unit, a video encoder and a post-encoding processing unit.

Pre-encoding processing unit: processes and configures each video frame to be output at least twice, obtaining one original frame and at least one privacy protection frame, and interleaves them into a composite video stream. The privacy protection frames are set to allow other video frames to reference them in encoding, and the original frame is set to disallow this. Taking two passes as an example, the first pass produces a privacy protection frame and the second pass the original frame, and the two are also interleaved in the reference relation when the information is configured. In another example, each frame is processed and configured several times and interleaved into the composite video stream: the first pass produces a privacy protection frame protecting a first object, the second pass one protecting a second object, and so on, and the last pass produces the original frame; these frames are likewise interleaved in the reference relation.

Video encoder: a standard video encoder, implemented as a software encoder or a hardware encoder; supported standards include, but are not limited to, H.264, H.265, H.266, AVS2, AVS3, AV1 and VP9.

Post-encoding processing unit: conventionally, the bitstream frame extraction component in this unit extracts frames when network bandwidth is tight. Here, the component extracts frames and then determines the video frame type of the bitstream. Original-frame bitstream data receives an additional privacy protection encryption, using for example, but not limited to, AES; privacy-protection-frame bitstream data either needs no privacy protection encryption or is encrypted with a different key. After privacy protection encryption, one pass of global encryption can optionally be applied to obtain the data stream to be sent, which secures the bitstream in transit; the data stream to be sent is then output.

The functions of the units are specifically described below:

The basic function of the pre-encoding processing unit is to pass register parameters and YUV image data to the encoder, according to the coding attributes configured by the user and the captured input video sequence. In this patent, the pre-encoding processing unit additionally processes and configures each input frame several times, interleaves the results into a composite video stream of privacy protection frames and original frames, and configures the register parameters as a reference relation that allows temporal frame extraction. The pre-encoding processing flow is shown in fig. 7 and proceeds as follows:

Step 1: read the coding attributes configured by the user from the user interface and convert the abstract attributes into concrete parameter values.

Step 2: take one original frame image of the video stream to be processed as the input frame.

Step 3: if the input frame is a privacy protection frame, go to step 4; otherwise, go to step 6.

Step 4: apply privacy protection to the input frame. The privacy protection area coordinates may come from, but are not limited to, direct user configuration or intelligent detection, and there may be more than one local area. The local areas of the original image given by the input coordinates are processed for privacy, for example, but not limited to, by scrambling, mosaicking or blocking;

Step 5: after privacy protection, configure the basic register parameters and reference relation register parameters of the privacy protection frame. The basic register parameters are configured in the usual way; when the reference relation register parameters are configured, a temporal scalable reference model supporting bitstream frame extraction is selected, including but not limited to the SVC-TX2 and SVC-TX4 models shown in fig. 3, and the currently processed privacy protection frame is set as a referable temporal base layer or intermediate enhancement layer. Go to step 7.

Step 6: apply no privacy protection to the input frame and leave the original image content unchanged. Configure the basic registers in the usual way, select the same temporal reference model when configuring the reference relation register parameters, and set the currently processed original frame as a non-referable temporal enhancement layer.

Step 7: output the configured register parameters and the processed image content data to the video encoder; the image data format includes, but is not limited to, YUV420 and YUV422.

Step 8: if the current input frame still requires another type of privacy protection, go to step 3; otherwise, processing of the current input frame ends.
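Steps 3-8 for a single input frame can be sketched as below. Everything here is an illustrative assumption: `FrameConfig` is a hypothetical record, not the register layout of any real encoder; the image is a list of luma rows rather than YUV planes; blocking (filling with a constant) stands in for the privacy processing; and the layer numbers follow the M-i rule, with the original frame at the highest level M.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple

Box = Tuple[int, int, int, int]  # x, y, width, height
Image = List[List[int]]          # luma rows; a stand-in for YUV planes


@dataclass
class FrameConfig:
    # Hypothetical register fields; real encoders expose different names.
    ref_model: str  # e.g. "SVC-TX2" or "SVC-TX4"
    temporal_layer: int
    referable: bool


def mask_regions(img: Image, boxes: Sequence[Box], fill: int = 0) -> Image:
    """Blocking-style privacy processing: overwrite each box with a constant."""
    out = [row[:] for row in img]
    for x, y, w, h in boxes:
        for r in range(y, min(y + h, len(out))):
            for c in range(x, min(x + w, len(out[r]))):
                out[r][c] = fill
    return out


def preprocess_frame(
    img: Image, boxes_per_type: Sequence[Sequence[Box]], ref_model: str = "SVC-TX2"
) -> List[Tuple[Image, FrameConfig]]:
    """Emit (image, config) pairs for one input frame: M privacy frames, then the original."""
    m = len(boxes_per_type)
    outputs: List[Tuple[Image, FrameConfig]] = []
    # Steps 3-5, repeated once per privacy type (the step 8 loop).
    for i, boxes in enumerate(boxes_per_type, start=1):
        outputs.append((mask_regions(img, boxes), FrameConfig(ref_model, m - i, True)))
    # Step 6: the original frame is left untouched and marked non-referable.
    outputs.append(([row[:] for row in img], FrameConfig(ref_model, m, False)))
    return outputs  # step 7 would hand these pairs to the video encoder
```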

In this embodiment of the application, within the composite video stream a privacy protection frame skips over the interleaved original frames and refers to other privacy protection frames, while an original frame refers to its adjacent privacy protection frame.
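For the single-privacy-type case (M = 1), the reference pattern above can be sketched as a list of (frame, reference) pairs over the composite order P0 O0 P1 O1 and so on. The names are hypothetical, P0 is assumed to be intra-coded, and "adjacent" is read as the privacy protection frame generated from the same source frame; real encoders may use more references or periodic intra frames.

```python
from typing import List, Optional, Tuple


def assign_references(n_source_frames: int) -> List[Tuple[str, Optional[str]]]:
    """Return (frame, reference) pairs for the composite order P0 O0 P1 O1 ... (M = 1).

    Each privacy frame Pk skips over the interleaved original frame and
    references P(k-1); each original frame Ok references Pk, the privacy
    frame from the same source frame. No frame ever references an original.
    """
    pairs: List[Tuple[str, Optional[str]]] = []
    for k in range(n_source_frames):
        pairs.append((f"P{k}", f"P{k - 1}" if k > 0 else None))  # P0: intra-coded
        pairs.append((f"O{k}", f"P{k}"))
    return pairs
```

Because no reference points at an original frame, the encoder never has to keep a reconstructed original frame in its reference buffer.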

The video encoder encodes each frame of the composite video stream in turn according to a video coding standard to generate a compressed bitstream. Supported standards include, but are not limited to, H.264, H.265 and AVS2.

The encoder encodes each video frame of the composite video stream following the usual flow:

Step 1: read the encoder register configuration.

Step 2: if the current frame is an I frame, read only its image data, since no reference frame is needed; otherwise, read the image data of the current frame together with the reference frame data required by the temporal reference relation.

Step 3: perform intra/inter prediction, transform and quantization, loop filtering, entropy coding and so on, as specified by the coding standard.

Step 4: output the compressed bitstream for post-processing, and output the reconstructed image for reference by subsequent frames.

The post-encoding processing unit encapsulates the raw bitstream output by the video encoder at the application level, protects the original-frame bitstream by encryption, and finally outputs a single bitstream to the decoding end.

The processing flow of the post-encoding processing unit may be as shown in fig. 8 and includes:

Step 1: read the encoded video data stream output by the video encoder; the granularity of each read includes, but is not limited to, byte level, NALU level or frame level.

Step 2: parse the bitstream header information of the currently read data to obtain the type of frame it belongs to, for example from the NALU type or temporal level in the NALU header under the H.264 and H.265 standards.

Step 3: determine the type of the current bitstream data; if it is original-frame data, go to step 4; otherwise it is privacy-protection-frame data, so go to step 5.

Step 4: encrypt the original-frame bitstream data separately, using for example, but not limited to, AES.

Step 5: encapsulate the current bitstream data at the application layer, optionally apply global encryption, and then output the encapsulated data to the decoding end.

Step 6: processing of the current bitstream data ends.

For example, as shown in fig. 6b, the electronic device is an encoding-side device and rights management is performed at the decoding end. The decoding end includes a rights management unit, an advanced-user decoding unit, an ordinary-user decoding unit and a display. The rights management unit determines the current user's authority according to whether the privacy protection key has been input. An advanced authority user who has the key decrypts and decodes the composite video stream in which privacy protection frames and original frames are interleaved, then removes the privacy protection frames and displays only the original frames; an ordinary authority user without the key decodes only the privacy protection frame video stream and displays it directly. As an extension, the rights management unit at the decoding end can be further divided into decoding rights management and playback rights management.

In one possible implementation, the post-encoding processing unit may instead send the bitstream output by the video encoder along two paths: on one path, frames are extracted from the bitstream and the result is packed and output to ordinary authority users; on the other, the bitstream is packed and output to advanced authority users. Bitstream frame extraction is usually performed when network bandwidth is insufficient; in this application it is used to remove the enhancement-layer original frames and so produce the privacy protection data stream, which is a new use of the bitstream frame extraction component. The processing flow may be as shown in fig. 9 and includes:

Step 1: read the encoded video data stream output by the video encoder; the granularity of each read includes, but is not limited to, byte level, NALU level or frame level.

Step 3: split the current bitstream into two paths: the path serving ordinary authority users goes to step 4, and the path serving advanced authority users goes to step 6.

Step 4: for ordinary authority users, if the frame type of the current bitstream indicates privacy-protection-frame data, go to step 5; otherwise, perform frame extraction so that the original-frame data is not sent, and go directly to step 7.

Step 5: encapsulate the privacy-protection-frame bitstream at the application layer, output it to ordinary authority users, and go to step 7.

Step 6: for advanced authority users, encapsulate the bitstream directly at the application layer and output it to advanced authority users.

Step 7: processing of the current bitstream ends.

In this embodiment of the application, privacy protection and recoverability are achieved through the temporal layering technique of video coding, instead of encoding the original frames and the privacy protection frames separately as in the conventional scheme. With temporal layering, each original frame refers to the co-located privacy protection frame, and most reference areas are identical, which greatly reduces the size of the encoded bitstream and the cost of storing and transmitting it. Moreover, because the video stream is combined and temporally layered in pre-processing, the interleaved temporal-layer reference relation reduces the memory needed to store reference frames; and since the privacy protection bitstream sent to ordinary authority users does not include the original frames, their network bandwidth consumption is low.

For example, as shown in fig. 6c, the electronic device is an encoding-side device and rights management is performed at the encoding end. Advanced authority users and ordinary authority users use two separate terminals: the advanced authority terminal decodes the received complete bitstream with a standard decoder, then removes the privacy protection frames and displays the remaining original frames; the ordinary authority terminal decodes the received privacy protection bitstream directly with a standard decoder and displays it.

An embodiment of the present application further provides a video encoding apparatus, where the apparatus includes:

the video stream acquisition module is used for acquiring a video stream to be processed;

In one possible implementation, the video frame configuration module is specifically configured to: for each video frame in the video stream to be processed, obtain a first video frame and a second video frame with the same content as that video frame, set the frame type attribute of the first video frame to the privacy protection frame type to obtain a privacy protection frame, and set the frame type attribute of the second video frame to the original frame type to obtain an original frame; and interleave the privacy protection frames and original frames corresponding to the video frames in the video stream to be processed into a composite video stream.

In one possible implementation, the video frame configuration module is specifically configured to: for each video frame in the video stream to be processed, obtain M+1 video frames with the same content as that video frame, set the frame type attribute of M of these frames to the privacy protection frame type to obtain M privacy protection frames, and set the frame type attribute of the remaining frame to the original frame type to obtain an original frame, where M is the number of privacy protection types; and interleave the privacy protection frames and original frames corresponding to the video frames in the video stream to be processed into a composite video stream.

In one possible implementation, the reference mode setting module is specifically configured to: set the temporal reference model used to encode the composite video stream to a temporal scalable reference model that supports bitstream frame extraction; and, for each video frame to be coded in the composite video stream: if the video frame is the M-th privacy protection frame obtained from a given video frame of the video stream to be processed, apply privacy protection to it using a preset M-th privacy protection rule and set it as the temporal base layer; if it is the i-th privacy protection frame obtained from a given video frame, apply privacy protection using a preset i-th privacy protection rule and set it as the temporal base layer or the (M-i)-level temporal enhancement layer; and if it is an original frame, set it as the highest-level temporal enhancement layer. Here i is a positive integer smaller than M; a higher-level temporal enhancement layer cannot serve as an inter-coding reference frame for a lower-level one, a lower-level temporal enhancement layer can serve as an inter-coding reference frame for a higher-level one, and the temporal base layer can serve as an inter-coding reference frame for every level of temporal enhancement layer.

In one possible implementation, the apparatus further includes a video data sending module, configured to: read a preset unit data amount from the encoded video data stream at a time; if the currently read data belongs to an original frame, encrypt it and encapsulate the encrypted data, and if it belongs to a privacy protection frame, encapsulate it directly, to obtain a data stream to be sent that contains the encapsulated data of the original frames and of the privacy protection frames; and send the data stream to be sent to both ordinary authority users and advanced authority users.

In one possible implementation, the apparatus further includes a video data sending module, configured to: read a preset unit data amount from the encoded video data stream at a time, and from the currently read data obtain a first bitstream and a second bitstream with the same content; if the current first bitstream is original-frame data, encrypt it and encapsulate the encrypted data, and if it is privacy-protection-frame data, encapsulate it directly, to obtain a data stream to be sent that contains the encapsulated data of the original frames and of the privacy protection frames; send the data stream to be sent to advanced authority users; if the current second bitstream is original-frame data, discard it, and if it is privacy-protection-frame data, encapsulate it to obtain a privacy protection data stream that contains the encapsulated data of the privacy protection frames; and send the privacy protection data stream to ordinary authority users.

An embodiment of the present application further provides an electronic device, including: a processor and a memory;

the memory is used for storing computer programs;

the processor is configured to implement any of the video encoding methods of the present application when executing the computer program stored in the memory.

Optionally, referring to fig. 10, the electronic device according to the embodiment of the present application further includes a communication interface 12 and a communication bus 14, where the processor 11, the communication interface 12, and the memory 13 complete communication with each other through the communication bus 14.

The communication bus mentioned in the electronic device may be a PCI (Peripheral Component Interconnect) bus, an EISA (Extended Industry Standard Architecture) bus, or the like. The communication bus may be divided into an address bus, a data bus, a control bus, etc. For ease of illustration, only one thick line is shown, but this does not mean that there is only one bus or one type of bus.

The communication interface is used for communication between the electronic equipment and other equipment.

The memory may include RAM (Random Access Memory) or NVM (Non-Volatile Memory), such as at least one disk memory. Optionally, the memory may also be at least one storage device located remotely from the processor.

The processor may be a general-purpose processor, such as a CPU (Central Processing Unit) or an NP (Network Processor); it may also be a DSP (Digital Signal Processor), an ASIC (Application Specific Integrated Circuit), an FPGA (Field-Programmable Gate Array) or another programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.

An embodiment of the present application further provides a computer-readable storage medium, in which a computer program is stored, and the computer program, when executed by a processor, implements any video encoding method in the present application.

In yet another embodiment provided herein, there is also provided a computer program product containing instructions that, when run on a computer, cause the computer to perform any of the video encoding methods of the present application.

The method, apparatus, storage medium, computer program product, and the like of the embodiments of the present application can be applied to high-altitude thrown-object monitoring scenes. In such scenes, an electronic monitor is generally installed on the ground and monitors each floor of a high-rise building in real time from bottom to top. Because the monitor is generally mounted at roughly the height of the lower floors, it can capture the interiors of low-floor residences during real-time monitoring, leaking the privacy of the residents on those floors. With the embodiments of the present application, the coordinates of the privacy protection area are configured directly according to the installation position of the electronic monitor; the security monitoring room is defined as a common authority user, and the key-protected storage is defined as a high-level authority user. The privacy protection code stream output by the electronic monitor is decoded and displayed in real time in the monitoring room, while the complete code stream is stored in the key-protected storage. This protects the privacy of low-floor residents while still allowing the public security authorities to use the key to view the original video after a thrown-object incident occurs.
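The two viewing paths in this scene can be illustrated with a minimal Python sketch of the receiving side. It assumes a hypothetical packet layout (a 1-byte flag that is 1 for encrypted original-frame data, a 4-byte big-endian payload length, then the payload) and a caller-supplied `decrypt` callable; `parse_packets` and `frames_for_viewer` are illustrative names, not part of the application. Given the complete stream, a reader without the key recovers only the privacy protection frames, while the key holder also recovers the original frames.

```python
from typing import Callable, List, Optional, Tuple


def parse_packets(stream: bytes) -> List[Tuple[bool, bytes]]:
    """Split a byte stream into (encrypted, payload) packets (hypothetical layout)."""
    packets: List[Tuple[bool, bytes]] = []
    pos = 0
    while pos < len(stream):
        if pos + 5 > len(stream):
            raise ValueError("truncated packet header")
        encrypted = stream[pos] == 1
        length = int.from_bytes(stream[pos + 1:pos + 5], "big")
        start, end = pos + 5, pos + 5 + length
        if end > len(stream):
            raise ValueError("truncated packet payload")
        packets.append((encrypted, stream[start:end]))
        pos = end
    return packets


def frames_for_viewer(
    stream: bytes, decrypt: Optional[Callable[[bytes], bytes]]
) -> List[bytes]:
    """Return the frame payloads a viewer can use, in stream order."""
    frames: List[bytes] = []
    for encrypted, payload in parse_packets(stream):
        if encrypted:
            if decrypt is None:
                continue  # No key: original-frame data stays inaccessible.
            payload = decrypt(payload)
        frames.append(payload)
    return frames
```

Because each original frame is coded with reference to its privacy protection frame, the key holder needs both kinds of packets, which is why the sketch keeps stream order rather than returning original frames alone.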

The method, apparatus, storage medium, computer program product, and the like of the embodiments of the present application can also be applied to security monitoring scenes such as face recognition and behavior analysis in the video surveillance field: after face recognition, an intelligent algorithm automatically outputs the coordinates of the regions requiring privacy protection, and the face regions appearing in the monitored video are mosaicked or scrambled. The electronic monitor outputs the privacy protection code stream, which is decoded and displayed in real time, while the complete code stream is stored in key-protected storage so that the original video can be viewed when an incident occurs and the public security authorities open an investigation.
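One common way to mosaic a region is block averaging. The sketch below assumes the preset privacy protection rule is such a mosaic applied to a rectangle given by the coordinates the algorithm outputs; the application does not specify the rule, and `pixelate_region`, the grayscale list-of-lists frame layout, and the default block size of 8 are all hypothetical.

```python
from typing import List

Frame = List[List[int]]  # Hypothetical grayscale frame: rows of pixel values.


def pixelate_region(frame: Frame, x: int, y: int, w: int, h: int, block: int = 8) -> Frame:
    """Return a copy of `frame` with rectangle (x, y, w, h) replaced by block means."""
    out = [row[:] for row in frame]
    height = len(frame)
    width = len(frame[0]) if height else 0
    # Clip the requested rectangle to the frame bounds.
    x0, y0 = max(0, x), max(0, y)
    x1, y1 = min(width, x + w), min(height, y + h)
    for by in range(y0, y1, block):
        for bx in range(x0, x1, block):
            ys = range(by, min(by + block, y1))
            xs = range(bx, min(bx + block, x1))
            cells = [frame[r][c] for r in ys for c in xs]
            mean = sum(cells) // len(cells)  # Integer mean of the block.
            for r in ys:
                for c in xs:
                    out[r][c] = mean
    return out
```

A mosaic of this kind is irreversible, which is consistent with keeping the original content only in the encrypted original frames; scrambling would instead be a reversible transform applied under a key.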

The above embodiments may be implemented wholly or partially by software, hardware, firmware, or any combination thereof. When implemented in software, they may be implemented wholly or partially in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer instructions are loaded and executed on a computer, the processes or functions described in the embodiments of the present application are produced wholly or partially. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable device. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another; for example, they may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center by wire (e.g., coaxial cable, optical fiber, digital subscriber line) or wirelessly (e.g., infrared, radio, microwave). The computer-readable storage medium may be any available medium accessible by a computer, or a data storage device, such as a server or data center, that integrates one or more available media. The available medium may be a magnetic medium (e.g., floppy disk, hard disk, magnetic tape), an optical medium (e.g., DVD), or a semiconductor medium (e.g., a Solid State Disk (SSD)), among others.

It should be noted that, in this document, technical features in the various alternatives may be combined into a solution as long as they do not contradict one another, and such solutions fall within the scope of the disclosure of the present application. Relational terms such as first and second are used solely to distinguish one entity or action from another, without necessarily requiring or implying any actual such relationship or order between those entities or actions. Moreover, the terms "comprise", "include", or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but also other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of other identical elements in the process, method, article, or apparatus that comprises the element.

The embodiments in this specification are described in a related manner: identical or similar parts of the embodiments may be cross-referenced, and each embodiment focuses on its differences from the others.

The above descriptions are merely preferred embodiments of the present application and are not intended to limit its scope. Any modification, equivalent replacement, improvement, or the like made within the spirit and principle of the present application shall fall within the protection scope of the present application.
