Reducing bit rate from surveillance cameras using skip block masks

Document No.: 882761  Publication date: 2021-03-19

Reading note: The technique "Reducing bit rate from surveillance cameras using skip block masks" was created by 约翰·尼斯特伦范星 on 2020-09-14. The invention relates to reducing the bit rate from a surveillance camera using skip block masks. Methods and apparatus, including computer program products, implement and use techniques for reducing the bit rate from a surveillance camera. A first input is received that identifies a first region of an image representing the field of view of the camera. The first region contributes significantly to the bit rate. A second input is received that identifies a second region of the image. The second region contains information that is deemed to be of little visual appeal to a user of the surveillance camera. A third region of the image is determined. The third region is a region where the first region and the second region at least partially overlap. Video encoder settings are applied to force skip blocks in at least some of the third region, thereby reducing the contribution to the bit rate from the third region.

1. A method for reducing the bit rate from a surveillance camera, comprising:

receiving a first input identifying a first region of an image captured by a camera and representing a field of view of the camera, the first region having a bit rate contribution exceeding a predetermined threshold;

receiving a second input identifying a second region of the image containing information that is deemed to be of little visual appeal to a user of the surveillance camera;

determining a third region of the image, the third region being a region where the first region and the second region overlap; and

applying video encoder settings to encode at least some of the third region as inter-mode encoded blocks of pixels that reference corresponding blocks of pixels in a reference frame from which corresponding image content is fully copied, thereby reducing the contribution to the bit rate from the third region.

2. The method of claim 1, wherein the first, second, and third regions are represented as blocks of pixels in an image captured by the camera.

3. The method of claim 1, wherein the second input is a user input.

4. The method of claim 3, wherein the second input is generated by the user through a graphical user interface or an application programming interface.

5. The method of claim 1, wherein the second input is automatically generated based on image segmentation.

6. The method of claim 1, wherein the first input is generated by the video encoder based on a threshold representing a cost for encoding the first region.

7. The method of claim 1, wherein the first input is generated by an image analysis algorithm based on a complexity of the image.

8. The method of claim 1, further comprising:

providing a suggestion of the third region to a user of the surveillance camera prior to applying the video encoder settings, to allow the user to confirm or reject individual ones of the third regions.

9. The method of claim 8, wherein the suggestion of the third region is provided on a user interface as an overlay on the image.

10. The method of claim 1, further comprising:

calculating an estimated bit rate from the surveillance camera;

modifying at least some of the first and second regions to determine a modified third region; and

calculating a modified estimated bitrate from the surveillance camera using the modified third region.

11. The method of claim 10, further comprising:

modifying one or more of the first input and the second input using a result of the calculation; and

applying the video encoder settings according to the modified first input and second input.

12. A system for reducing bit rate from a surveillance camera, the system comprising:

a skip region calculation unit configured to:

receive a first input identifying a first region of an image representing a field of view of a camera, the first region having a bit rate contribution exceeding a predetermined threshold,

receive a second input identifying a second region of the image, the second region containing information that is deemed to be of little visual appeal to a user of the surveillance camera, and

determine a third region of the image, the third region being a region where the first region and the second region overlap; and

an encoder configured to encode at least some of the third region as inter-mode encoded blocks of pixels that reference corresponding blocks of pixels in a reference frame from which corresponding image content is fully copied, thereby reducing a contribution to the bit rate from the third region.

13. A computer program for reducing the bit rate from a surveillance camera, the program containing instructions corresponding to the steps of:

receiving a first input identifying a first region of an image representative of a field of view of a camera, the first region having a bit rate contribution exceeding a predetermined threshold;

receiving a second input identifying a second region of the image containing information that is deemed to be of little visual appeal to a user of the surveillance camera;

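determining a third region of the image, the third region being a region where the first region and the second region overlap; and

applying video encoder settings to encode at least some of the third region as inter-mode encoded blocks of pixels that reference corresponding blocks of pixels in a reference frame from which corresponding image content is fully copied, thereby reducing the contribution to the bit rate from the third region.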

14. A digital storage medium comprising a computer program as claimed in claim 13.

Technical Field

The present invention relates to video coding and, more particularly, to reducing the bit rate for certain regions of images in a video stream captured by a surveillance camera.

Background

Surveillance cameras are used to monitor a variety of environments in many different applications, both indoors and outdoors. The images depicting the captured scene may be monitored by, for example, an operator or security personnel. In many cases, some parts of the captured image are of more interest to the operator than others. For example, the operator of a surveillance camera may be very interested in activity outside a building entrance, but less interested in other moving or changing but unimportant features in the image (e.g., a flashing neon sign over the entrance of the building, or trees moving in the wind). In another exemplary case, when a camera is used to record a sporting event such as a football match, the operator of the camera may be very interested in seeing the details of the activity on the football pitch, but less interested in what is happening in the audience. For a surveillance operator, on the other hand, the pitch may in some cases be of less interest than the audience.

However, because they often contain a lot of movement or variation over time (in the form of moving objects or flickering light), these less interesting areas of the image often contribute heavily to the bit rate produced by the surveillance camera. Such dynamic image areas are typically more costly to encode than static image areas. This in turn may result in higher bandwidth and storage usage than would be needed if only the most "interesting" information in the image or video stream were retained. It would therefore be desirable to find a video coding solution that further reduces the bit rate produced by the surveillance camera.

U.S. Patent No. 10,123,020, assigned to the assignee of the present application, describes block-level update rate control based on gaze sensing. According to that patent, a video encoder reduces the update rate of blocks in an image by being forced to send skip blocks in a video frame when encoding an inter-frame. When a skip block is indicated for a portion of the video, no image data is sent for that portion. Typically, this is applied to areas of the image that are not within the attention of the operator of the surveillance camera.

U.S. Patent No. 9,756,348, also assigned to the assignee of the present application, describes a method, apparatus and system for generating a merged digital video sequence. Two digital video sequences of different pixel densities (and thus different bit rates) are generated. Pixel blocks that are considered to be relevant (e.g., pixel blocks that contain motion or a particular type of object) are identified. Pixel blocks that are not considered to be relevant (e.g., pixel blocks containing no motion or belonging to the background of the image) are encoded using skip blocks, which reduces the bit rate of the camera.

U.S. patent No. 9,131,173 describes a digital image photographing apparatus for skip mode reading and a method of controlling the digital image photographing apparatus. An imaging surface of the imaging device is divided into a plurality of regions. The first skip mode is applied to a region expected to include the target object. The different second skip mode is applied to a region where the target object is not expected to be included, so that images having different resolutions can be obtained from a plurality of regions (for example, a region of an image not including the target object with a lower resolution than a region of an image including the target object).

U.S. Patent No. 10,136,132 describes adaptive skip or zero block detection combined with a transform size decision. A video encoder using skip mode encoding determines whether, and at what stage of the encoding process, a block of a picture can be encoded as a skip block and/or a zero block based on, for example, an evaluation of the luminance values of the block, in order to reduce the amount of computation and increase the speed at which encoding is performed.

Disclosure of Invention

It is an object of the present invention to provide techniques for reducing the bit rate from a surveillance camera to achieve efficient use of available bandwidth and storage. This and other objects are achieved by a method according to claim 1, a system according to claim 12, a computer program according to claim 13 and a storage medium according to claim 14.

According to a first aspect, these and other objects are achieved, in whole or at least in part, by a method in a computer system for reducing a bit rate from a surveillance camera. The method comprises the following steps:

receiving a first input identifying a first region of the image representative of the field of view of the camera, the first region contributing significantly to the bit rate;

receiving a second input identifying a second region of the image, the second region containing information that is deemed to be of little visual appeal to a user of the surveillance camera;

determining a third region of the image, the third region being a region in which the first region and the second region at least partially overlap; and

applying video encoder settings to force skipped blocks in at least some of the third regions, thereby reducing the contribution to the bit rate from the third regions.

This provides a way of encoding areas that are of little or no interest to the operator of the camera using very little data, and results in a significant reduction in bit rate and storage space compared to encoding the entire image using conventional techniques.
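As a minimal sketch of the overlap step (not the patented implementation), the third region can be computed as a set intersection, assuming each region is represented as a set of (row, column) block coordinates. The function name `third_region` and the sample coordinates are hypothetical.

```python
def third_region(first_region, second_region):
    """Return the blocks that are both high-bit-rate (first region)
    and of little visual interest (second region)."""
    return set(first_region) & set(second_region)


# Illustrative block coordinates on a small grid (made-up values).
high_bitrate_blocks = {(0, 0), (0, 1), (1, 1), (2, 3)}
unimportant_blocks = {(0, 1), (1, 1), (1, 2)}

skip_mask = third_region(high_bitrate_blocks, unimportant_blocks)
print(sorted(skip_mask))  # [(0, 1), (1, 1)]
```

Only blocks present in both inputs end up in the skip block mask; a block that is costly but interesting, or uninteresting but cheap, is left to the encoder's normal decisions.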

According to one embodiment, the first, second and third regions are represented as blocks of pixels in an image captured by the camera. Having regions that coincide with pixel blocks is a common way of video coding, where the image is divided into sub-regions and the redundancy between the sub-regions is analyzed. Therefore, similar techniques are used in the present invention to facilitate integration with conventional video surveillance systems.

According to one embodiment, the second input is a user input. That is, the user may make a determination as to which regions they consider to be "unimportant" or "uninteresting" and provide such information to the encoder. This allows the user to have full control over the decision as to which regions are of interest and which are not, without having to rely on "guesswork" by the encoder itself.

According to one embodiment, the user generates the second input through a graphical user interface or an application programming interface. This provides a convenient and intuitive way for a user to provide input to the encoder regarding areas of the image that the user finds to be of little interest.

According to one embodiment, the second input is automatically generated based on image segmentation. This brings advantages for a wide range of use cases. For example, for large site installations and configurations with hundreds of cameras, it may be more efficient to use deep learning to produce segmentation maps than to have the user specify the areas for each camera one by one.

According to one embodiment, the video encoder generates the first input according to a threshold representing a cost of encoding the first region. That is, the threshold may be set by the user or by the encoder itself, and serves as the cut-off for determining which regions have a high bit rate contribution, either in absolute terms or relative to other regions of the image.
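The thresholding described here could be sketched as follows. This is an illustration under stated assumptions, not the encoder's actual logic: per-block encoding costs are assumed to be available as a dictionary of bits per block, the name `first_region` is hypothetical, and the "relative" variant is interpreted as a fraction of the total cost of the image.

```python
def first_region(block_costs, threshold, relative=False):
    """Select blocks whose encoding cost exceeds a threshold.

    block_costs: dict mapping (row, col) -> cost in bits (assumed input).
    threshold:   absolute cost in bits, or, if relative is True,
                 a fraction of the total cost of all blocks.
    """
    total = sum(block_costs.values())
    limit = threshold * total if relative else threshold
    return {block for block, cost in block_costs.items() if cost > limit}


# Made-up per-block costs for a 2x2 block grid.
costs = {(0, 0): 1200, (0, 1): 90, (1, 0): 400, (1, 1): 310}

print(sorted(first_region(costs, 350)))                 # [(0, 0), (1, 0)]
print(sorted(first_region(costs, 0.5, relative=True)))  # [(0, 0)]
```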

According to one embodiment, the first input is generated by an image analysis algorithm based on a complexity of the image. That is, the captured image may be analyzed by an image analysis algorithm that determines which portions of the image are complex (and therefore require higher bit rate encoding) and identifies such image regions as the first region.

According to one embodiment, a suggestion of the third region may be provided to a user of the surveillance camera prior to applying the video encoder settings, to allow the user to confirm or reject individual ones of the third regions. That is, the skip region calculation unit may attempt to make a "best guess" as to what is a suitable third region (i.e., a region to be encoded as skip blocks) and provide a suggestion of such a region to the user. The user may then accept or reject the suggestion from the skip region calculation unit. This may result in a faster determination of the third region than when the user must first input all of the second regions and the third region is then determined based on that input.

According to one embodiment, the suggestion of the third region is provided on the user interface as an overlay on the image. That is, the suggestion may be presented to the user as an overlay, which makes it easy for the user to see whether the suggested region corresponds to the image region the user intended. It also makes it easy for the user to accept or reject all of the suggestions or individual ones.

According to one embodiment, the method further comprises calculating an estimated bit rate from the surveillance camera, modifying at least some of the first and second regions to determine a modified third region, and calculating a modified estimated bit rate from the surveillance camera using the modified third region. This allows the user to compare different "scenarios", i.e., what the bit rate would be if a different set of regions were selected as uninteresting, or if different criteria were set for what should be considered a high contribution to the bit rate, etc.
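A rough sketch of such a scenario comparison is given below. It rests on several simplifying assumptions that are not from the source: per-block costs are given in bits per inter-frame, a skip block is charged about one bit, I-frames are ignored, and the frame rate is a constant 25 frames per second. The function name `estimated_bitrate` is hypothetical.

```python
SKIP_BLOCK_BITS = 1  # assumption: a skip block costs roughly one bit


def estimated_bitrate(block_costs, skip_mask=frozenset(), fps=25):
    """Estimate bits per second from per-block costs (bits per inter-frame).

    Blocks in skip_mask are charged SKIP_BLOCK_BITS instead of their cost.
    I-frames are ignored to keep the sketch short.
    """
    bits_per_frame = sum(
        SKIP_BLOCK_BITS if block in skip_mask else cost
        for block, cost in block_costs.items()
    )
    return bits_per_frame * fps


# Made-up per-block costs for a 2x2 block grid.
costs = {(0, 0): 1200, (0, 1): 90, (1, 0): 400, (1, 1): 310}

before = estimated_bitrate(costs)
scenario_a = estimated_bitrate(costs, skip_mask={(0, 0)})
scenario_b = estimated_bitrate(costs, skip_mask={(0, 0), (1, 0)})
print(before, scenario_a, scenario_b)  # 50000 20025 10050
```

Comparing `scenario_a` with `scenario_b` corresponds to modifying the third region and recalculating the estimate before any encoder setting is changed.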

According to one embodiment, the method further comprises modifying one or more of the first input and the second input using the result of the calculation and applying the video encoder settings according to the modified first input and second input. This allows the user to change the original set of third regions to a different set of third regions. Having the ability to "experiment" and make various modifications in this manner may allow a user to conveniently achieve the best reduction in bit rate and storage space for a particular surveillance scenario.

According to a second aspect, the invention relates to a system for reducing the bit rate from a surveillance camera. The system includes a skip region calculation unit and an encoder. The skip region calculation unit is configured to: receive a first input identifying a first region of an image representing a field of view of a camera, the first region contributing significantly to the bit rate; receive a second input identifying a second region of the image, the second region containing information that is deemed to be of little visual appeal to a user of the surveillance camera; and determine a third region of the image, the third region being a region where the first region and the second region at least partially overlap. The encoder is configured to force skip blocks in at least some of the third region, thereby reducing the contribution to the bit rate from the third region. The advantages of the system correspond to those of the method, and the system may be varied similarly.

According to a third aspect, the invention relates to a computer program for reducing the bit rate from a surveillance camera. The computer program comprises instructions corresponding to the steps of:

receiving a first input identifying a first region of the image representative of the field of view of the camera, the first region contributing significantly to the bit rate;

receiving a second input identifying a second region of the image, the second region containing information that is deemed to be of little visual appeal to a user of the surveillance camera;

determining a third region of the image, the third region being a region in which the first region and the second region at least partially overlap; and

applying video encoder settings to force skipped blocks in at least some of the third regions, thereby reducing the contribution to the bit rate from the third regions.

According to a fourth aspect, the invention relates to a digital storage medium comprising such a computer program. The computer program and the storage medium relate to advantages corresponding to those of the method and may be varied analogously.

The details of one or more embodiments of the invention are set forth in the accompanying drawings and the description below. Other features and advantages of the invention will be apparent from the description and drawings, and from the claims.

Drawings

Fig. 1 shows a system for reducing the bit rate from a surveillance camera according to one embodiment.

Fig. 2 shows an example of a scene monitored by a camera.

Fig. 3 shows an example of the main structure of an image captured by the camera in fig. 2.

FIG. 4 illustrates an example of grouping pixels of the image in FIG. 3 into coding units according to one embodiment.

FIG. 5 illustrates an image captured by the camera of FIG. 2 with an overlaid bit rate contribution map, according to one embodiment.

Fig. 6 shows a schematic example of a camera in which various embodiments of the invention may be implemented.

Like reference symbols in the various drawings indicate like elements.

Detailed Description

As described above, it is an object of various embodiments of the present invention to reduce the bit rate from a surveillance camera. The user of the surveillance camera can specify areas of the image that contain "unimportant" information but still contribute significantly to the bit rate. Once these regions are specified, a skip block mask may be applied to them, which forces the encoder to encode the regions as skip blocks. Since a skip block contains very little data, typically only one bit, the bit rate can be significantly reduced by using this technique.

Embodiments of the present invention may include various tools to assist a user in selecting areas where skip block masks should be applied. For example, an overlay may be presented to a user over an image captured by a surveillance camera, the overlay indicating bit rate contributions from different regions of the image. Regions with a high bit rate contribution are referred to as "first regions" in the rest of the application. The bit rate contribution may be indicated, for example, by using differently colored, usually transparent overlays, such as reddish for bit rate contributions above a predetermined threshold and greenish for bit rate contributions below a certain threshold. The user may then select from this map a number of regions to which the skip block mask should be applied, e.g., regions where there is a high bit rate contribution but where objects of interest are not expected to be present. The user may also start by indicating all areas of the depicted scene that are "unimportant" (i.e., of no visual interest), for example by drawing polygons in a graphical user interface or entering coordinates of such areas. An area of no visual interest is referred to as a "second region" in the rest of the application. Thereafter, based on the overlap between the two types of regions, the user may select a number of regions in the image to be skip block masked, namely regions that both have a high bit rate and are of no visual interest. A region to which the skip block mask is to be applied is referred to as a "third region" in the rest of the application. These regions are found in the overlap between the high bit rate regions and the regions that are of no visual interest.
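One way a polygon drawn by the user could be turned into a block-level second region is sketched below. This is an illustration only: the source does not specify how polygons are mapped to blocks. The sketch assumes 16x16-pixel blocks, pixel coordinates with the origin at the top left, and a simple rule that a block belongs to the region when its center lies inside the polygon. The function names are hypothetical.

```python
def point_in_polygon(x, y, polygon):
    """Ray-casting test: True if (x, y) lies inside the polygon,
    given as a list of (x, y) vertices."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does this edge cross the horizontal line through y?
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def polygon_to_blocks(polygon, width, height, block_size=16):
    """Return the (row, col) blocks whose centers fall inside the polygon."""
    blocks = set()
    for row in range(height // block_size):
        for col in range(width // block_size):
            cx = col * block_size + block_size / 2
            cy = row * block_size + block_size / 2
            if point_in_polygon(cx, cy, polygon):
                blocks.add((row, col))
    return blocks


# A 32x32-pixel square in the top-left corner of a 64x48 image.
square = [(0, 0), (32, 0), (32, 32), (0, 32)]
second_region = polygon_to_blocks(square, width=64, height=48)
print(sorted(second_region))  # [(0, 0), (0, 1), (1, 0), (1, 1)]
```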

Some embodiments may include various types of machine learning or artificial intelligence tools that may learn over time or during a configuration phase what types of objects and/or regions a user typically deems to be "unimportant". As mentioned above, such a region is referred to as "second region" in the rest of the application.

Suggestions for "unimportant" regions may be presented to the user for confirmation before being used as input to a skip block mask decision. As a convenient option, the user may be presented with an overlay that indicates both bit rate contribution information and suggested "unimportant" regions. Information about which regions are suggested to be unimportant may be presented as a pattern, such as dashes or stripes. This can be conveniently combined with an overlay indicating the bit rate by adding color to the pattern, so that the user can quickly grasp the suggestion from the software. One example is to add a stripe pattern to the suggested "unimportant" areas and to color such stripes red in areas that also have a high bit rate contribution. Such an overlay or marking would typically appear in an image area depicting a tree with swaying branches, and the user may then decide to apply the skip block mask to that area by selecting it in the user interface, for example by drawing a polygon on top of the area in the graphical user interface and indicating that the polygon should be set as a skip block mask. The result is that this image area will be updated at a much slower rate than the rest of the image, e.g., once per GOP instead of once per frame, even though the tree moves its branches from one frame to the next. Clearly, many different options exist for how to present suggestions to the user, and these are available to the user interface designer.
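The combined overlay described above could be expressed per block as in the following sketch. The function name `overlay_style`, the dictionary keys, and the use of "solid" for non-suggested blocks are hypothetical choices for illustration; the source only states that stripes mark suggested regions and that red marks a high bit rate contribution.

```python
def overlay_style(block, suggested_unimportant, high_bitrate):
    """Choose an overlay pattern and color for one block.

    Stripes mark blocks suggested as "unimportant"; red marks blocks
    with a high bit rate contribution, green marks the rest.
    """
    return {
        "pattern": "stripes" if block in suggested_unimportant else "solid",
        "color": "red" if block in high_bitrate else "green",
    }


suggested = {(0, 1), (1, 1)}   # made-up suggestion from the software
costly = {(0, 0), (0, 1)}      # made-up high bit rate blocks

for block in [(0, 0), (0, 1), (1, 1)]:
    print(block, overlay_style(block, suggested, costly))
```

Blocks shown with red stripes are the natural candidates for the skip block mask, since they are both suggested as unimportant and costly to encode.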

In some embodiments, a user may be provided with suggestions for various skip block masks, and may be presented with "before" and "after" values that show how the bit rate from the camera will change when a particular skip block mask is applied to an image captured by a surveillance camera. The user may then configure the skip block mask according to their preferences based on this information.

To aid understanding of the details of the invention described herein, a brief overview of image coding according to various embodiments is given below. The images captured by surveillance cameras are typically transmitted to a point of use, such as a control center, where the images may be viewed and/or stored. Alternatively, they may be stored in so-called "edge storage", i.e., storage at the camera, either on the camera itself (e.g., an SD card) or connected to the camera (e.g., a NAS (network attached storage)). Prior to transmission or edge storage, the images are typically encoded by an encoder to save bandwidth and storage space. The encoding may be performed in many different ways, e.g., according to the H.264 standard or other encoding standards.

In many digital video coding systems, two main modes are used to compress video frames in a sequence of video frames: intra mode and inter mode. In intra mode, the luminance and chrominance channels (or in some cases RGB or Bayer data) are encoded by exploiting spatial redundancy of pixels in a given channel of a single frame via prediction, transformation, and entropy coding. The encoded frames are called intra-frames (also called "I-frames"). Within an I-frame, blocks of pixels (also referred to as macroblocks, coding units or coding tree units) are coded in intra mode, that is, they are coded with reference to similar blocks within the same image frame, or raw coded with no reference at all.

In contrast, inter mode exploits temporal redundancy between separate frames and relies on motion-compensated prediction techniques that predict portions of a frame from one or more reference frames by encoding the motion of pixels from one frame to another for selected blocks of pixels. The encoded frames are referred to as inter-frames and are either P-frames (forward-predicted frames), which may reference previous frames in decoding order, or B-frames (bi-directionally predicted frames), which may reference two or more previously decoded frames and may have any arbitrary display order relationship to the frames used for the prediction. Within an inter-frame, blocks of pixels may be encoded in inter mode, meaning that they are encoded with reference to similar blocks in a previously decoded image, or in intra mode, meaning that they are encoded with reference to similar blocks within the same image frame, or raw coded with no reference at all. A skip block is an inter-mode encoded block of pixels that references a corresponding block of pixels in a reference frame, from which the image content is to be copied in full.
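The visible effect of skip blocks can be illustrated with a toy decoder-side sketch. It is a simplification rather than a real codec: frames are plain lists of pixel rows, blocks outside the mask are assumed to be reproduced exactly from the current frame, and the name `apply_skip_mask` is hypothetical.

```python
def apply_skip_mask(reference, current, skip_blocks, block_size):
    """Return the frame a decoder would show when the given blocks are
    skip blocks: their pixels are copied unchanged from the reference
    frame, and all other pixels come from the current frame."""
    out = [row[:] for row in current]
    for block_row, block_col in skip_blocks:
        for y in range(block_row * block_size, (block_row + 1) * block_size):
            for x in range(block_col * block_size, (block_col + 1) * block_size):
                out[y][x] = reference[y][x]
    return out


# 4x4-pixel frames split into 2x2-pixel blocks (toy sizes).
reference = [[0] * 4 for _ in range(4)]
current = [[9] * 4 for _ in range(4)]

decoded = apply_skip_mask(reference, current, skip_blocks={(0, 1)}, block_size=2)
for row in decoded:
    print(row)
# [9, 9, 0, 0]
# [9, 9, 0, 0]
# [9, 9, 9, 9]
# [9, 9, 9, 9]
```

The top-right block keeps the reference content even though the scene changed there, which is why the masked region costs almost nothing to encode.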

The encoded image frames are arranged in a group of pictures (GOP). Each GOP begins with an I-frame that does not reference any other frame and is followed by a number of inter-frames (i.e., P-frames or B-frames) that reference other frames. The image frames do not necessarily have to be encoded and decoded in the same order in which they were captured or displayed. The only inherent limitation is that a frame used as a reference frame must be decoded before other frames using the reference frame as a reference can be encoded.

As mentioned above, in the image regions where a skip block mask is created (i.e., the third image regions), the encoder is, in one embodiment, forced to use skip blocks, e.g., for every frame in the GOP other than the I-frame, or for even longer time periods. This may be appropriate in situations where the scene does not change often. In another embodiment, these third image regions may be analyzed on a per-frame basis or at a substantially higher frame rate, such that there is a matching skip map for each non-I-frame. The skip period may be selected by the user and may be different for different skip block masks. It should be noted that by not masking the I-frames, a simple "delayed view" of the region masked by the skip block mask can be created (i.e., only the I-frames are visible in that region upon playback). This may be useful in certain situations (e.g., retail environments).
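The first embodiment above, in which the mask applies to every frame of a GOP except the I-frame, can be sketched as a simple frame plan. The sketch assumes a fixed GOP length and only P-frames between I-frames; the name `plan_frames` is hypothetical.

```python
def plan_frames(num_frames, gop_length):
    """Return a list of (frame_type, mask_applied) tuples.

    Each GOP starts with an I-frame, which is never masked; the skip
    block mask is forced on every following inter-frame.
    """
    plan = []
    for index in range(num_frames):
        if index % gop_length == 0:
            plan.append(("I", False))
        else:
            plan.append(("P", True))
    return plan


plan = plan_frames(num_frames=8, gop_length=4)
refreshes = sum(1 for _, masked in plan if not masked)
print([frame_type for frame_type, _ in plan])  # ['I', 'P', 'P', 'P', 'I', 'P', 'P', 'P']
print(refreshes)  # 2
```

With this plan the masked region is refreshed only twice in eight frames, once per GOP, which matches the "delayed view" behavior in which only the I-frames are visible in the masked region.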

As will be appreciated by one skilled in the art, aspects of the present invention may be embodied as a system, method or computer program product. Accordingly, aspects of the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment (including firmware, resident software, micro-code, etc.) or an embodiment combining software and hardware aspects that may all generally be referred to herein as a "circuit," "module" or "system." Furthermore, aspects of the present invention may take the form of a computer program product embodied in one or more computer-readable media having computer-readable program code embodied thereon.

Any combination of one or more computer-readable media may be utilized. The computer readable medium may be a computer readable signal medium or a computer readable storage medium. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples (a non-exhaustive list) of the computer readable storage medium would include the following: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this document, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.

A computer readable signal medium having computer readable program code embodied therein may include a propagated data signal, for example, in baseband or as part of a carrier wave. Such a propagated signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may be any computer medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.

Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing. Computer program code for carrying out operations for aspects of the present invention may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, Smalltalk, C++ or the like and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the latter scenario, the remote computer may be connected to the user's computer through any type of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider).

Aspects of the present invention are described below with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the invention. It will be understood that each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks.

These computer program instructions may also be stored in a computer readable medium that can direct a computer, other programmable data processing apparatus, or other devices to function in a particular manner, such that the instructions stored in the computer readable medium produce an article of manufacture including instructions which implement the function/act specified in the flowchart and/or block diagram block or blocks.

The computer program instructions may also be loaded onto a computer, other programmable data processing apparatus, or other devices to cause a series of operational steps to be performed on the computer, other programmable apparatus or other devices to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide processes for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks.

Techniques according to various embodiments of the present invention will now be described by way of example and with reference to the accompanying drawings.

FIG. 1 is a schematic block diagram illustrating a system 100 in which an image encoding technique according to various embodiments may be implemented. For example, the system 100 may be implemented in a camera that captures images (e.g., video sequences) of a scene. The system 100 includes an image sensor 102, a skip region calculation unit 104, a scaler 106, and an encoder 108. Briefly, the image sensor 102 captures an image of a scene; the skip region calculation unit 104 determines a third region based on the first region and the second region; the scaler 106 performs further operations such as downscaling or upscaling the image, rotating the image, adding various types of overlays, and the like; and the encoder 108 encodes the image and forces the third region to be encoded as skip blocks. These operations will be described in further detail below.
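The determination of the third region can be illustrated with a minimal sketch. It assumes, purely for illustration, that the first region (high bit rate contribution) and the second region (low interest to the user) are each given as a boolean mask with one entry per coding unit; the names `third_region`, `high_bitrate`, and `low_interest` are hypothetical and do not appear in the description.

```python
from typing import List

# Hypothetical representation: one boolean per coding unit, row by row.
Mask = List[List[bool]]


def third_region(first: Mask, second: Mask) -> Mask:
    """Return the overlap of two coding-unit masks of identical shape."""
    same_rows = len(first) == len(second)
    same_cols = all(len(a) == len(b) for a, b in zip(first, second))
    if not (same_rows and same_cols):
        raise ValueError("masks must have the same shape")
    # A coding unit belongs to the third region only if it is in both inputs.
    return [[a and b for a, b in zip(row_a, row_b)]
            for row_a, row_b in zip(first, second)]


# First region: coding units that contribute significantly to the bit rate.
high_bitrate = [
    [True, True, False],
    [False, True, False],
]
# Second region: coding units the user considers of little interest.
low_interest = [
    [True, False, False],
    [False, True, True],
]

skip_mask = third_region(high_bitrate, low_interest)
```

In this sketch only the coding units marked in both masks end up in `skip_mask`, which is the set that would be passed on to the encoder as candidates for forced skip blocks.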

An example of a scene monitored by a camera is shown in FIG. 2. In the scene 200, there is a house 202 having windows 204, 214 and a door 206. A car 208 is parked in front of the house and a first person 210 is standing outside the house. A second person 212 is in the house and can be seen through one of the windows 204, 214.

The camera 216 captures images of the scene using the image sensor 102 of the system 100 in the camera. FIG. 3 shows the main structure of an image 302 captured by the image sensor 102. The image 302 is made up of a plurality of pixels 304 corresponding to the pixels of the image sensor 102. An image may be composed of, for example, 1280x720 pixels, 1920x1080 pixels, or 3840x2160 pixels.

The image captured by the image sensor 102 undergoes standard image processing including, for example, noise reduction, local tone mapping, and spatial and temporal filtering. The image is then sent to the skip region calculation unit 104. For the embodiments described herein, one important operation performed by the skip region calculation unit 104, as shown in FIG. 4, is grouping the pixels 304 of the image 302 into coding units 402 of neighboring pixels 304. A coding unit 402 is also referred to as a block, a macroblock, a pixel block, or a coding tree unit. The coding unit 402 is generally square and is composed of, for example, 8 × 8, 16 × 16, or 32 × 32 pixels. However, the pixels 304 may be grouped into coding units 402 of other sizes and shapes. It should be noted that the coding units 402 in FIG. 4 are drawn enlarged relative to the pixels in FIG. 3 for purposes of illustration. In practice, an image with the number of pixels 304 shown in FIG. 3 typically contains a large number of coding units 402. A bit rate contribution value is determined for each coding unit 402. This value may be determined in a number of ways, for example, by using a cost function of the encoder. Based on the cost, the encoder may determine whether the coding unit should be intra-coded, inter-coded, or coded as a skip block.
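As a rough illustration of the grouping step, the following sketch computes how many square coding units cover an image and marks the units whose bit rate contribution exceeds a threshold. The function names, the example cost values, and the threshold are assumptions made for this example; the description does not prescribe how partial units at the image border are handled, so they are simply counted as whole units here.

```python
import math
from typing import List, Tuple


def coding_unit_grid(width: int, height: int, unit: int = 16) -> Tuple[int, int]:
    """Return (columns, rows) of square coding units covering the image.

    Assumption: partial units at the right and bottom edges count as whole.
    """
    return math.ceil(width / unit), math.ceil(height / unit)


def coding_unit_of(x: int, y: int, unit: int = 16) -> Tuple[int, int]:
    """Return the (column, row) index of the coding unit containing pixel (x, y)."""
    return x // unit, y // unit


def high_contribution_units(costs: List[List[float]],
                            threshold: float) -> List[List[bool]]:
    """Mark coding units whose bit rate contribution exceeds the threshold."""
    return [[cost > threshold for cost in row] for row in costs]


grid_1080p = coding_unit_grid(1920, 1080, unit=16)   # 120 columns, 68 rows
grid_720p = coding_unit_grid(1280, 720, unit=16)     # 80 columns, 45 rows

# Hypothetical per-unit costs, e.g. taken from an encoder cost function.
example_costs = [[10.0, 500.0], [3.0, 120.0]]
first_region = high_contribution_units(example_costs, threshold=100.0)
```

Even a 1920x1080 image yields several thousand 16 × 16 coding units, which is why FIG. 4 shows them enlarged.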

FIG. 5 shows an image 502 captured by the camera 216. Examining the image suggests that the tree on the right side of the image may contribute significantly to the bit rate, and that the sky above the tree may also contribute significantly to the bit rate due to, for example, passing clouds (especially on windy days). The user may decide that, from a monitoring perspective, these parts of the image are not very important and may therefore indicate that a skip block mask is to be applied to these high bit rate areas. As mentioned above, skip blocks typically use 1 bit of data, so a significant saving in bit rate from the surveillance camera can be obtained.
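The potential saving can be estimated with simple arithmetic. The sketch below assumes that the bit cost of each coding unit in a frame is known and that a forced skip block costs 1 bit, as stated above; the function name and the example numbers are hypothetical and are not measurements.

```python
from typing import List


def estimated_bits_saved(costs_bits: List[List[int]],
                         skip_mask: List[List[bool]],
                         skip_cost_bits: int = 1) -> int:
    """Estimate the bits saved in one frame by forcing masked units to skip blocks."""
    saved = 0
    for cost_row, mask_row in zip(costs_bits, skip_mask):
        for cost, skip in zip(cost_row, mask_row):
            if skip:
                # A unit that already costs no more than a skip block saves nothing.
                saved += max(cost - skip_cost_bits, 0)
    return saved


# Hypothetical per-unit costs in bits: left column is a moving tree, right is a wall.
frame_costs = [[400, 20], [350, 5]]
tree_mask = [[True, False], [True, False]]

saved = estimated_bits_saved(frame_costs, tree_mask)  # (400 - 1) + (350 - 1) = 748
```

The estimate applies per inter frame; the actual saving depends on the encoder and on how much the masked content changes between frames.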

Further, in some embodiments, a machine learning system, such as an artificial neural network, may be used to learn which features are typically considered unimportant by one or more users. For example, the system may learn that the average user of the surveillance camera is not interested in recording images of the tree. The system may then automatically identify trees, sky, etc. in the image and propose a skip block mask to the encoder. Optionally, the system may also present the user with several alternative skip block masks, and the user may decide which of the alternatives to use before the information is passed to the encoder. Many further variations of skip block mask selection are available to those of ordinary skill in the art.

FIG. 6 shows a camera 216 that includes a system 100 such as that shown in FIG. 1. The camera 216 also has many other components, but they are not shown because they are not part of the present invention and will not be discussed further herein. The camera 216 may be any kind of camera, such as a visible light camera, an IR camera, or a thermal camera.

As described in connection with FIG. 6, the encoding system 100 may be integrated in a camera 216. However, it is also possible to arrange parts of the encoding system 100, or the entire encoding system 100, separately and operatively connect it to the camera. It is also possible to transfer the image from the camera to, e.g., a control center without any skip block mask and to apply the skip block mask in the control center, e.g., in a VMS (video management system). In such a case, the encoding system may be arranged in the VMS or elsewhere in the control center and used for so-called transcoding, where the encoded image is received from the camera, decoded, and then re-encoded, this time using the skip block mask.

The various embodiments of the invention described herein may be used with any encoding scheme that uses a GOP structure with intra frames and subsequent inter frames (e.g., H.264, H.265, MPEG-4 Part 2, VP8, or VP9, all of which are familiar to those of ordinary skill in the art).

The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of instructions, which comprises one or more executable instructions for implementing the specified logical function(s). In some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.

The description of various embodiments of the present invention has been presented for purposes of illustration but is not intended to be exhaustive or limited to the disclosed embodiments. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the described embodiments. For example, while the encoder typically determines whether the coding unit should be intra-coded, inter-coded, or coded as a skip block, as described above, embodiments are possible in which the user explicitly specifies the type of coding. This may be done manually through a user interface at the beginning of the process, for example, or by a user viewing and confirming or overriding suggestions provided by the encoder. Typically, the user only specifies which coding units should be coded as skipped blocks and leaves the coding decisions regarding intra-coding and inter-coding to the encoder. Accordingly, many other variations may occur to those skilled in the art which fall within the scope of the claims.
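The division of labor described above, where the user specifies only which coding units are forced to skip blocks and the encoder decides the rest, can be sketched as follows. This is an illustrative model only, not the interface of any real encoder; the `Mode` enumeration, the `choose_mode` function, and the cost values are assumptions made for the example.

```python
from enum import Enum
from typing import Dict


class Mode(Enum):
    INTRA = "intra"
    INTER = "inter"
    SKIP = "skip"


def choose_mode(costs: Dict[Mode, float], force_skip: bool) -> Mode:
    """Pick the coding mode for one coding unit.

    If the unit lies in the user's skip block mask, the skip mode is forced.
    Otherwise the mode with the lowest cost wins, standing in for the
    encoder's own cost-function-based decision.
    """
    if force_skip:
        return Mode.SKIP
    return min(costs, key=costs.get)


# Hypothetical costs for a coding unit showing a tree moving in the wind.
unit_costs = {Mode.INTRA: 900.0, Mode.INTER: 300.0, Mode.SKIP: 1200.0}

encoder_choice = choose_mode(unit_costs, force_skip=False)  # Mode.INTER
masked_choice = choose_mode(unit_costs, force_skip=True)    # Mode.SKIP
```

Here the encoder on its own would pick inter-coding for the unit, while the user's mask overrides that decision and forces a skip block.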

The terms used herein were chosen to best explain the principles of the embodiments, the practical application or technical improvement over technologies found in the marketplace, or to enable others of ordinary skill in the art to understand the embodiments disclosed herein.
