Motion vector selection and prediction in video coding systems and methods


Note: This technology, "Motion vector selection and prediction in video coding systems and methods", was designed and created by 朱维佳 (Zhu Weijia) and 蔡家扬 (Cai Jiayang) on 2017-02-24. Abstract: Systems and methods for encoding an unencoded video frame of a sequence of video frames using a recursive coding block partitioning scheme are provided herein. After dividing the frame into pixel regions of the maximum allowed size (LCB-sized coding blocks), each LCB-sized coding block candidate ("LCBC") may be partitioned into smaller CBCs. This process may continue recursively until the encoder determines that (1) the current CBC is suitable for encoding (e.g., because the current CBC contains only pixels of a single value) or (2) the current CBC is the minimum coding block candidate size for the particular implementation, e.g., 2 × 2, 4 × 4, etc. ("MCBC"), whichever comes first. Prediction values may then be assigned to the pixels of a coding block using one of two intra prediction techniques: a non-square template matching technique or a directional prediction technique.
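By way of orientation only, the following minimal sketch (not taken from this disclosure) illustrates the recursive splitting loop described above, assuming square numpy blocks, a quadtree split for simplicity (the claims below also permit horizontal-only or vertical-only splits), an assumed minimum CBC size of 4 × 4, and a placeholder suitability test drawn from the single-pixel-value example:

    import numpy as np

    def partition_cbc(block, min_size=4):
        """Recursively split a coding block candidate (CBC) until it is
        suitable for encoding or reaches the minimum CBC size (MCBC)."""
        def suitable(b):
            # Placeholder criterion from the text above: a CBC containing
            # only a single pixel value is suitable for encoding.
            return b.min() == b.max()
        n = block.shape[0]
        if suitable(block) or n <= min_size:
            return [block]  # encode this CBC as-is
        h = n // 2  # quadtree split, assumed here for simplicity
        quads = (block[:h, :h], block[:h, h:], block[h:, :h], block[h:, h:])
        return [leaf for q in quads for leaf in partition_cbc(q, min_size)]

    # Example: a flat 8 x 8 block is kept whole rather than split.
    assert len(partition_cbc(np.zeros((8, 8), dtype=np.uint8))) == 1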

1. A method of encoding an unencoded video frame of a sequence of video frames to generate an encoded bitstream representing the unencoded video frame, the unencoded video frame comprising an array of pixels, and the encoded bitstream representing the unencoded video frame comprising at least a header and a video data payload, the method comprising:

obtaining the pixel array;

dividing the pixel array along a plurality of horizontal and vertical axes, thereby creating a plurality of largest-sized coding blocks; and

for one of the plurality of largest-sized coding blocks:

(a) determining whether the coding block should be encoded or further divided;

(b) upon determining that the coding block should be encoded:

(b.1) creating an encoded version of the coding block;

(b.2) providing an indication in the header of the encoded bitstream representing the unencoded video frame that the encoded version of the coding block has been created; and

(b.3) providing the encoded version of the coding block in the video data payload of the encoded bitstream representing the unencoded video frame; and

(c) upon determining that the coding block should be further divided:

(c.1) dividing the coding block along at least one of a horizontal axis and a vertical axis, thereby creating a plurality of new coding blocks;

(c.2) providing an indication in the header of the encoded bitstream representing the unencoded video frame that the coding block is further divided; and

(c.3) recursively performing (a)-(c) for one of the plurality of new coding blocks.

2. The method of claim 1, wherein the coding blocks of the plurality of largest-sized coding blocks have a horizontal size of sixty-four pixels and a vertical size of sixty-four pixels, and the coding blocks of the plurality of new coding blocks have a horizontal size of at least two pixels and a vertical size of at least two pixels.

3. The method of claim 1, wherein:

(b.2) includes assigning a first value to a coding block split flag associated with the coding block and providing the coding block split flag in the header of the encoded bitstream representing the unencoded video frame, the first value indicating that the encoded version of the coding block was created and provided in the video data payload of the encoded bitstream representing the unencoded video frame; and

(c.2) includes assigning one of a second value, a third value, or a fourth value to the coding block split flag associated with the current coding block and providing the coding block split flag in the header of the encoded bitstream representing the unencoded video frame, the second value indicating that the coding block is divided along the horizontal axis, the third value indicating that the coding block is divided along the vertical axis, and the fourth value indicating that the coding block is divided along both the horizontal axis and the vertical axis.

4. The method of claim 3, wherein the coding block has a vertical dimension measured in pixels and a horizontal dimension measured in pixels; (c.1) includes determining that the vertical dimension is greater than the horizontal dimension and dividing the coding block along the horizontal axis; and (c.2) includes assigning the second value to the coding block split flag.

5. The method of claim 4, wherein the vertical dimension is twice the horizontal dimension.

6. The method of claim 3, wherein the coding block has a vertical dimension measured in pixels and a horizontal dimension measured in pixels; (c.1) includes determining that the vertical dimension is less than the horizontal dimension and dividing the coding block along the vertical axis; and (c.2) includes assigning the third value to the coding block split flag.

7. The method of claim 6, wherein the vertical dimension is half of the horizontal dimension.

8. The method of claim 3, wherein the coding block has a vertical dimension measured in pixels and a horizontal dimension measured in pixels; (c.1) includes determining that the vertical dimension is equal to the horizontal dimension and dividing the coding block along the horizontal axis; and (c.2) includes assigning the second value to the coding block split flag.

9. The method of claim 3, wherein the coding block has a vertical dimension measured in pixels and a horizontal dimension measured in pixels; (c.1) includes determining that the horizontal dimension is equal to the vertical dimension and dividing the coding block along the vertical axis; and (c.2) includes assigning the third value to the coding block split flag.

10. The method of claim 3, wherein the coding block has a vertical dimension measured in pixels and a horizontal dimension measured in pixels; (c.1) includes determining that the horizontal dimension is equal to the vertical dimension and dividing the coding block along both the horizontal axis and the vertical axis; and (c.2) includes assigning the fourth value to the coding block split flag.

11. A method of encoding an unencoded video frame of a sequence of video frames to generate an encoded bitstream representative of the unencoded video frame, the unencoded video frame comprising an array of pixels including a processed pixel region and an unprocessed pixel region, the processed pixel region having prediction values associated therewith and the unprocessed pixel region having no prediction values associated therewith, and the encoded bitstream representative of the unencoded video frame including at least a header and a video data payload, the method comprising:

(a) obtaining a first pixel block of the unprocessed pixel region, the first pixel block having a first width and a first height;

(b) selecting a prediction template from the processed pixel region, the prediction template comprising a plurality of first pixels in a first spatial configuration and being in a first position relative to the first pixel block;

(c) identifying a pixel matching arrangement within the processed pixel region, the pixel matching arrangement comprising a plurality of second pixels in the first spatial configuration and at the first position relative to a second pixel block, the second pixel block having the first width and the first height;

(d) for a first pixel in the first pixel block:

(d.1) identifying a corresponding pixel of the second pixel block;

(d.2) mapping a predictor associated with the corresponding pixel of the second pixel block to the first pixel in the first pixel block; and

(e) repeating (d) for each remaining pixel of the first pixel block;

wherein completion of (e) makes the first pixel block a part of the processed pixel region.

12. The method of claim 11, wherein the first pixel block has a top side and a left side, the prediction template has a bottom side and a right side, the bottom side of the prediction template abuts the top side of the first pixel block, and the right side of the prediction template abuts the left side of the first pixel block.

13. The method of claim 12, wherein the second pixel block has a top side and a left side, the pixel matching arrangement has a bottom side and a right side, the bottom side of the pixel matching arrangement abuts the top side of the second pixel block, and the right side of the pixel matching arrangement abuts the left side of the second pixel block.

14. The method of claim 11, wherein the pixel matching arrangement is further defined by each pixel in the pixel matching arrangement (1) corresponding spatially to a pixel in the prediction template, and (2) having a prediction value that exactly matches the prediction value of the spatially corresponding pixel in the prediction template.

15. The method of claim 11, wherein the pixel matching arrangement is further defined by each pixel in the pixel matching arrangement (1) corresponding spatially to a pixel in the prediction template, and (2) having a prediction value that matches the prediction value of the spatially corresponding pixel in the prediction template within a predefined tolerance threshold.

16. The method of claim 11, wherein (c) comprises:

(c.1) selecting a first pixel of the prediction template, the first pixel having a first spatial location within the first spatial configuration;

(c.2) for a pixel of the processed pixel region:

comparing a prediction value associated with the pixel of the processed pixel region with a prediction value associated with the first pixel; and

upon determining that the prediction value associated with the pixel of the processed pixel region matches the prediction value associated with the first pixel, identifying a plurality of potentially matching pixels having the first spatial configuration, wherein the pixel of the processed pixel region has the first spatial location within the first spatial configuration.

17. A method of encoding an unencoded video frame of a sequence of video frames to generate an encoded bitstream representative of the unencoded video frame, the unencoded video frame comprising an array of pixels including a processed pixel region and an unprocessed pixel region, the processed pixel region having prediction values associated therewith and the unprocessed pixel region having no prediction values associated therewith, and the encoded bitstream representative of the unencoded video frame including at least a header and a video data payload, the method comprising:

(a) obtaining a first pixel block of the unprocessed pixel region, the first pixel block having a plurality of pixel rows including a top pixel row and a plurality of pixel columns including a left pixel column;

(b) selecting a prediction region from the processed pixel region, the prediction region comprising a plurality of pixels in a first spatial configuration;

(c) mapping a predictor from a first pixel of the prediction region to at least one diagonally adjacent pixel of the first pixel block; and

(d) repeating (c) for each remaining pixel of the prediction region;

wherein completion of (d) makes the first pixel block a part of the processed pixel region.

18. The method of claim 17, wherein each pixel of the plurality of pixels is diagonally adjacent to at least one pixel of the first pixel block.

19. A method of encoding an unencoded video frame of a sequence of video frames to generate an encoded bitstream representative of the unencoded video frame, the unencoded video frame comprising an array of pixels including a processed pixel region and an unprocessed pixel region, the processed pixel region having prediction values associated therewith and the unprocessed pixel region having no prediction values associated therewith, and the encoded bitstream representative of the unencoded video frame including at least a header and a video data payload, the method comprising:

(a) obtaining a first pixel block of the unprocessed pixel region, the first pixel block having a plurality of pixel rows including a top pixel row and a plurality of pixel columns including a left pixel column;

(b) selecting a first prediction region from the processed pixel region, the first prediction region adjoining at least one side of the first pixel block and comprising a plurality of first pixels in a first spatial configuration;

(c) selecting a second prediction region from the processed pixel region, the second prediction region adjoining at least one side of the first pixel block and comprising a plurality of second pixels in a second spatial configuration;

(d) generating a synthetic predictor for a first pixel in the first pixel block using a first predictor from a pixel of the first prediction region and a second predictor from a pixel of the second prediction region, the pixel of the first prediction region and the pixel of the second prediction region each being diagonally adjacent to the first pixel of the first pixel block; and

(e) repeating (d) for each remaining pixel of the first pixel block;

wherein completion of (e) makes the first pixel block a part of the processed pixel region.

20. The method of claim 19, wherein the synthetic predictor (PV) for the first pixel in the first pixel block is generated according to the equation:

PV = a * PL + (1 - a) * PB

wherein PL represents the first predictor, PB represents the second predictor, and a represents a predefined prediction efficiency coefficient.
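For illustration only (this sketch is not part of the claims), the weighted blend of claim 20 in code, with an assumed coefficient value a = 0.5:

    def synthetic_predictor(pl, pb, a=0.5):
        """PV = a * PL + (1 - a) * PB; 'a' is the predefined prediction
        efficiency coefficient (0.5 is an assumed value, not one fixed
        by this disclosure)."""
        return a * pl + (1 - a) * pb

    # Example: PL = 120, PB = 80, a = 0.5 gives PV = 100.0.
    assert synthetic_predictor(120, 80) == 100.0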

Technical Field

The present disclosure relates to encoding and decoding of video signals, and more particularly, to selecting predictive motion vectors for frames of a video sequence.

Background

The advent of digital multimedia such as digital images, speech/audio, graphics, and video has significantly improved various applications and opened up brand new applications, owing to the relative ease with which such content can be reliably stored, communicated, transmitted, searched, and accessed. Overall, the application fields of digital multimedia are very broad, covering entertainment, information, medicine, and security, among others, and benefit society in various ways. Multimedia captured by sensors such as cameras and microphones is typically analog, and is digitized by a digitization process in the form of pulse code modulation (PCM). However, immediately after digitization, the amount of data required to recreate the analog representation needed to drive speakers and/or a television display can be very large. Efficient communication, storage, or transmission of large amounts of digital multimedia content therefore requires compression from its raw PCM form into a compressed representation, and accordingly, many techniques for multimedia compression have been invented. Over the years, video compression techniques have grown sophisticated enough that they can often achieve high compression factors, between 10 and 100, while retaining high psycho-visual quality, often similar to uncompressed digital video.

To date, while tremendous progress has been made in the art and science of video compression (as demonstrated by numerous standards-body-driven video coding standards, such as MPEG-1, MPEG-2, H.263, MPEG-4 part 2, MPEG-4 AVC/H.264, MPEG-4 SVC and MVC, and industry-driven proprietary standards, such as Windows Media Video, RealVideo, On2 VP, etc.), increasing consumer demand for higher-quality, higher-definition, and now 3D (stereo) video, available whenever and wherever necessary and deliverable to a variety of client devices (such as PCs/laptops, televisions, set-top boxes, game consoles, portable media players/devices, smart phones, and wearable computing devices) via a variety of means (such as DVD/BD, over-the-air broadcast, cable/satellite, cable, and mobile networks), has fueled demand for even higher levels of video compression. This is evidenced by the recent standards-body-driven effort of ISO MPEG on High Efficiency Video Coding (H.265), which promises to combine new technology contributions with technology from years of exploratory work on video compression by the ITU-T standards committee.

All of the aforementioned standards employ a common intra/inter predictive coding framework in order to reduce spatial and temporal redundancy in the encoded bitstream. The basic concept of inter prediction is to eliminate temporal dependencies between neighboring pictures by using a block matching method. At the beginning of the encoding process, each frame of the unencoded video sequence is grouped into one of three categories: I-type frames, P-type frames, and B-type frames. I-type frames are intra coded; that is, only information from the frame itself is used to encode the picture, and no inter-frame motion compensation techniques are used (although intra-frame motion compensation techniques may be applied).

The other two types of frames (P-type and B-type) are encoded using inter-frame motion compensation techniques. The difference between P and B pictures is the temporal direction of the reference picture used for motion compensation. P-type pictures utilize information from previous pictures in display order, while B-type pictures can utilize information from previous and future pictures in display order.

For P-type and B-type frames, each frame is divided into blocks of pixels represented by coefficients of each pixel's luminance and chrominance components, and one or more motion vectors are obtained for each block (two motion vectors may be encoded for each block, since B-type pictures can exploit information from both future and past encoded frames). A motion vector (MV) represents the spatial displacement from the position of the current block to the position of a similar block in another, previously encoded frame (referred to as the reference block and the reference frame, respectively), which may be a past or future frame in display order. The difference between the reference block and the current block is calculated to generate a residual (also referred to as a "residual signal"). Thus, for each block of an inter-coded frame, only the residual and the motion vectors need to be encoded, rather than the entire contents of the block. By eliminating this temporal redundancy between the frames of a video sequence, the video sequence can be compressed.
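As a rough sketch of the residual computation just described, assuming frames stored as numpy arrays indexed [row, column] and a motion vector expressed as a (dx, dy) column/row displacement (function and parameter names are illustrative, not from this disclosure):

    import numpy as np

    def block_residual(cur_frame, ref_frame, x, y, size, mv):
        """Residual = current block minus the reference block displaced
        by the motion vector mv = (dx, dy) in the reference frame."""
        dx, dy = mv
        cur = cur_frame[y:y + size, x:x + size].astype(np.int16)
        ref = ref_frame[y + dy:y + dy + size, x + dx:x + dx + size].astype(np.int16)
        return cur - ref  # only this (plus mv) needs to be encoded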

To further compress the video data, after inter-frame or intra-frame prediction techniques have been applied, the coefficients of the residual signal are typically transformed from the spatial domain into the frequency domain (e.g., using a discrete cosine transform ("DCT") or a discrete sine transform ("DST")). For naturally occurring images, such as the types of images that typically make up human-perceptible video sequences, low-frequency energy is always stronger than high-frequency energy; residual signals in the frequency domain therefore yield better energy compaction than residual signals in the spatial domain. After the forward transform, the coefficients and motion vectors may be quantized and entropy encoded.
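A minimal sketch of the forward transform and quantization steps, using a direct orthonormal 2-D DCT-II built with numpy (a real encoder would typically use an integer approximation; the quantization step size here is an assumed parameter):

    import numpy as np

    def dct2(block):
        """Orthonormal 2-D DCT-II of a square residual block."""
        n = block.shape[0]
        i = np.arange(n)
        # basis[k, m] = sqrt(2/n) * cos(pi * (2m + 1) * k / (2n))
        basis = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * i.reshape(-1, 1) / (2 * n))
        basis[0, :] = np.sqrt(1.0 / n)
        return basis @ block.astype(np.float64) @ basis.T

    def quantize(tcof, qstep=16):
        """Uniform quantization; energy compacts into the low-frequency
        (top-left) coefficients, so many quantized values become zero."""
        return np.round(tcof / qstep).astype(np.int32)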

On the decoder side, inverse quantization and inverse transformation are applied to recover the spatial residual signal. These are typical transform/quantization processes in all video compression standards. A backward prediction process may then be performed to generate a recreated version of the original unencoded video sequence.

In past standards, the blocks used in coding were typically 16 × 16 pixels (called macroblocks in many video coding standards). However, as these standards have developed, frame sizes have grown larger, and many devices can now display frame sizes above "high definition" (or "HD") (such as 2048 × 1530 pixels). Therefore, it may be desirable to have larger blocks to efficiently encode motion vectors for these frame sizes (e.g., 64 × 64 pixels). However, due to the corresponding increase in resolution, it may also be desirable to be able to perform motion prediction and transformation on a relatively small scale (e.g., 4 × 4 pixels).

As the motion prediction resolution increases, the amount of bandwidth required to encode and transmit motion vectors increases for each frame and for the entire video sequence.

Drawings

Fig. 1 illustrates an exemplary video encoding/decoding system in accordance with at least one embodiment.

Fig. 2 illustrates several components of an exemplary encoding device in accordance with at least one embodiment.

Fig. 3 illustrates several components of an exemplary decoding device in accordance with at least one embodiment.

Fig. 4 illustrates a block diagram of an exemplary video encoder in accordance with at least one embodiment.

Fig. 5 illustrates a block diagram of an exemplary video decoder in accordance with at least one embodiment.

Fig. 6 illustrates an exemplary motion vector selection routine in accordance with at least one embodiment.

Fig. 7 illustrates an exemplary motion vector candidate generation subroutine in accordance with at least one embodiment.

Fig. 8 illustrates an exemplary motion vector recovery routine in accordance with at least one embodiment.

Fig. 9 shows a schematic diagram of an exemplary 8 x 8 prediction block in accordance with at least one embodiment.

Fig. 10A-10B illustrate an alternative exemplary motion vector candidate generation subroutine in accordance with at least one embodiment.

Fig. 11 illustrates a schematic diagram of an exemplary recursive coding block partitioning scheme in accordance with at least one embodiment.

Fig. 12 illustrates an exemplary coding block indexing routine in accordance with at least one embodiment.

Fig. 13 illustrates an exemplary coding block partitioning subroutine in accordance with at least one embodiment.

Figs. 14A-14C illustrate schematic diagrams of an application of the exemplary recursive coding block partitioning scheme illustrated in fig. 11, in accordance with at least one embodiment.

Figs. 15A-15B illustrate schematic diagrams of two regions of pixels corresponding to portions of respective video frames, in accordance with at least one embodiment.

Fig. 16 shows a schematic diagram of a video frame including the pixel region shown in fig. 15A.

Fig. 17 illustrates an exemplary rectangular coding block predictor selection routine in accordance with at least one embodiment.

Fig. 18 illustrates an exemplary processing area search subroutine in accordance with at least one embodiment.

Fig. 19 illustrates an exemplary template matching test subroutine in accordance with at least one embodiment.

Fig. 20A-20E illustrate schematic diagrams of five regions of pixels corresponding to portions of respective video frames, in accordance with at least one embodiment.

Figs. 21A-21B illustrate schematic diagrams of a pixel region corresponding to a portion of a video frame, in accordance with at least one embodiment.

Fig. 22 illustrates an exemplary directional predictor selection routine in accordance with at least one embodiment.

Detailed Description

The detailed description that follows is represented largely in terms of processes and symbolic representations of operations by conventional computer components, including a processor, memory storage devices for the processor, connected display devices, and input devices. Furthermore, these processes and operations may utilize conventional computer components in a heterogeneous distributed computing environment, including remote file servers, computer servers, and memory storage devices. Each of these conventional distributed computing components may be accessible to the processor via a communication network.

The phrases "in one embodiment," "in at least one embodiment," "in various embodiments," "in some embodiments," and the like may be used repeatedly herein. Such phrases are not necessarily referring to the same embodiment. The terms "comprising", "having" and "including" are synonymous, unless the context dictates otherwise. As noted above, the various embodiments are described in the context of a typical "hybrid" video coding method, as it uses inter/intra picture prediction and transform coding.

Reference will now be made in detail to the description of the embodiments illustrated in the drawings. Although embodiments have been described in connection with the drawings and the associated descriptions, it will be appreciated by those of ordinary skill in the art that alternate and/or equivalent implementations may be substituted for the specific embodiments shown and described, including all alternatives, modifications, and equivalents, whether explicitly shown and/or described, without departing from the scope of the present disclosure. In various alternative embodiments, additional devices or combinations of the devices shown may be added or combined, without limiting the scope to the embodiments disclosed herein.

Exemplary video encoding/decoding System

Fig. 1 illustrates an exemplary video encoding/decoding system 100 in accordance with at least one embodiment. The encoding device 200 (shown in fig. 2 and described below) and the decoding device 300 (shown in fig. 3 and described below) are in data communication with the network 104. The encoding device 200 may be in data communication with the unencoded video source 108 via a direct data connection, such as a storage area network ("SAN"), a high-speed serial bus, and/or other suitable communication technology, or via the network 104 (as indicated by the dashed lines in fig. 1). Similarly, the decoding device 300 may be in data communication with the optional encoded video source 112 via a direct data connection, such as a storage area network ("SAN"), a high-speed serial bus, and/or other suitable communication technology, or via the network 104 (as indicated by the dashed lines in fig. 1). In some embodiments, the encoding device 200, the decoding device 300, the encoded video source 112, and/or the unencoded video source 108 may include one or more replicated and/or distributed physical or logical devices. In many embodiments, there may be more encoding devices 200, decoding devices 300, unencoded video sources 108, and/or encoded video sources 112 than shown.

In various embodiments, the encoding device 200 may be a networked computing device capable of accepting requests, e.g., from the decoding device 300, typically over the network 104, and providing responses accordingly. In various embodiments, the decoding device 300 may be a networked computing device having a form factor such as a mobile phone; a watch, glasses, or other wearable computing device; a dedicated media player; a computing tablet; a motor vehicle head unit; an audio video on demand (AVOD) system; a dedicated media console; a gaming device; a "set-top box"; a digital video recorder; a television; or a general purpose computer. In various embodiments, the network 104 may include the internet, one or more local area networks ("LANs"), one or more wide area networks ("WANs"), a cellular data network, and/or other data networks. The network 104 may be a wired and/or wireless network at various points.

Exemplary encoding device

Referring to fig. 2, several components of an exemplary encoding device 200 are shown. In some embodiments, the encoding device may include more components than those shown in fig. 2. However, it is not necessary that all of these generally conventional components be shown in order to disclose an illustrative embodiment. As shown in fig. 2, the exemplary encoding device 200 includes a network interface 204 for connecting to a network, such as the network 104. The exemplary encoding device 200 also includes a processing unit 208, a memory 212, an optional user input 214 (e.g., an alphanumeric keyboard, a keypad, a mouse or other pointing device, a touch screen and/or microphone), and an optional display 216, all interconnected with the network interface 204 via a bus 220. The memory 212 typically includes RAM, ROM, and permanent mass storage devices such as disk drives, flash memory, and the like.

The memory 212 of the exemplary encoding device 200 stores an operating system 224 and program code for a number of software services, such as a software-implemented inter-frame video encoder 400 (described below with reference to fig. 4) having instructions for performing a motion vector selection routine 600 (described below with reference to fig. 6). The memory 212 may also store video data files (not shown) that may represent unencoded copies of audio/video media works, such as movies and/or television episodes. These and other software components may be loaded into the memory 212 of the encoding device 200 using a drive mechanism (not shown) associated with a non-transitory computer-readable medium 232, such as a floppy disk, magnetic tape, DVD/CD-ROM drive, memory card, or the like. Although the exemplary encoding device 200 has been described, the encoding device may be any of a great number of networked computing devices capable of communicating with the network 104 and executing instructions for implementing video encoding software, such as the exemplary software-implemented video encoder 400 and the motion vector selection routine 600.

In operation, the operating system 224 manages the hardware and other software resources of the encoding device 200 and provides general-purpose services for software applications such as the software-implemented interframe video encoder 400. The operating system 224 mediates between software executing on the encoding device and hardware for hardware functions such as network communications via the network interface 204, receiving data via the input 214, outputting data via the display 216, and allocating memory 212 for various software applications, such as a software-implemented interframe video encoder 400.

In some embodiments, the encoding device 200 may further include a dedicated unencoded video interface 236 for communicating with the unencoded video source 108, such as over a high speed serial bus. In some embodiments, the encoding device 200 may communicate with the unencoded video source 108 via the network interface 204. In other embodiments, the unencoded video source 108 may reside in the memory 212 or the computer-readable medium 232.

Although an exemplary encoding device 200 has been described that generally conforms to a conventional general purpose computing device, the encoding device 200 may be any of a number of devices capable of encoding video, such as a video recording device, a video co-processor and/or accelerator, a personal computer, a gaming console, a set-top box, a handheld or wearable computing device, a smart phone, or any other suitable device.

For example, the encoding device 200 may operate in accordance with an on-demand media service (not shown). In at least one exemplary embodiment, the on-demand media service may operate the encoding device 200 in support of an online on-demand media store that provides digital copies of media works, such as video content, to users on a per-work and/or subscription basis. The on-demand media service may obtain digital copies of such media works from the unencoded video source 108.

Exemplary decoding device

Referring to fig. 3, several components of an exemplary decoding device 300 are shown. In some embodiments, the decoding device may include more components than those shown in fig. 3. However, it is not necessary that all of these generally conventional components be shown in order to disclose an illustrative embodiment. As shown in fig. 3, the exemplary decoding device 300 includes a network interface 304 for connecting to a network, such as network 104. The exemplary decoding device 300 also includes a processing unit 308, a memory 312, optional user inputs 314 (e.g., an alphanumeric keyboard, a keypad, a mouse or other pointing device, a touch screen and/or microphone), an optional display 316, and an optional speaker 318, all interconnected with the network interface 304 via a bus 320. Memory 312 typically includes RAM, ROM, and permanent mass storage devices such as disk drives, flash memory, and the like.

The memory 312 of the exemplary decoding device 300 may store an operating system 324 as well as program code for a number of software services, such as a software-implemented video decoder 500 (described below with reference to fig. 5) having instructions for performing a motion vector recovery routine 800 (described below with reference to fig. 8). The memory 312 may also store video data files (not shown) that may represent encoded copies of audio/video media works, such as movies and/or television episodes. These and other software components may be loaded into the memory 312 of the decoding device 300 using a drive mechanism (not shown) associated with a non-transitory computer-readable medium 332, such as a floppy disk, magnetic tape, DVD/CD-ROM drive, memory card, or the like. Although the exemplary decoding device 300 has been described, the decoding device may be any of a great number of networked computing devices capable of communicating with a network, such as the network 104, and executing instructions for implementing video decoding software, such as the exemplary software-implemented video decoder 500 and the motion vector recovery routine 800.

In operation, the operating system 324 manages the hardware and other software resources of the decoding device 300 and provides general-purpose services for software applications, such as the software-implemented video decoder 500. The operating system 324 mediates between software executing on the decoding device and the hardware for functions such as network communications via the network interface 304, receiving data via the input 314, outputting data via the display 316 and/or the optional speaker 318, and allocating the memory 312.

In some embodiments, the decoding device 300 may further include an optional encoded video interface 336 for communicating with the encoded video source 116, such as over a high speed serial bus. In some embodiments, the decoding device 300 may communicate with an encoded video source, such as the encoded video source 116, via the network interface 304. In other embodiments, the encoded video source 116 may reside in the memory 312 or the computer readable medium 332.

Although exemplary decoding device 300 has been described as generally conforming to a conventional general purpose computing device, decoding device 300 may be any of a number of devices capable of decoding video, such as a video recording device, a video co-processor and/or accelerator, a personal computer, a gaming console, a set-top box, a handheld or wearable computing device, a smart phone, or any other suitable device.

For example, the decoding device 300 may operate in accordance with an on-demand media service. In at least one exemplary embodiment, the on-demand media service may provide digital copies of media works, such as video content, to a user operating the decoding device 300 on a per-work and/or subscription basis. The decoding device may obtain an encoded digital copy of such a media work via the network 104, for example from the encoding device 200, which may in turn obtain it from the unencoded video source 108.

Software implemented interframe video encoder

Fig. 4 illustrates a general functional block diagram of a software-implemented inter-frame video encoder 400 (hereinafter "encoder 400") employing residual transform techniques in accordance with at least one embodiment. One or more unencoded video frames (vidfrms) of a video sequence may be provided to the sequencer 404 in display order.

Sequencer 404 may assign a predictive coded picture type (e.g., I, P or B) to each unencoded video frame and reorder the frame sequence or groups of frames in the frame sequence into a coding order for motion prediction purposes (e.g., I-type frame followed by P-type frame, then B-type frame). The sequenced unencoded video frames (seqfrms) may then be input to block indexer 408 in coding order.

For each of the sequenced unencoded video frames (seqfrms), the block indexer 408 may determine a largest coding block ("LCB") size (e.g., sixty-four by sixty-four pixels) for the current frame and partition the unencoded frame into an array of coding blocks (blcks). The size of the individual coding blocks within a given frame may vary, for example, from four by four pixels up to the LCB size of the current frame.
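A hedged sketch of this indexing step, assuming a frame stored as a numpy array and a 64-pixel LCB size; handling of frames whose dimensions are not LCB multiples is not specified here, so this sketch simply stops at the last full block:

    def partition_into_lcbs(frame, lcb_size=64):
        """Yield (x, y, block) for each LCB-sized coding block, scanning
        left to right, top to bottom."""
        h, w = frame.shape[:2]
        for y in range(0, h - lcb_size + 1, lcb_size):
            for x in range(0, w - lcb_size + 1, lcb_size):
                yield x, y, frame[y:y + lcb_size, x:x + lcb_size]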

Each coding block may then be input to the differentiator 412, one at a time, and differenced against a corresponding prediction signal block (pred) generated from previously encoded coding blocks. To generate the prediction blocks (pred), the coding blocks (blcks) are also provided to the intra predictor 444 and the motion estimator 416. After differencing at the differentiator 412, the resulting residual block (res) may be forward-transformed to a frequency-domain representation by the transformer 420 (discussed below), resulting in a block of transform coefficients (tcof). The block of transform coefficients (tcof) may then be sent to the quantizer 424, producing a block of quantized coefficients (qcf), which may then be sent both to the entropy encoder 428 and to the local decoding loop 430.

For intra-coded coding blocks, the intra predictor 444 provides a prediction signal representing a previously coded region of the same frame as the current coding block. For inter-coded coding blocks, the motion compensated predictor 442 provides a prediction signal representing a previously coded region of a different frame from the current coding block.

At the beginning of the local decoding loop 430, the inverse quantizer 432 may de-quantize the block of quantized coefficients, recovering the transform coefficients (tcof'), and pass them to the inverse transformer 436 to generate a de-quantized residual block (res'). At the adder 440, the prediction block (pred) from the motion compensated predictor 442 or the intra predictor 444 may be added to the de-quantized residual block (res') to generate a locally decoded block (rec). The locally decoded blocks (rec) may then be sent to the frame combiner and deblocking filter processor 444, which may reduce blocking artifacts and assemble recovered frames (recd); the recovered frames may be used as reference frames for the motion estimator 416 and the motion compensated predictor 442.

The entropy encoder 428 encodes the quantized transform coefficients (qcf), differential motion vectors (dmv), and other data to generate an encoded video bitstream 448. For each frame of the unencoded video sequence, the encoded video bitstream 448 may include both encoded picture data (e.g., the encoded quantized transform coefficients (qcf) and differential motion vectors (dmv)) and an encoded frame header (e.g., syntax information such as the LCB size of the current frame).

Inter-frame coding mode

For coding blocks that are coded in inter-coding mode, motion evaluator 416 may divide each coding block into one or more prediction blocks, e.g., having a size such as 4 × 4 pixels, 8 × 8 pixels, 16 × 16 pixels, 32 × 32 pixels, or 64 × 64 pixels. For example, a 64 × 64 coding block may be divided into sixteen 16 × 16 prediction blocks, four 32 × 32 prediction blocks, or two 32 × 32 prediction blocks and eight 16 × 16 prediction blocks. Motion evaluator 416 may then calculate a motion vector (MVcalc) for each prediction block by identifying the appropriate reference block and determining the relative spatial displacement from the prediction block to the reference block.
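The disclosure does not mandate a particular search strategy for obtaining the calculated motion vector; one common assumption is an exhaustive sum-of-absolute-differences (SAD) search over a small window of the reference frame, sketched below (names and the window size are illustrative):

    import numpy as np

    def best_motion_vector(cur_frame, ref_frame, x, y, size, search=8):
        """Return the (dx, dy) displacement minimizing the SAD between
        the prediction block and a candidate reference block."""
        cur = cur_frame[y:y + size, x:x + size].astype(np.int32)
        h, w = ref_frame.shape[:2]
        best_mv, best_sad = (0, 0), None
        for dy in range(-search, search + 1):
            for dx in range(-search, search + 1):
                rx, ry = x + dx, y + dy
                if rx < 0 or ry < 0 or rx + size > w or ry + size > h:
                    continue  # candidate block falls outside the frame
                ref = ref_frame[ry:ry + size, rx:rx + size].astype(np.int32)
                sad = int(np.abs(cur - ref).sum())
                if best_sad is None or sad < best_sad:
                    best_mv, best_sad = (dx, dy), sad
        return best_mv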

According to an aspect of at least one embodiment, to improve coding efficiency, the calculated motion vector (MVcalc) may be coded by subtracting a motion vector predictor (MVpred) from the calculated motion vector (MVcalc) to obtain a motion vector difference (ΔMV). For example, if the calculated motion vector (MVcalc) is (5, -1) (i.e., a reference block in a previously encoded frame located five columns to the right of and one row above the current prediction block in the current frame) and the motion vector predictor is (5, 0) (i.e., a reference block in a previously encoded frame located five columns to the right of and in the same row as the current prediction block in the current frame), the motion vector difference (ΔMV) would be:

MVcalc - MVpred = (5, -1) - (5, 0) = (0, -1) = ΔMV.

The closer the motion vector predictor (MVpred) is to the calculated motion vector (MVcalc), the smaller the value of the motion vector difference (ΔMV). Thus, accurate motion vector prediction that does not depend on the content of the current prediction block (so that it can be repeated on the decoder side) can, over the course of an entire video sequence, significantly reduce the amount of information needed to encode motion vector differences compared to encoding the calculated motion vectors themselves.
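In code form, the worked example above reduces to a componentwise subtraction (a sketch; motion vectors are assumed to be (columns, rows) pairs):

    def mv_difference(mv_calc, mv_pred):
        """dMV = MVcalc - MVpred, componentwise."""
        return (mv_calc[0] - mv_pred[0], mv_calc[1] - mv_pred[1])

    # The worked example above: (5, -1) - (5, 0) = (0, -1).
    assert mv_difference((5, -1), (5, 0)) == (0, -1)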

In accordance with an aspect of at least one embodiment, the motion evaluator 416 may use a variety of techniques to obtain a motion vector predictor (MVpred). For example, the motion vector predictor may be obtained by calculating the median of several previously coded motion vectors of the current frame, such as the median of the motion vectors of a plurality of previously coded reference blocks in the vicinity of the current prediction block: the motion vector of a reference block (RBa) in the same column as, and one row above, the current block; the motion vector of a reference block (RBb) one column to the right of, and one row above, the current prediction block; and the motion vector of a reference block (RBc) one column to the left of, and in the same row as, the current block.

As described above, and in accordance with an aspect of at least one embodiment, the motion evaluator 416 may use additional or alternative techniques to provide a motion vector predictor for a prediction block in inter-coding mode. For example, another technique for providing a motion vector predictor may be to compute the mean of the motion vectors of the same plurality of previously coded reference blocks in the vicinity of the current prediction block: the reference block (RBa) in the same column as, and one row above, the current block; the reference block (RBb) one column to the right of, and one row above, the current prediction block; and the reference block (RBc) one column to the left of, and in the same row as, the current block.

According to an aspect of at least one embodiment, to improve coding efficiency, the encoder 400 may indicate which of the available techniques was used in the coding of the current prediction block by setting a select motion vector prediction method (SMV-PM) flag in the picture header of the current frame (or the prediction block header of the current prediction block). For example, in at least one embodiment, the SMV-PM flag may be a one-bit variable having two possible values, one of which indicates that the motion vector predictor was obtained using the median technique described above, and the second of which indicates that the motion vector predictor was obtained using an alternative technique.
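A sketch of the two predictor techniques named above, taking the componentwise median or mean of the three neighboring reference-block motion vectors; the mapping of SMV-PM flag values to techniques is an assumption for illustration:

    import statistics

    def mv_predictor(mva, mvb, mvc, smv_pm_flag=0):
        """Componentwise median (flag 0, assumed) or mean (flag 1,
        assumed) of the motion vectors of RBa, RBb, and RBc."""
        xs = (mva[0], mvb[0], mvc[0])
        ys = (mva[1], mvb[1], mvc[1])
        if smv_pm_flag == 0:
            return (statistics.median(xs), statistics.median(ys))
        return (sum(xs) / 3.0, sum(ys) / 3.0)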

In a coding block coded in inter-coding mode, both the motion vectors and the residual may be encoded into the bitstream.

Skip coding and direct coding modes

For coding blocks that are coded in either skip coding or direct coding mode, the motion evaluator 416 may use the entire coding block as the corresponding prediction block (PB).

According to an aspect of at least one embodiment, in skip coding and direct coding modes, instead of determining the calculated motion vector (MVcalc) for the Prediction Block (PB), the motion evaluator 416 may use a predefined method (described below with reference to fig. 7) to generate an ordered list of motion vector candidates. For example, for a current prediction block (PBcur), the ordered list of motion vector candidates may consist of motion vectors previously used to encode other blocks of the current frame, referred to as "reference blocks" (RBs).

In accordance with an aspect of at least one embodiment, the motion evaluator 416 may then select the best motion vector candidate (MVC) from the ordered list to encode the current prediction block (PBcur). If the process for generating the ordered list of motion vector candidates is repeatable on the decoder side, only the index of the selected motion vector (MVsel) in the ordered list of motion vector candidates needs to be included in the encoded bitstream, rather than the motion vector itself. Over the course of an entire video sequence, encoding the index values may require much less information than encoding the actual motion vectors.

According to an aspect of at least one embodiment, the motion vectors selected to fill the motion vector candidate list are preferably taken from three reference blocks (RBa, RBb, RBc) that have known motion vectors and share a boundary with the current prediction block (PBcur) and/or another reference block (RB). For example, the first reference block (RBa) may be located directly above the current prediction block (PBcur), the second reference block (RBb) may be located directly to the right of the first reference block (RBa), and the third reference block (RBc) may be located directly to the left of the current prediction block (PBcur). However, the particular locations of the reference blocks relative to the current prediction block may not be important, as long as they are predefined so that downstream decoders know where they are.

According to an aspect of at least one embodiment, if all three reference blocks have known motion vectors, the first motion vector candidate (MVC1) in the motion vector candidate list of the current prediction block (PBcur) may be the motion vector (MVa) (or motion vectors, in a B-type frame) from the first reference block (RBa), the second motion vector candidate (MVC2) may be the motion vector (MVb) (or motion vectors) from the second reference block (RBb), and the third motion vector candidate (MVC3) may be the motion vector (MVc) (or motion vectors) from the third reference block (RBc). The motion vector candidate list may thus be: (MVa, MVb, MVc).

However, if a reference block (RB) has no motion vector available, for example because no prediction information is available for that reference block, or because the current prediction block (PBcur) is in the top row, leftmost column, or rightmost column of the current frame, that motion vector candidate may be skipped and the next motion vector candidate substituted; any remaining candidate slots may be filled with the zero-value motion vector (0, 0). For example, if no motion vector is available for RBb, the motion vector candidate list may be: (MVa, MVc, (0, 0)).

In accordance with at least one embodiment, given the various combinations of motion vector candidate availability, the complete set of motion vector candidate lists is shown in Table 1:

TABLE 1

    MVa available?  MVb available?  MVc available?  Candidate list (MVC1, MVC2, MVC3)
    yes             yes             yes             (MVa, MVb, MVc)
    yes             yes             no              (MVa, MVb, (0, 0))
    yes             no              yes             (MVa, MVc, (0, 0))
    yes             no              no              (MVa, (0, 0), (0, 0))
    no              yes             yes             (MVb, MVc, (0, 0))
    no              yes             no              (MVb, (0, 0), (0, 0))
    no              no              yes             (MVc, (0, 0), (0, 0))
    no              no              no              ((0, 0), (0, 0), (0, 0))
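The fill rule of Table 1 (and of the motion vector candidate generation subroutine 700 described below) can be sketched compactly: keep the available neighbor motion vectors in (RBa, RBb, RBc) order, then pad with zero-value motion vectors:

    def mv_candidate_list(mva=None, mvb=None, mvc=None):
        """Ordered candidate list: available neighbor motion vectors in
        (RBa, RBb, RBc) order, padded with zero-value motion vectors."""
        candidates = [mv for mv in (mva, mvb, mvc) if mv is not None]
        while len(candidates) < 3:
            candidates.append((0, 0))
        return candidates

    # E.g. RBb unavailable: (MVa, MVc, (0, 0)), matching the text above.
    assert mv_candidate_list(mva=(3, 1), mvc=(2, 0)) == [(3, 1), (2, 0), (0, 0)]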

The motion evaluator 416 may then evaluate the motion vector candidates and select the best motion vector candidate as the selected motion vector for the current prediction block. Note that, as long as a downstream decoder knows how to fill the ordered list of motion vector candidates for a given prediction block, this computation can be repeated on the decoder side without knowledge of the content of the current prediction block. Thus, only the index of the selected motion vector within the motion vector candidate list, rather than the motion vector itself, needs to be included in the encoded bitstream, for example by setting a motion vector selection flag in the prediction block header of the current prediction block; over the course of an entire video sequence, encoding the index values requires significantly less information than encoding the actual motion vectors.

In the direct coding mode, the motion vector selection flag and the residual between the current prediction block and the block of the reference frame indicated by the selected motion vector are encoded. In the skip coding mode, the motion vector selection flag is encoded, but encoding of the residual signal is skipped; in essence, this tells the downstream decoder to use the block of the reference frame indicated by the motion vector in place of the current prediction block of the current frame.
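On the decoder side, the two modes differ only in whether a decoded residual is added back to the reference block; a sketch, assuming numpy blocks and that the reference block has already been fetched using the selected motion vector:

    def reconstruct_block(ref_block, residual=None):
        """Direct coding mode: reference block plus decoded residual.
        Skip coding mode: the reference block is used as-is."""
        if residual is None:  # skip coding mode: no residual was encoded
            return ref_block.copy()
        return ref_block + residual  # direct coding mode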

Software implemented interframe decoder

Fig. 5 illustrates a general functional block diagram of a corresponding software-implemented inter-frame video decoder 500 (hereinafter "decoder 500") employing an inverse residual transform technique in accordance with at least one embodiment, suitable for use with a decoding device such as the decoding device 300. The decoder 500 may operate similarly to the local decoding loop 430 of the encoder 400.

In particular, the encoded video bitstream 504 to be decoded may be provided to the entropy decoder 508, which may decode blocks of quantized coefficients (qcf), differential motion vectors (dmv), accompanying message data packets (msg-data), and other data, including the prediction mode (intra or inter). The blocks of quantized coefficients (qcf) may then be inverse-quantized by the inverse quantizer 512, resulting in recovered blocks of transform coefficients (tcof'). The recovered blocks of transform coefficients (tcof') may then be inverse-transformed out of the frequency domain by the inverse transformer 516 (described below), resulting in decoded residual blocks (res'). The adder 520 may add to these a motion compensated prediction block (psb) obtained from the motion compensated predictor 528 using a corresponding motion vector (dmv). The resulting decoded video (dv) may be deblock-filtered in the frame combiner and deblocking filter processor 524. The blocks (recd) at the output of the frame combiner and deblocking filter processor 524 form the reconstructed frames of the video sequence, which may be output from the decoder 500 and also used as reference frames for the motion compensated predictor 528 when decoding subsequent coding blocks.
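Tying the decoding steps together, a sketch that mirrors the dct2/quantize examples given earlier (the inverse transform inverts that orthonormal DCT-II; the quantization step size and 8-bit output range are assumptions):

    import numpy as np

    def idct2(tcof):
        """Inverse of the orthonormal dct2 sketch given earlier."""
        n = tcof.shape[0]
        i = np.arange(n)
        basis = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * i.reshape(-1, 1) / (2 * n))
        basis[0, :] = np.sqrt(1.0 / n)
        return basis.T @ tcof @ basis

    def decode_block(qcf, pred, qstep=16):
        """De-quantize, inverse-transform, add the prediction block, and
        clamp to an assumed 8-bit pixel range."""
        res = idct2(qcf.astype(np.float64) * qstep)
        return np.clip(np.rint(pred + res), 0, 255).astype(np.uint8)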

Motion vector selection routine

Fig. 6 illustrates a motion vector selection routine 600 suitable for use in at least one embodiment, such as the encoder 400. As one of ordinary skill in the art will recognize, not all events in the encoding process are shown in fig. 6. Rather, for clarity, only those steps reasonably relevant to describing the motion vector selection routine are shown.

At execution block 603, a coding block is obtained, for example, by the motion evaluator 416.

At decision block 624, the motion vector selection routine 600 selects an encoding mode for the encoding block. For example, as described above, an inter-coding mode, a direct coding mode, or a skip coding mode may be selected. If either skip coding or direct coding mode is selected for the current coding block, the motion vector selection routine 600 may proceed to an execution block 663 described below.

If inter-coding mode is selected for the current coding block at decision block 624, then at execution block 627 the motion vector selection routine 600 may divide the current coding block into one or more prediction blocks. Beginning at start loop block 630, each prediction block of the current coding block may then be addressed in turn.

At execution block 633, the motion vector selection routine 600 may select a prediction index for the current prediction block that indicates whether the reference frame is a previous picture, a future picture, or, in the case of a B-type picture, both.

At execution block 636, the motion vector selection routine 600 may then select a motion vector prediction method, such as the median or mean technique described above, or any available alternative motion vector prediction method.

At execution block 642, the motion vector selection routine 600 may use the selected motion vector prediction method to obtain a motion vector predictor (MVpred) for the current prediction block.

At execution block 645, the motion vector selection routine 600 may obtain a calculated motion vector (MVcalc) for the current prediction block.

At execution block 648, the motion vector selection routine 600 may obtain a motion vector difference (Δ MV) for the current prediction block (note that there may be a single motion vector difference for P-type pictures and two motion vector differences for B-type pictures).

At execution block 651, the motion vector selection routine 600 may obtain the residual between the current prediction block (PBcur) and the block indicated by the calculated motion vector (MVcalc).

At execution block 654, the motion vector selection routine 600 may encode the motion vector difference and the residual of the current prediction block.

At execution block 657, the motion vector selection routine 600 may set an SMV-PM flag in the picture header of the current frame (or the prediction block header of the current prediction block) that indicates which motion vector prediction technique was used for the current prediction block.

At the end loop block 660, the motion vector selection routine 600 returns to the start loop block 630 to process the next prediction block (if any) of the current encoding block.

Returning to decision block 624, if skip coding or direct coding mode is selected for the current coding block, at execution block 663 the motion vector selection routine 600 sets the current prediction block equal to the current coding block.

The motion vector selection routine 600 may then call the motion vector candidate generation subroutine 700 (described below with reference to fig. 7), which may return an ordered list of motion vector candidates to the motion vector selection routine 600.

At execution block 666, the motion vector selection routine 600 may then select a motion vector from the motion vector candidate list for encoding the current prediction block.

At decision block 667, if the selected coding mode is direct coding, then at execution block 669 the motion vector selection routine 600 computes the residual between the current prediction block and the reference block indicated by the selected motion vector.

At execution block 672, the motion vector selection routine 600 may encode the residual, and at execution block 675, the motion vector selection routine 600 may set a motion vector selection flag in the prediction block header of the current prediction block, indicating which motion vector candidate was selected for encoding the current prediction block.

The motion vector selection routine 600 ends at termination block 699.

Motion vector candidate generation subroutine 700

Fig. 7 depicts a motion vector candidate generation subroutine 700 for generating an ordered list of motion vector candidates in accordance with at least one embodiment. In the illustrated embodiment, three motion vector candidates are generated. However, those of ordinary skill in the art will recognize that a greater or lesser number of candidates may be generated using the same techniques, and further, that alternate and/or equivalent implementations may be substituted for the specific embodiments shown and described without departing from the scope of the present disclosure.

The motion vector candidate generation subroutine 700 obtains a request to generate a motion vector candidate list for the current prediction block at execution block 704.

At decision block 708, if a motion vector is available from the first candidate reference block (RBa), then at execution block 712 the motion vector candidate generation subroutine 700 may set the first motion vector candidate (MVC1) to MVa and proceed to decision block 716.

At decision block 716, if a motion vector is available from the second candidate reference block (RBb), then at execution block 724 the motion vector candidate generation subroutine 700 may set the second motion vector candidate (MVC2) to MVb and proceed to decision block 728.

At decision block 728, if a motion vector is available from the third candidate reference block (RBc), the motion vector candidate generation subroutine 700 may set the third motion vector candidate (MVC3) to MVc at execution block 736.

The motion vector candidate generation subroutine 700 may then return a motion vector candidate list having the respective values MVC1 = MVa, MVC2 = MVb, and MVC3 = MVc at return block 799.

Referring again to decision block 728, if a motion vector is not available from the third candidate reference block (RBc), the motion vector candidate generation subroutine 700 may set the third motion vector candidate (MVC3) to (0, 0) at execution block 740.

The motion vector candidate generation subroutine 700 may then return a motion vector candidate list having the respective values MVC1 = MVa, MVC2 = MVb, and MVC3 = (0,0) at return block 799.

Referring again to decision block 716, if a motion vector is not available from the second candidate reference block (RBb), the motion vector candidate generation subroutine 700 may proceed to decision block 732.

At decision block 732, if a motion vector is available from the third candidate reference block (RBc), the motion vector candidate generation subroutine 700 may set the second motion vector candidate (MVC2) to MVc at execution block 744. The third motion vector candidate (MVC3) may then be set to (0,0) at execution block 740.

The motion vector candidate generation subroutine 700 may then return a motion vector candidate list having the respective values MVC1 = MVa, MVC2 = MVc, and MVC3 = (0,0) at return block 799.

Referring again to decision block 732, if a motion vector is not available from the third candidate reference block (RBc), the motion vector candidate generation subroutine 700 may set the second motion vector candidate (MVC2) to (0,0) at execution block 748 and may set the third motion vector candidate (MVC3) to (0,0) at execution block 740.

The motion vector candidate generation subroutine 700 may then return a motion vector candidate list having the respective values MVC1 = MVa, MVC2 = (0,0), and MVC3 = (0,0) at return block 799.

Referring again to decision block 708, if a motion vector is not available from the first candidate reference block (RBa), the motion vector candidate generation subroutine 700 may proceed to decision block 720.

At decision block 720, if a motion vector is available from the second candidate reference block (RBb), the motion vector candidate generation subroutine 700 may set the first motion vector candidate (MVC1) to MVb at execution block 752. The motion vector candidate generation subroutine 700 may then proceed to decision block 732.

Returning again to decision block 732, if a motion vector is available from the third candidate reference block (RBc), the motion vector candidate generation subroutine 700 may set the second motion vector candidate (MVC2) to MVc at execution block 744. The third motion vector candidate (MVC3) may then be set to (0,0) at execution block 740.

The motion vector candidate generation subroutine 700 may then return a motion vector candidate list having the respective values MVC1 = MVb, MVC2 = MVc, and MVC3 = (0,0) at return block 799.

Referring again to decision block 732, if a motion vector is not available from the third candidate reference block (RBc), the motion vector candidate generation subroutine 700 may set the second motion vector candidate (MVC2) to (0,0) at execution block 748 and may set the third motion vector candidate (MVC3) to (0,0) at execution block 740.

The motion vector candidate generation subroutine 700 may then return a motion vector candidate list having the respective values MVC1 = MVb, MVC2 = (0,0), and MVC3 = (0,0) at return block 799.

Referring again to decision block 720, if a motion vector is not available from the second candidate reference block (RBb), the motion vector candidate generation subroutine 700 may proceed to decision block 756.

At decision block 756, if a motion vector is available from the third candidate reference block (RBc), the motion vector candidate generation subroutine 700 may set the first motion vector candidate (MVC1) to MVc at execution block 760. The motion vector candidate generation subroutine 700 may then set the second motion vector candidate (MVC2) to (0,0) at execution block 748 and set the third motion vector candidate (MVC3) to (0,0) at execution block 740.

The motion vector candidate generation subroutine 700 may then return a motion vector candidate list having the respective values MVC1 = MVc, MVC2 = (0,0), and MVC3 = (0,0) at return block 799.

Referring again to decision block 756, if a motion vector is not available from the third candidate reference block (RBc), the motion vector candidate generation subroutine 700 may set the first motion vector candidate (MVC1) to (0,0) at execution block 764. The motion vector candidate generation subroutine 700 may then set the second motion vector candidate to (0,0) at execution block 748 and set the third motion vector candidate to (0,0) at execution block 740.

The motion vector candidate generation subroutine 700 may then return a motion vector candidate list having the respective values MVC1 = (0,0), MVC2 = (0,0), and MVC3 = (0,0) at return block 799.
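
The branching above reduces to a simple promotion rule, sketched below (Python; all identifiers are illustrative, and each argument is assumed to be the motion vector of the corresponding candidate reference block, or None when unavailable):

```python
def generate_mv_candidates(mva, mvb, mvc):
    """Return (MVC1, MVC2, MVC3): available candidates are promoted in
    RBa -> RBb -> RBc order; remaining slots hold zero-value vectors."""
    available = [mv for mv in (mva, mvb, mvc) if mv is not None]
    while len(available) < 3:
        available.append((0, 0))  # zero-value motion vector
    return tuple(available)

# Example: RBb has no motion vector, so MVc is promoted to MVC2.
assert generate_mv_candidates((3, -1), None, (0, 2)) == ((3, -1), (0, 2), (0, 0))
```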

Motion vector recovery routine 800

Fig. 8 illustrates a motion vector recovery routine 800 suitable for use in at least one embodiment, such as by the decoder 500. As one of ordinary skill in the art will recognize, not all events in the decoding process are shown in fig. 8. Rather, for clarity, only those steps reasonably relevant to describing the motion vector recovery routine are shown.

At execution block 803, the motion vector recovery routine 800 may obtain data corresponding to the encoded block.

At execution block 828, the motion vector recovery routine 800 may identify a coding mode for encoding the encoded block. As described above, the possible coding modes may be inter-coding modes, direct coding modes, or skip coding modes.

At decision block 830, if the encoded block was encoded using inter-coding mode, at execution block 833, the motion vector recovery routine 800 may identify a corresponding prediction block for the encoded block. At a start loop block 836, each prediction block of the current coding block may be addressed in turn.

At execution block 839, the motion vector recovery routine 800 may identify the prediction index of the current prediction block from the prediction block header.

At execution block 842, the motion vector recovery routine 800 may identify a motion vector prediction method for predicting the motion vector of the current prediction block, e.g., by reading the SMV-PM flag in the picture header of the current frame.

At execution block 848, the motion vector recovery routine 800 may obtain a motion vector difference (Δ MV) for the current prediction block.

At execution block 851, the motion vector recovery routine 800 may use the motion vector prediction method identified in execution block 842 to obtain the predicted motion vector (MVpred) for the current prediction block.

At execution block 854, the motion vector recovery routine 800 may recover the calculated motion vector (MVcalc) for the current prediction block, e.g., by adding the predicted motion vector (MVpred) to the motion vector difference (Δ MV) (note that there may be a single recovered motion vector for P-type pictures and two recovered motion vectors for B-type pictures).
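
As a minimal illustration of this recovery step (hypothetical names; per-component integer arithmetic assumed):

```python
def recover_motion_vector(mv_pred, mv_delta):
    """MVcalc = MVpred + dMV, applied per component."""
    return (mv_pred[0] + mv_delta[0], mv_pred[1] + mv_delta[1])

# A P-type picture carries one recovered vector; a B-type picture would
# repeat this step for each of its two vectors.
assert recover_motion_vector((4, -2), (-1, 3)) == (3, 1)
```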

At execution block 857, the motion vector recovery routine 800 may then add the residual of the current prediction block to the block indicated by the calculated motion vector (MVcalc) to obtain the recovered values of the prediction block.

Referring again to decision block 830, if the current coding block is coded using either the skip coding mode or the direct coding mode, the motion vector recovery routine 800 may then invoke the motion vector candidate generation subroutine 700 (described above with reference to fig. 7), which may return an ordered list of motion vector candidates to the motion vector recovery routine 800.

At execution block 863, the motion vector recovery routine 800 may then read a motion vector selection flag from the prediction block header.

At execution block 866, the motion vector recovery routine 800 may then use the motion vector selection flag to identify the motion vector from the ordered list of motion vector candidates that was used to encode the current prediction block.

At decision block 869, if the current coding block was coded in direct coding mode, at execution block 872, the motion vector recovery routine 800 may add the residual of the prediction block to the coefficients of the block identified by the selected motion vector to recover the prediction block coefficients.

If the current coding block is coded in skip coding mode, the motion vector recovery routine 800 may use the coefficients of the reference block indicated by the selected motion vector as the coefficients of the prediction block at execution block 875.

The motion vector recovery routine 800 ends at termination block 899.

Alternative motion vector selection routine for skip coding and direct coding modes

Referring again to fig. 4, for coding blocks coded in either skip coding or direct coding mode, motion evaluator 416 may use the entire coding block as the corresponding Prediction Block (PB).

According to an aspect of at least one embodiment, in skip coding and direct coding modes, instead of determining the calculated motion vector (MVcalc) for the Prediction Block (PB), the motion evaluator 416 may use a predefined method to generate an ordered list of four Motion Vector Candidates (MVCL). For example, for a current prediction block (PBcur), the ordered list of motion vector candidates may consist of motion vectors and/or zero value motion vectors that were previously used to encode other blocks of the current frame, referred to as "reference blocks" (RBs).

In accordance with an aspect of at least one embodiment, the motion evaluator 416 may then select the best Motion Vector Candidate (MVC) from the ordered list to encode the current prediction block (PBcur). If the process for generating the ordered list of motion vector candidates is repeatable on the decoder side, only the index of the selected motion vector (MVsel) within the ordered list of motion vector candidates, rather than the motion vector itself, needs to be included in the coded bitstream. Over the course of an entire video sequence, the information required to encode these index values may be much less than that required to encode the actual motion vectors.

According to an aspect of at least one embodiment, the motion vectors selected to fill the motion vector candidate list are preferably taken from up to seven reference blocks (RBa, RBb, RBc, RBd, RBe, RBf, RBg) having known motion vectors and sharing a boundary and/or vertex with the current prediction block (PBcur). Referring to FIG. 9, which shows an 8 × 8 prediction block 902 as the current prediction block (PBcur), having, e.g., an upper left pixel 904, an upper right pixel 906, and a lower left pixel 908:

(a) The first reference block (RBa) may be the prediction block containing pixel 910, to the left of pixel 904;

(b) The second reference block (RBb) may be the prediction block containing pixel 912, above pixel 904;

(c) The third reference block (RBc) may be the prediction block containing pixel 914, above and to the right of pixel 906;

(d) The fourth reference block (RBd) may be the prediction block containing pixel 916, below and to the left of pixel 908;

(e) The fifth reference block (RBe) may be the prediction block containing pixel 918, to the left of pixel 908;

(f) The sixth reference block (RBf) may be the prediction block containing pixel 920, above pixel 906; and

(g) The seventh reference block (RBg) may be the prediction block containing pixel 922, above and to the left of pixel 904.

However, the particular location of the reference block relative to the current prediction block may not be important as long as they are known by the downstream decoder.

According to an aspect of the present embodiment, if all four reference blocks have known motion vectors, the first motion vector candidate (MVC1) in the motion vector candidate list of the current prediction block (PBcur) may be the motion vector (MVa) (or motion vectors, in a B-type frame) from the first reference block (RBa), the second motion vector candidate (MVC2) may be the motion vector (MVb) (or motion vectors) from the second reference block (RBb), the third motion vector candidate (MVC3) may be the motion vector (MVc) (or motion vectors) from the third reference block (RBc), and the fourth motion vector candidate (MVC4) may be the motion vector (MVd) (or motion vectors) from the fourth reference block (RBd).

According to the present embodiment, if one or more of the first four reference blocks (RBa-d) cannot provide a motion vector candidate, the three additional reference blocks (RBe-g) may be considered. If a motion vector is still unavailable, for example because no prediction information is available for a given reference block, or because the current prediction block (PBcur) is in the top row, bottom row, left-most column, or right-most column of the current frame, the missing candidate may be skipped, the next available candidate may be promoted in its place, and any remaining candidate slots may be filled with zero-value motion vectors (0,0). For example, if no motion vectors are available for the second, third, and fourth reference blocks (RBb-d) and only RBe among the additional reference blocks can supply one, the motion vector candidate list may be: (MVa, MVe, (0,0), (0,0)). An exemplary procedure for populating a motion vector candidate list according to the present embodiment is described below with reference to fig. 10.

The motion evaluator 416 may then evaluate the motion vector candidates and select the best candidate to be used as the selected motion vector for the current prediction block. Note that, as long as the downstream decoder knows how to populate the ordered list of motion vector candidates for a given prediction block, this calculation can be repeated on the decoder side without knowing the content of the current prediction block. Thus, only the index of the selected motion vector within the motion vector candidate list, rather than the motion vector itself, needs to be included in the encoded bitstream, for example by setting a motion vector selection flag in the prediction block header of the current prediction block. Over the course of an entire video sequence, significantly less information is required to encode these index values than the actual motion vectors.

In the direct coding mode, a motion vector selection flag and a residual between the current prediction block and a block of a reference frame indicated by a motion vector are encoded. In the skip coding mode, the motion vector selection flag is coded, but coding of the residual signal is skipped. In essence, this tells the downstream decoder to use the block of the reference frame indicated by the motion vector in place of the current prediction block of the current frame.
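
The decoder-side consequence of the two modes may be sketched as follows (illustrative only; the `mode` strings and the nested-list block representation are assumptions for this example, not bitstream syntax):

```python
def reconstruct_prediction_block(mode, reference_block, residual=None):
    """Skip mode reuses the reference block as-is; direct mode adds the
    decoded residual to it."""
    if mode == "skip":
        return [row[:] for row in reference_block]
    if mode == "direct":
        return [[r + d for r, d in zip(r_row, d_row)]
                for r_row, d_row in zip(reference_block, residual)]
    raise ValueError("inter mode is handled by the full recovery path")
```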

Alternative motion vector candidate generation subroutine 1000

Figs. 10A-10B illustrate an alternative motion vector candidate generation subroutine 1000 for generating an ordered list of motion vector candidates in accordance with at least one embodiment. In the illustrated embodiment, four motion vector candidates are generated. However, those of ordinary skill in the art will recognize that a greater or lesser number of candidates may be generated using the same techniques, and further, that alternate and/or equivalent implementations may be substituted for the specific embodiments shown and described without departing from the scope of the present disclosure.

The alternative motion vector candidate generation subroutine 1000 obtains a request to generate a motion vector candidate list for the current prediction block at execution block 1003.

The alternative motion vector candidate generation subroutine 1000 sets the index value (i) to zero at execution block 1005.

At decision block 1008, if the first candidate reference block (RBa) does not have an available motion vector (MVa), the alternative motion vector candidate generation subroutine 1000 proceeds to decision block 1015; if the first candidate reference block (RBa) does have an available motion vector (MVa), the alternative motion vector candidate generation subroutine 1000 proceeds to execution block 1010.

The alternative motion vector candidate generation subroutine 1000 assigns the motion vector (MVa) of the first candidate reference block to the i-th motion vector candidate (MCVL [ i ]) in the motion vector candidate list at execution block 1010.

The alternative motion vector candidate generation subroutine 1000 increments the index value (i) at execution block 1013.

At decision block 1015, if the second candidate reference block (RBb) does not have a motion vector (MVb) available, the alternative motion vector candidate generation subroutine 1000 proceeds to decision block 1023; if the second candidate reference block (RBb) does have a motion vector (MVb) available, the alternative motion vector candidate generation subroutine 1000 proceeds to execution block 1018.

The alternative motion vector candidate generation subroutine 1000 assigns the motion vector (MVb) of the second candidate reference block to the i-th motion vector candidate (MCVL [ i ]) in the motion vector candidate list at execution block 1018.

The alternative motion vector candidate generation subroutine 1000 increments the index value (i) at execution block 1020.

At decision block 1023, if the third candidate reference block (RBc) does not have an available motion vector (MVc), the alternative motion vector candidate generation subroutine 1000 proceeds to decision block 1030; if the third candidate reference block (RBc) does have a motion vector (MVc) available, the alternative motion vector candidate generation subroutine 1000 proceeds to execution block 1025.

The alternative motion vector candidate generation subroutine 1000 assigns the motion vector (MVc) of the third candidate reference block to the i-th motion vector candidate (MCVL[i]) in the motion vector candidate list at execution block 1025.

The alternative motion vector candidate generation subroutine 1000 increments the index value (i) at execution block 1028.

At decision block 1030, if the fourth candidate reference block (RBd) does not have an available motion vector (MVd), the alternative motion vector candidate generation subroutine 1000 proceeds to decision block 1038; if the fourth candidate reference block (RBd) does have an available motion vector (MVd), the alternative motion vector candidate generation subroutine 1000 proceeds to execution block 1033.

The alternative motion vector candidate generation subroutine 1000 assigns the motion vector (MVd) of the fourth candidate reference block to the i-th motion vector candidate (MCVL[i]) in the motion vector candidate list at execution block 1033.

The alternative motion vector candidate generation subroutine 1000 increments the index value (i) at execution block 1035.

At decision block 1038, if the index value (i) is less than four, indicating that fewer than four motion vector candidates have been identified so far, and the fifth candidate reference block (RBe) has an available motion vector (MVe), the alternative motion vector candidate generation subroutine 1000 proceeds to execution block 1040; otherwise, the alternative motion vector candidate generation subroutine 1000 proceeds to decision block 1045.

The alternative motion vector candidate generation subroutine 1000 assigns the motion vector (MVe) of the fifth candidate reference block to the i-th motion vector candidate (MCVL[i]) in the motion vector candidate list at execution block 1040.

The alternative motion vector candidate generation subroutine 1000 increments the index value (i) at execution block 1043.

At decision block 1045, if the index value (i) is less than four and the sixth candidate reference block (RBf) has an available motion vector (MVf), the alternative motion vector candidate generation subroutine 1000 proceeds to execution block 1048; otherwise, the alternative motion vector candidate generation subroutine 1000 proceeds to decision block 1053.

The alternative motion vector candidate generation subroutine 1000 assigns the motion vector (MVf) of the sixth candidate reference block to the i-th motion vector candidate (MCVL[i]) in the motion vector candidate list at execution block 1048.

The alternative motion vector candidate generation subroutine 1000 increments the index value (i) at execution block 1050.

At decision block 1053, if the index value (i) is less than four and the seventh candidate reference block (RBg) has an available motion vector (MVg), the alternative motion vector candidate generation subroutine 1000 proceeds to execution block 1055; otherwise, the alternative motion vector candidate generation subroutine 1000 proceeds to decision block 1060.

The alternative motion vector candidate generation subroutine 1000 assigns the motion vector (MVg) of the seventh candidate reference block to the i-th motion vector candidate (MCVL[i]) in the motion vector candidate list at execution block 1055.

The alternative motion vector candidate generation subroutine 1000 increments the index value (i) at execution block 1058.

At decision block 1060, if the index value (i) is less than four, the alternative motion vector candidate generation subroutine 1000 proceeds to execution block 1063; otherwise, the alternative motion vector candidate generation subroutine 1000 proceeds to return block 1099.

The alternative motion vector candidate generation subroutine 1000 assigns a zero value motion vector to the i-th motion vector candidate (MCVL [ i ]) in the motion vector candidate list at execution block 1063.

The alternative motion vector candidate generation subroutine 1000 increments the index value (i) at execution block 1065 and then loops back to decision block 1060.

The alternative motion vector candidate generation subroutine 1000 returns the motion vector candidate list (MCVL) at return block 1099.
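
The control flow of figs. 10A-10B amounts to the following sketch (Python; identifiers are illustrative). The primary reference blocks RBa-RBd contribute whenever their motion vectors are available, the fallback blocks RBe-RBg are consulted only while fewer than four candidates have been collected, and zero-value vectors pad the remainder:

```python
def generate_mvcl(primary, fallback, list_size=4):
    """primary: MVs (or None) of RBa-RBd; fallback: MVs (or None) of RBe-RBg."""
    mvcl = [mv for mv in primary if mv is not None]
    for mv in fallback:
        if len(mvcl) >= list_size:
            break
        if mv is not None:
            mvcl.append(mv)
    while len(mvcl) < list_size:
        mvcl.append((0, 0))  # zero-value motion vector
    return mvcl[:list_size]

# Example from the text: RBb-RBd are unavailable and RBe supplies MVe.
assert generate_mvcl([(5, 1), None, None, None], [(2, 2), None, None]) == \
       [(5, 1), (2, 2), (0, 0), (0, 0)]
```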

Recursive coding block partitioning scheme

Fig. 11 illustrates an exemplary recursive coding block partitioning scheme 1100 that may be implemented by the encoder 400 in accordance with various embodiments. At block indexer 408, after dividing the frame into LCB-sized pixel regions, hereinafter referred to as coding block candidates ("CBCs"), each LCB-sized coding block candidate ("LCBC") may be divided into smaller CBCs according to the recursive coding block partitioning scheme 1100. This process may continue recursively until block indexer 408 determines either that (1) the current CBC is suitable for encoding (e.g., because the current CBC contains only pixels of a single value) or (2) the current CBC is the minimum size of a coding block candidate for a particular implementation, e.g., 2 × 2, 4 × 4, etc. ("MCBC"), whichever comes first. Block indexer 408 may then index the current CBC as a coding block suitable for encoding.

A square CBC 1102, such as an LCBC, may be partitioned along one or both of a vertical axis 1104 and a horizontal axis 1106. Partitioning along the vertical axis 1104 divides the square CBC 1102 into a first rectangular coding block structure 1108, as shown by rectangular (1:2) CBCs 1110 and 1112. Partitioning along the horizontal axis 1106 divides the square CBC 1102 into a second rectangular coding block structure 1114, as shown by rectangular (2:1) CBCs 1116 and 1118.

A rectangular (2:1) CBC of the second rectangular coding block structure 1114 (such as CBC 1116) may be partitioned into a further rectangular coding block structure 1148, as shown by rectangular CBCs 1150 and 1152.

Partitioning along both the horizontal axis 1106 and the vertical axis 1104 divides the square CBC 1102 into a square coding block structure 1120, as shown by square CBCs 1122, 1124, 1126, and 1128.

A rectangular (1:2) CBC of the first rectangular coding block structure 1108 (such as CBC 1112) may be split along a horizontal axis 1130 into a first square coding block structure 1132, as shown by square CBCs 1134 and 1136.

A rectangular (2:1) CBC of the second rectangular coding block structure 1114 (such as CBC 1118) may be partitioned into a second square coding block structure 1138, as shown by square CBCs 1140 and 1142.

A square CBC of the square coding block structure 1120, the first square coding block structure 1132, or the second square coding block structure 1138 may be partitioned along one or both of its vertical and horizontal axes in the same manner as CBC 1102.

For example, a 64 × 64 pixel LCBC may be partitioned into two 32 × 64 pixel coding blocks, two 64 × 32 pixel coding blocks, or four 32 × 32 pixel coding blocks.

In the coded bitstream, a two-bit coded block partition flag may be used to indicate whether and how the current coding block is further split:

Coded block partition flag value    Segmentation type
00                                  Current coding block is not partitioned
01                                  Current coding block is partitioned horizontally
10                                  Current coding block is partitioned vertically
11                                  Current coding block is partitioned both horizontally and vertically
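
For illustration only, the flag semantics above might be decoded as in the following sketch (Python; the constant names and function are assumptions for this example, not bitstream syntax):

```python
NO_SPLIT, HORIZONTAL_SPLIT, VERTICAL_SPLIT, QUAD_SPLIT = 0b00, 0b01, 0b10, 0b11

def child_block_sizes(width, height, flag):
    """Return the child CBC sizes implied by a coded block partition flag."""
    if flag == NO_SPLIT:
        return [(width, height)]
    if flag == HORIZONTAL_SPLIT:            # split along the horizontal axis
        return [(width, height // 2)] * 2
    if flag == VERTICAL_SPLIT:              # split along the vertical axis
        return [(width // 2, height)] * 2
    return [(width // 2, height // 2)] * 4  # split along both axes

# A 64 x 64 LCBC yields two 64 x 32, two 32 x 64, or four 32 x 32 children.
assert child_block_sizes(64, 64, QUAD_SPLIT) == [(32, 32)] * 4
```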

Coding block indexing routine

FIG. 12 illustrates an exemplary coding block indexing routine 1200, such as may be performed by block indexer 408, in accordance with various embodiments.

The coding block indexing routine 1200 may obtain a frame of a video sequence at execution block 1202.

The coding block indexing routine 1200 may divide the frame into LCBCs at execution block 1204.

At start loop block 1206, the coding block indexing routine 1200 may process each LCBC in turn, e.g., starting with the LCBC in the upper left corner of the frame, then proceeding from left to right and from top to bottom.

At subroutine block 1300, the coding block indexing routine 1200 calls the coding block partitioning subroutine 1300, described below with reference to fig. 13.

At end loop block 1208, the coding block indexing routine 1200 loops back to start loop block 1206 to process the next LCBC (if any) of the frame.

The coding block indexing routine 1200 ends at return block 1299.

Coding block partitioning subroutine

Fig. 13 illustrates an exemplary coding block partitioning subroutine 1300, such as may be performed by block indexer 408, in accordance with various embodiments.

The coding block partitioning subroutine 1300 obtains a CBC at execution block 1302. The coding block candidate may be provided by routine 1200 or recursively, as described below.

At decision block 1304, if the obtained CBC is an MCBC, the coding block partitioning subroutine 1300 may proceed to execution block 1306; otherwise, the coding block partitioning subroutine 1300 may proceed to execution block 1308.

The coding block partitioning subroutine 1300 may index the obtained CBC as a coding block at execution block 1306. The coding block partitioning subroutine 1300 may then terminate at return block 1398.

The coding block partitioning subroutine 1300 may test the encoding suitability of the current CBC at execution block 1308. For example, the coding block partitioning subroutine 1300 may analyze the pixel values of the current CBC and determine whether the current CBC contains only pixels of a single value or whether the current CBC matches a predefined pattern.

At decision block 1310, if the current CBC is suitable for encoding, the coding block partitioning subroutine 1300 may proceed to execution block 1306; otherwise, the coding block partitioning subroutine 1300 may proceed to execution block 1314.

The coding block partitioning subroutine 1300 may select a coding block partition structure for the current square CBC at execution block 1314. For example, the coding block partitioning subroutine 1300 may select between the first rectangular coding block structure 1108, the second rectangular coding block structure 1114, or the square coding block structure 1120 of the recursive coding block partitioning scheme 1100 described above with reference to fig. 11.

The coding block partitioning subroutine 1300 may split the current CBC into two or four sub-CBCs at execution block 1316, according to the recursive coding block partitioning scheme 1100.

At start loop block 1318, the coding block partitioning subroutine 1300 may process in turn each sub-CBC resulting from the split performed at execution block 1316.

At subroutine block 1300, the coding block partitioning subroutine 1300 may call itself recursively to process the current sub-CBC in the manner presently described.

At end loop block 1320, the coding block partitioning subroutine 1300 loops back to start loop block 1318 to process the next sub-CBC (if any) of the current CBC.

The coding block partitioning subroutine 1300 may then terminate at return block 1399.
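
A compact recursive sketch of subroutine 1300 follows (Python; the suitability test and the partition-selection policy shown are placeholders standing in for the encoder's actual decisions at execution blocks 1308 and 1314):

```python
MCBC_SIZE = 4  # minimum coding block candidate size; implementation-dependent

def partition_cbc(pixels, x, y, w, h, coded_blocks):
    """Index the CBC at (x, y) of size w x h, or split it and recurse."""
    if min(w, h) <= MCBC_SIZE or suitable_for_encoding(pixels, x, y, w, h):
        coded_blocks.append((x, y, w, h))  # index as a coding block
        return
    for (cx, cy, cw, ch) in choose_partition(x, y, w, h):
        partition_cbc(pixels, cx, cy, cw, ch, coded_blocks)

def suitable_for_encoding(pixels, x, y, w, h):
    """Placeholder suitability test: a CBC containing a single pixel value."""
    return len({pixels[j][i] for j in range(y, y + h)
                for i in range(x, x + w)}) == 1

def choose_partition(x, y, w, h):
    """Placeholder policy: quad-split squares, halve rectangles lengthwise."""
    if w == h:
        hw, hh = w // 2, h // 2
        return [(x, y, hw, hh), (x + hw, y, hw, hh),
                (x, y + hh, hw, hh), (x + hw, y + hh, hw, hh)]
    if h > w:
        return [(x, y, w, h // 2), (x, y + h // 2, w, h // 2)]
    return [(x, y, w // 2, h), (x + w // 2, y, w // 2, h)]
```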

Coding block tree partitioning procedure

Figs. 14A-14C illustrate an exemplary coding block tree partitioning procedure 1400 that applies the coding block partitioning scheme 1100 to a "root" LCBC 1402. Fig. 14A illustrates the various sub-coding blocks 1404-1454 created by the coding block tree partitioning procedure 1400; fig. 14B illustrates the coding block tree partitioning procedure as a tree data structure, showing the parent/child relationships between the various coding blocks 1402-1454; and fig. 14C illustrates the various "leaf node" child coding blocks of fig. 14B, indicated by dashed lines at their corresponding locations within the root coding block 1402.

Assuming that the 64 × 64 LCBC 1402 is not suitable for encoding, it may be partitioned into the first rectangular coding block structure 1108, the second rectangular coding block structure 1114, or the square coding block structure 1120 of the recursive coding block partitioning scheme 1100 described above with reference to fig. 11. For the purposes of this example, assume that the 64 × 64 LCBC 1402 is partitioned into two 32 × 64 sub-CBCs, 32 × 64 CBC 1404 and 32 × 64 CBC 1406. Each of these sub-CBCs may then be processed in turn.

Assuming that the first sub-32 × 64 CBC 1404 of the 64 × 64 LCBC 1402 is not suitable for encoding, it may be split into two 32 × 32 sub-coding block candidates, 32 × 32 CBC 1408 and 32 × 32 CBC 1410. Each of these sub-CBCs may then be processed in turn.

Assuming that the first sub-32 × 32 CBC 1408 of the 32 × 64 CBC 1404 is not suitable for encoding, it may be split into two 16 × 32 sub-coding block candidates, 16 × 32 CBC 1412 and 16 × 32 CBC 1414. Each of these sub-CBCs may then be processed in turn.

The encoder 400 may determine that the first sub-16 × 32CBC 1412 of the 32 × 32CBC 1408 is suitable for encoding; the encoder 400 may thus index the 16 × 32CBC 1412 as an encoded block 1413 and return to the parent 32 × 32CBC 1408 to process its next child (if any).

Assuming that the second sub-16 × 32 CBC 1414 of the 32 × 32 CBC 1408 is not suitable for encoding, it may be partitioned into two 16 × 16 sub-coding block candidates, 16 × 16 CBC 1416 and 16 × 16 CBC 1418. Each of these sub-CBCs may then be processed in turn.

Assuming that the first sub-16 × 16 CBC 1416 of the 16 × 32 CBC 1414 is not suitable for encoding, it may be partitioned into two 8 × 16 sub-coding block candidates, 8 × 16 CBC 1420 and 8 × 16 CBC 1422. Each of these sub-CBCs may then be processed in turn.

The encoder 400 may determine that a first sub-8 × 16CBC 1420 of the 16 × 16CBC 1416 is suitable for encoding; the encoder 400 may thus index the 8 × 16CBC 1420 as an encoded block 1421 and return to the parent 16 × 16CBC 1416 to process its next child (if any).

The encoder 400 may determine that a second sub-8 × 16CBC 1422 of the 16 × 16CBC 1416 is suitable for encoding; the encoder 400 may thus index the 8 × 16CBC 1422 into an encoded block 1423 and return to the parent 16 × 16CBC 1416 to process its next child (if any).

All of the children of the 16 × 16CBC 1416 have now been processed, indexing the 8 × 16 coded blocks 1421 and 1423. The encoder 400 may thus return to the parent 16 x 32CBC 1414 to process its next child (if any).

Assuming that the second sub-16 × 16 CBC 1418 of the 16 × 32 CBC 1414 is not suitable for encoding, it may be partitioned into two 8 × 16 coding block candidates, 8 × 16 CBC 1424 and 8 × 16 CBC 1426. Each of these sub-CBCs may then be processed in turn.

Assuming that the first sub-8 × 16 CBC 1424 of the 16 × 16 CBC 1418 is not suitable for encoding, it may be split into two 8 × 8 coding block candidates, 8 × 8 CBC 1428 and 8 × 8 CBC 1430. Each of these sub-CBCs may then be processed in turn.

The encoder 400 may determine that a first sub-8 × 8CBC 1428 of the 8 × 16CBC 1424 is suitable for encoding; the encoder 400 may thus index the 8 × 8CBC 1428 into an encoded block 1429 and return to the parent 8 × 16CBC 1424 to process its next child (if any).

The encoder 400 may determine that the second sub-8 × 8CBC1430 of the 8 × 16CBC 1424 is suitable for encoding; the encoder 400 may thus index the 8 × 8CBC1430 as an encoded block 1431 and return to the parent 8 × 16CBC 1424 to process its next child (if any).

All children of the 8 x 16CBC 1424 have now been processed, indexing 8 x 8 coded blocks 1429 and 1431. The encoder 400 may thus return to the parent 16 × 16CBC 1418 to process its next child (if any).

The encoder 400 may determine that the second sub-8 × 16 CBC 1426 of the 16 × 16 CBC 1418 is suitable for encoding; the encoder 400 may thus index the 8 × 16 CBC 1426 as coded block 1427 and return to the parent 16 × 16 CBC 1418 to process its next child (if any).

All the children of the 16 × 16CBC 1418 have now been processed, indexing 8 × 8 coded blocks 1429 and 1431 and 8 × 16 coded block 1427. The encoder 400 may thus return to the parent 16 x 32CBC 1414 to process its next child (if any).

All the children of the 16 x 32CBC 1414 have now been processed so that the 8 x 8 coded blocks 1429 and 1431, the 8 x 16 coded blocks 1421, 1423, and 1427 are indexed. The encoder 400 may thus return to the parent 32 x 32CBC 1408 to process its next child (if any).

All the children of the 32 x 32CBC 1408 have now been processed so that the 8 x 8 coded blocks 1429 and 1431, the 8 x 16 coded blocks 1421, 1423 and 1427, and the 16 x 32 coded block 1413 are indexed. The encoder 400 may thus return to the parent 32 x 64CBC 1404 to process its next child (if any).

The encoder 400 may determine that the second sub-32 × 32CBC 1410 of the 32 × 64CBC 1404 is suitable for encoding; the encoder 400 may thus index the 32 × 32CBC 1410 into an encoded block 1411 and return to the parent 32 × 64CBC 1404 to process its next child (if any).

All the children of the 32 × 64 CBC 1404 have now been processed, so that the 8 × 8 coded blocks 1429 and 1431, the 8 × 16 coded blocks 1421, 1423, and 1427, the 16 × 32 coded block 1413, and the 32 × 32 coded block 1411 have been indexed. The encoder 400 may thus return to the parent root 64 × 64 LCBC 1402 to process its next child (if any).

Assuming that the second sub-32 × 64CBC 1406 of the 64 × 64LCBC 1402 is not suitable for encoding, it may be split into two 32 × 32 encoded block candidates 32 × 32CBC 1432 and 32 × 32CBC 1434. Each of these sub-CBCs may then be processed in turn.

Assuming that the first sub-32 × 32 CBC 1432 of the 32 × 64 CBC 1406 is not suitable for encoding, it may be split into two 32 × 16 coding block candidates, 32 × 16 CBC 1436 and 32 × 16 CBC 1438. Each of these sub-CBCs may then be processed in turn.

The encoder 400 may determine that a first sub 32 x 16CBC 1436 of the 32 x 32CBC 1432 is suitable for encoding; the encoder 400 may thus index the 32 x 16CBC 1436 into an encoded block 1437 and return to the parent 32 x 32CBC 1432 to process its next child (if any).

The encoder 400 may determine that a second sub-32 x 16CBC 1438 of the 32 x 32CBC 1432 is suitable for encoding; the encoder 400 may thus index the 32 x 16CBC 1438 into an encoded block 1439 and return to the parent 32 x 32CBC 1432 to process its next child (if any).

All of the children of the 32 x 32CBC 1432 have now been processed, indexing the 32 x 16 coding blocks 1437 and 1439. The encoder 400 may thus return to the parent 32 x 64CBC 1406 to process its next child (if any).

Assuming that the second sub-32 × 32 CBC 1434 of the 32 × 64 CBC 1406 is not suitable for encoding, it may be split into four 16 × 16 coding block candidates, 16 × 16 CBCs 1440, 1442, 1444, and 1446. Each of these sub-CBCs may then be processed in turn.

The encoder 400 may determine that the first sub-16 × 16CBC 1440 of the 32 × 32CBC 1434 is suitable for encoding; the encoder 400 may thus index the 16 × 16CBC 1440 into an encoded block 1441 and return to the parent 32 × 32CBC 1434 to process its next child (if any).

The encoder 400 may determine that the second sub-16 × 16CBC 1442 of the 32 × 32CBC 1434 is suitable for encoding; the encoder 400 may thus index the 16 × 16CBC 1442 into an encoded block 1443 and return to the parent 32 × 32CBC 1434 to process its next child (if any).

Assuming that the third sub-16 × 16 CBC 1444 of the 32 × 32 CBC 1434 is not suitable for encoding, it may be split into four 8 × 8 coding block candidates, 8 × 8 CBCs 1448, 1450, 1452, and 1454. Each of these sub-CBCs may then be processed in turn.

The encoder 400 may determine that the first sub-8 × 8 CBC 1448 of the 16 × 16 CBC 1444 is suitable for encoding; the encoder 400 may thus index the 8 × 8 CBC 1448 as coded block 1449 and return to the parent 16 × 16 CBC 1444 to process its next child (if any).

The encoder 400 may determine that the second sub-8 × 8CBC 1450 of the 16 × 16CBC 1444 is suitable for encoding; encoder 400 may thus index 8 × 8CBC 1450 as encoded block 1451 and return to parent 16 × 16CBC 1444 to process its next child (if any).

The encoder 400 may determine that the third sub-8 × 8CBC1452 of the 16 × 16CBC 1444 is suitable for encoding; encoder 400 may thus index 8 × 8CBC1452 into encoded block 1453 and return to parent 16 × 16CBC 1444 to process its next child (if any).

The encoder 400 may determine that the fourth sub-8 × 8CBC 1454 of the 16 × 16CBC 1444 is suitable for encoding; encoder 400 may thus index 8 × 8CBC 1454 into encoded block 1455 and return to parent 16 × 16CBC 1444 to process its next child (if any).

All the children of the 16 × 16 CBC 1444 have now been processed, indexing the 8 × 8 coded blocks 1449, 1451, 1453, and 1455. The encoder 400 may thus return to the parent 32 × 32 CBC 1434 to process its next child (if any).

The encoder 400 may determine that the fourth sub-16 × 16 CBC 1446 of the 32 × 32 CBC 1434 is suitable for encoding; the encoder 400 may thus index the 16 × 16 CBC 1446 as coded block 1447 and return to the parent 32 × 32 CBC 1434 to process its next child (if any).

All the children of the 32 × 32 CBC 1434 have now been processed, indexing the 16 × 16 coded blocks 1441, 1443, and 1447 and the 8 × 8 coded blocks 1449, 1451, 1453, and 1455. The encoder 400 may thus return to the parent 32 × 64 CBC 1406 to process its next child (if any).

All the children of the 32 x 64CBC 1406 have now been processed so that the 32 x 16 coding blocks 1437 and 1439, the 16 x 16 coding blocks 1441, 1443, and 1447, and the 8 x 8 coding blocks 1449, 1451, 1453, and 1455 have been indexed. The encoder 400 may thus return to the parent root 64 x 64LCBC 1402 to process its next child (if any).

All the children of the root 64 × 64 LCBC 1402 have now been processed, so that the 8 × 8 coded blocks 1429, 1431, 1449, 1451, 1453, and 1455; the 8 × 16 coded blocks 1421, 1423, and 1427; the 16 × 32 coded block 1413; the 32 × 32 coded block 1411; the 32 × 16 coded blocks 1437 and 1439; and the 16 × 16 coded blocks 1441, 1443, and 1447 have been indexed. The encoder 400 may thus proceed to the next LCBC (if any) of the frame.

template matching prediction selection techniques

In accordance with aspects of various embodiments of the present methods and systems, to select an intra predictor for a rectangular coding block, encoder 400 may attempt to match a prediction boundary template for the rectangular coding block with an encoded portion of the current video frame. The prediction boundary template is an L-shaped region of pixels above and to the left of the current coding block.

Figs. 15A-15B show two pixel regions 1500A, 1500B, each corresponding to a portion of a video frame. Pixel regions 1500A-B are shown partially encoded, each region having a processed region 1502A-B, an unprocessed region 1504A-B (indicated by single hatching), and a current coding block 1506A-B (indicated by double hatching). Processed regions 1502A-B represent pixels that have been indexed into coding blocks by block indexer 408 and have been processed by intra predictor 444 or motion compensated predictor 442. Unprocessed regions 1504A-B represent pixels not yet processed by the intra predictor 444. Current coding blocks 1506A-B are the rectangular coding blocks currently being processed by intra predictor 444. (The sizes of coding blocks 1506A and 1506B are arbitrarily chosen for illustrative purposes; in accordance with the present methods and systems, the current techniques may be applied to any rectangular coding block.) The pixels directly above and to the left of coding blocks 1506A-B form exemplary prediction templates 1508A-B. A prediction template is an arrangement of pixels near the current coding block that have been processed by either the intra predictor 444 or the motion compensated predictor 442 and thus have prediction values associated with them. According to some embodiments, the prediction template may include pixels bordering the pixels of the current coding block. The spatial configuration of the prediction templates 1508A-B forms an "L"-shaped arrangement that borders the coding blocks 1506A-B along their upper and left sides (i.e., the two sides of the coding blocks 1506A-B that border the processed regions 1502A-B).

Fig. 16 illustrates how a prediction template may be used in accordance with the present methods and systems to select an intra-prediction value for a pixel of a rectangular coding block in an exemplary video frame 1600 that includes a region of pixels 1500A and, thus, a current coding block 1506A. Note that the size of the encoded block 1506A with respect to the video frame 1600 is exaggerated for illustrative purposes. The pixel region 1500A is shown both within the background of the video frame 1600 and in the lower right portion of fig. 16 in enlarged form. The second region of pixels, pixel region 1601, is displayed both within the video frame 1600 and enlarged in the lower left portion of fig. 16. The video frame 1600 also includes a processed region 1602 (including a processed region 1502A and a pixel region 1601); and untreated regions 1604 (including untreated regions 1504A).

According to the present methods and systems, to select prediction values for the pixels of coding block 1506A (or any rectangular coding block), encoder 400 may proceed as follows (a code sketch follows the list):

(1) Identify a prediction template, such as exemplary prediction template 1508A, in the processed region 1602;

(2) Search the processed region 1602 for an arrangement of pixels that matches the prediction template 1508A in both relative spatial configuration and prediction values (assume, for purposes of this example, that the arbitrarily selected pixel arrangement 1606 within pixel region 1601 matches the prediction template 1508A);

(3) Identify the pixel region 1608 as the region having the same spatial relationship to the matching arrangement 1606 as the current coding block has to the prediction template; and

(4) Map the respective prediction value of each pixel of the pixel region 1608 to the corresponding pixel of the current coding block.
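
By way of a non-limiting sketch of steps (1) through (4) (Python; exact matching only, with all identifiers illustrative): `pred` is assumed to be a two-dimensional array of prediction values in which unprocessed pixels hold None, and the current coding block is assumed not to lie on the frame border, so that its full L-shaped template exists.

```python
def l_template_offsets(bw, bh):
    """L-shaped template offsets relative to the block's top-left pixel:
    the corner, the row above, and the column to the left."""
    return ([(dx, -1) for dx in range(-1, bw)] +
            [(-1, dy) for dy in range(bh)])

def template_match_predict(pred, bx, by, bw, bh):
    """Steps (1)-(4): find a matching pixel arrangement in the processed
    region and map the associated region's prediction values onto the
    bw x bh block whose top-left pixel is at (bx, by)."""
    offsets = l_template_offsets(bw, bh)                      # step (1)
    target = {o: pred[by + o[1]][bx + o[0]] for o in offsets}
    rows, cols = len(pred), len(pred[0])
    for y in range(rows - bh + 1):                            # step (2)
        for x in range(cols - bw + 1):
            if (x, y) == (bx, by):
                continue                                      # skip the block itself
            if all(y + dy >= 0 and x + dx >= 0
                   and pred[y + dy][x + dx] == target[(dx, dy)]
                   for (dx, dy) in offsets):
                region = [[pred[y + j][x + i] for i in range(bw)]
                          for j in range(bh)]                 # step (3)
                if all(v is not None for r in region for v in r):
                    return region                             # step (4)
    return None  # no match; fall back, e.g., to directional prediction
```

In practice, the encoder would restrict the search to the processed region and could apply one of the tolerance policies discussed next.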

In various embodiments, when determining whether there is a match between a prediction template, such as prediction templates 1508A-B, and a potentially matching arrangement of pixels (e.g., pixel arrangement 1606), encoder 400 may apply various tolerances to the matching algorithm, for example detecting a match: (a) only if the prediction values of the prediction template completely match those of the possible matching arrangement; (b) only if all prediction values match within +/-2%; (c) only if all prediction values except one match exactly and the remaining one matches within +/-5%; (d) only if either (b) or (c) is satisfied; or (e) only if the prediction cost between the prediction template and the possible matching arrangement is less than a predefined threshold (the prediction cost may be, for example, a Sum of Absolute Differences (SAD), a Sum of Squared Errors (SSE), or a value derived from a rate-distortion function); and the like.
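
A few of these tolerance tests, sketched in isolation (Python; the percentage figures and the SAD form follow the examples given above and are not normative):

```python
def exact_match(template, candidate):
    """Tolerance (a): every prediction value matches exactly."""
    return all(a == b for a, b in zip(template, candidate))

def match_within(template, candidate, pct):
    """Tolerance (b)-style: all values within +/- pct (e.g., pct=0.02)."""
    return all(abs(a - b) <= pct * abs(a) for a, b in zip(template, candidate))

def sad_match(template, candidate, threshold):
    """Tolerance (e): Sum of Absolute Differences under a threshold."""
    return sum(abs(a - b) for a, b in zip(template, candidate)) < threshold
```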

In various embodiments, the matching algorithm may: (a) stop processing possible matching arrangements after the first tolerable match is found, and map the prediction values of the corresponding pixel region to the pixels of the current coding block; (b) process all possible matching arrangements, then select the best available match and map the prediction values of the corresponding pixel region to the pixels of the current coding block; or (c) begin processing all possible matching arrangements, stop if a perfect match is found and map the prediction values of the corresponding pixel region to the pixels of the current coding block, and otherwise continue processing all possible matching arrangements, select the best available imperfect match, and map the prediction values of the corresponding pixel region to the pixels of the current coding block; and the like.

Rectangular coding block predictor selection routine

Fig. 17 illustrates an exemplary rectangular coding block predictor selection routine 1700 that may be implemented by the intra predictor 444 in accordance with various embodiments.

The rectangular coding block predictor selection routine 1700 may obtain a rectangular coding block at execution block 1702. For example, the rectangular coding block predictor selection routine 1700 may obtain a pixel position within the frame, a coding block width, and a coding block height. The pixel position may correspond to the pixel in the upper left corner of the current coding block, the coding block width may correspond to a number of pixel columns, and the coding block height may correspond to a number of pixel rows.

The rectangular coding block predictor selection routine 1700 may select a prediction template for the rectangular coding block at execution block 1704. For example, the rectangular coding block predictor selection routine 1700 may select a prediction template that includes pixels bordering along the top and left sides of the current coding block, as described above with respect to fig. 15.

The rectangular coding block predictor selection routine 1700 may identify a search area in the current frame at execution block 1706. For example, the search area may include all pixels of the current frame that have already been assigned prediction values.

At subroutine block 1800, the rectangular coding block predictor selection routine 1700 calls the processed region search subroutine 1800, described below with reference to fig. 18. Subroutine block 1800 may return pixel regions or prediction failure errors.

At decision block 1708, if subroutine block 1800 returns a prediction failure error, rectangular coded block predictor selection routine 1700 may terminate with a failure at return block 1798; otherwise, the rectangular coding block predictor selection routine 1700 may proceed to a start loop block 1710.

At a start loop block 1710, the rectangular encoding block predictor selection routine 1700 may process each pixel of the rectangular encoding block in turn. For example, the rectangular coded block predictor selection routine 1700 may process pixels of a rectangular coded block from left to right and top to bottom.

The rectangular coding block predictor selection routine 1700 may map the predictor of the pixel region obtained from the processed region search subroutine 1800 to the current pixel of the rectangular coding block at execution block 1712. For example, a predicted value for a pixel in the top left corner of a pixel region may be mapped to a pixel in the top left corner of the current coding block, and so on.

At an end loop block 1714, the rectangular coding block predictor selection routine 1700 may loop back to the start loop block 1710 to process the next pixel (if any) of the rectangular coding block.

The rectangular coded block predictor selection routine 1700 may successfully terminate at a return block 1799.

Processed region search subroutine

Fig. 18 illustrates an exemplary processed region search subroutine 1800 that may be implemented by the intra predictor 444, in accordance with various embodiments.

The processed area search subroutine 1800 may obtain a prediction template and a search area at execution block 1802.

The processed region search subroutine 1800 may select an anchor pixel for the prediction template at execution block 1804. For example, if the prediction template is an L-shaped pixel arrangement along the top and left boundaries of the coding block, the anchor pixel may be the pixel at the intersection of the "L," i.e., one pixel row above and one pixel column to the left of the pixel in the upper left corner of the coding block.

At start loop block 1806, the processed region search subroutine 1800 may process each pixel of the search region in turn.

The processed region search subroutine 1800 may generate a test template having the same arrangement as the prediction template, but using the current search region pixel as the anchor pixel of the test template.

At subroutine block 1900, the processed region search subroutine 1800 may call the template matching test subroutine 1900, described below with reference to FIG. 19. The template matching test subroutine 1900 may return a perfect match result, a possible match result, or a no match result.

At decision block 1810, if the template matching test subroutine 1900 returns a perfect match result, the processed region search subroutine 1800 may proceed to return block 1897 and return the region of pixels having the same spatial relationship to the current test template as the current coding block has to the prediction template; otherwise, the processed region search subroutine 1800 may proceed to decision block 1812.

At decision block 1812, if the template matching test subroutine 1900 returns a possible match result, the processed area search subroutine 1800 may proceed to execution block 1814; otherwise, the processed region search subroutine 1800 may proceed to end loop block 1816.

At execution block 1814, the processed region search subroutine 1800 may mark the test template associated with the current search region pixel as corresponding to a possible match.

At end loop block 1816, the processed region search subroutine 1800 may loop back to the start loop block 1806 to process the next pixel of the search region (if any).

At decision block 1818, if no test template has been marked as a possible match, the processed region search subroutine 1800 may terminate by returning a no-match error at return block 1898; otherwise, the processed region search subroutine 1800 may proceed to decision block 1820.

At decision block 1820, if multiple test templates were marked as possible matches at execution block 1814, the processed region search subroutine 1800 may proceed to execution block 1822; otherwise, i.e., if only one test template was marked as a possible match, the processed region search subroutine 1800 may proceed to return block 1899.

At execution block 1822, the processed region search subroutine 1800 may select the best matching of the identified possible matching test templates and discard the rest, leaving only one identified test template.

The processed region search subroutine 1800 may terminate at return block 1899 by returning the region of pixels having the same spatial relationship to the remaining identified test template as the current coding block has to the prediction template.

Template matching test subroutine

Fig. 19 illustrates an exemplary template matching test subroutine 1900 that may be implemented by the intra predictor 444, in accordance with various embodiments.

The template matching test subroutine 1900 may obtain a test template and a prediction template at execution block 1902.

The template matching test subroutine 1900 may set the match variable to true at execution block 1904.

At start loop block 1906, the template matching test subroutine 1900 may process each pixel of the test template in turn.

At decision block 1908, if the predicted value of the current test template pixel matches the predicted value of the corresponding predicted template pixel, then template matching test subroutine 1900 may proceed to end loop block 1912; otherwise, the template matching test subroutine 1900 may proceed to execute block 1910.

The template matching test subroutine 1900 may set the match variable to false at execution block 1910.

At the end loop block 1912, the template matching test subroutine 1900 may loop back to the start loop block 1906 to process the next pixel of the test template (if any).

At decision block 1914, if the value of the match variable is true, template matching test subroutine 1900 may return a perfect match result at return block 1997; otherwise, the template matching test subroutine 1900 may proceed to execute block 1916.

The template matching test subroutine 1900 may set the value of the match variable to true at execution block 1916.

At start loop block 1918, the template matching test subroutine 1900 may process each pixel of the test template in turn.

At decision block 1920, if the predicted value of the current test template pixel is within the predefined tolerance level of the predicted value of the corresponding prediction template pixel, then template matching test subroutine 1900 may proceed to end loop block 1924; otherwise, the template matching test subroutine 1900 may proceed to execute block 1922.

The template matching test subroutine 1900 may set the match variable to false at execution block 1922.

At end loop block 1924, the template matching test subroutine 1900 may loop back to start loop block 1918 to process the next pixel of the test template (if any).

At decision block 1926, if the value of the match variable is true, the template matching test subroutine 1900 may terminate by returning a possible match result at return block 1998; otherwise, the template matching test subroutine 1900 may terminate by returning a no match result at return block 1999.
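
Condensed, the two passes of the template matching test subroutine 1900 behave as in the following sketch (Python; the tolerance is expressed as an absolute bound purely for simplicity, and the actual tolerance level is implementation-defined):

```python
def template_match_test(test_vals, pred_vals, tolerance):
    """Return 'perfect', 'possible', or 'none' for two value sequences."""
    if all(a == b for a, b in zip(test_vals, pred_vals)):
        return "perfect"    # first pass: exact equality
    if all(abs(a - b) <= tolerance for a, b in zip(test_vals, pred_vals)):
        return "possible"   # second pass: within tolerance
    return "none"
```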

Directional prediction techniques

In accordance with aspects of various embodiments of the present methods and systems, to select an intra predictor for an encoded block, encoder 400 may attempt to map the predicted values that have been selected from pixels near the encoded block to pixels of the encoded block.

Figs. 20A-20E show five pixel regions 2000A-E, each corresponding to a portion of a video frame (not shown). Pixel regions 2000A-E are shown partially encoded, each region having a processed region 2002A-E, an unprocessed region 2004A-E (indicated by single hatching), and a current coding block 2006A-E. Processed regions 2002A-E represent pixels that have been indexed into coding blocks by block indexer 408 and have been processed by the intra predictor 444. Unprocessed regions 2004A-E represent pixels not yet processed by the intra predictor 444. The current coding blocks 2006A-E are rectangular coding blocks currently being processed by the intra predictor 444. (The sizes of coding blocks 2006A-E are arbitrarily chosen for illustrative purposes; in accordance with the present methods and systems, the current techniques may be applied to any coding block.)

In figs. 20A-20C, the row of pixels directly above and the column of pixels directly to the left of the coding blocks 2006A-C form exemplary prediction regions 2008A-C. A prediction region is an arrangement of pixels near the current coding block that have been processed by the intra predictor 444 and thus have prediction values associated with them. The relative spatial arrangement of the pixels of the prediction regions 2008A-C forms an "L"-shaped prediction region that borders the coding blocks 2006A-C along their upper and left sides (i.e., the two sides of the coding blocks 2006A-C that border the processed regions 2002A-C).

In figs. 20D-20E, the pixels of the row directly above the coding blocks 2006D-E form exemplary prediction regions 2008D-E. The relative spatial arrangement of the pixels of the prediction regions 2008D-E forms a "stripe"-shaped prediction region bordering the coding blocks 2006D-E along their upper side.

According to various aspects of the present methods and systems, prediction values for pixels within the prediction regions 2008A-E may be mapped to diagonally consecutive pixels of the encoding blocks 2006A-E, e.g., along a diagonal vector having a slope of -1.
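One plausible reading of this slope -1 mapping is sketched below, under the assumption that each encoding block pixel takes the predicted value of the nearest prediction-region pixel reached by walking up and to the left along its diagonal; the decomposition of the prediction region into above, left, and corner values, and all names, are illustrative assumptions rather than details from the specification.

```python
# Minimal sketch of a slope -1 diagonal mapping from an L-shaped prediction
# region onto an encoding block. Assumes predicted values propagate down and
# to the right along each diagonal; all names are illustrative assumptions.

def map_diagonal_predictors(above, left, corner, height, width):
    """Return a height x width grid of mapped predicted values.

    above  -- predicted values of the row directly above the block
    left   -- predicted values of the column directly left of the block
    corner -- predicted value of the pixel above and left of the block
    """
    block = [[None] * width for _ in range(height)]
    for r in range(height):
        for c in range(width):
            if r == c:
                block[r][c] = corner            # diagonal hits the corner pixel
            elif c > r:
                block[r][c] = above[c - r - 1]  # diagonal hits the top row
            else:
                block[r][c] = left[r - c - 1]   # diagonal hits the left column
    return block
```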

According to other aspects of the present methods and systems, as shown in FIGS. 20A-20C, the predicted values of pixels in an L-shaped prediction region may be combined with the predicted values of pixels in a stripe-shaped prediction region for a single encoding block. For example, a predicted value PV may be generated according to Equation 1:

PV = a * PL + (1 - a) * PB

where PL is the predicted value of a pixel in the L-shaped prediction region, PB is the predicted value of a pixel in the stripe-shaped prediction region, and a is a prediction control coefficient that weights the relative contribution of each prediction region.
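For example, with a prediction control coefficient a = 0.75 and predicted values PL = 100 and PB = 80, Equation 1 would yield PV = 0.75 * 100 + 0.25 * 80 = 95.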

FIGS. 21A-21B show a pixel region 2100 corresponding to a portion of a video frame (not shown). Pixel region 2100 is shown partially encoded, having a processed region 2102, an unprocessed region 2104 (indicated by single hatching), and a current encoding block 2106. The processed region 2102 represents pixels that have been indexed into encoding blocks by the block indexer 408 and that have been processed by the intra predictor 444. The unprocessed region 2104 represents pixels that have not yet been processed by the intra predictor 444. The current encoding block 2106 is an 8 × 16 rectangular encoding block currently being processed by the intra predictor 444 according to the directional prediction technique described above with respect to FIGS. 20A-20E.

The prediction region 2108 includes pixels from the row directly above and the column directly to the left of the encoding block 2106. In FIG. 21A, the predicted value of each pixel of the prediction region 2108 is indicated by an alphanumeric indicator corresponding to the pixel's relative row (a letter) and column (a number) within the prediction region. Diagonal vectors extend from each pixel of the prediction region 2108 to one or more pixels of the encoding block 2106, corresponding to the mapping of the prediction values of the prediction region onto the pixels of the encoding block. In FIG. 21B, the mapped predicted value of each pixel of the encoding block 2106 is indicated by the alphanumeric indicator of the prediction-region pixel that is the source of that predicted value.
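Continuing the illustrative map_diagonal_predictors sketch above, the small example below shows how such a mapping would populate a 4-wide by 3-high block, using alphanumeric placeholders in the spirit of FIG. 21A (the values and block size are assumptions for illustration only):

```python
# Illustrative values standing in for FIG. 21A's alphanumeric indicators.
above = ["A1", "A2", "A3", "A4"]   # row directly above the block
left = ["L1", "L2", "L3"]          # column directly to the left of the block
corner = "C0"                      # pixel above and to the left of the block

for row in map_diagonal_predictors(above, left, corner, height=3, width=4):
    print(row)
# ['C0', 'A1', 'A2', 'A3']
# ['L1', 'C0', 'A1', 'A2']
# ['L2', 'L1', 'C0', 'A1']
```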

Directional predictor selection routine

FIG. 22 illustrates an exemplary directional predictor selection routine 2200 that may be implemented by the intra predictor 444, in accordance with various embodiments. For example, if the rectangular coding block predictor selection routine 1700, described above, fails to find a suitable predictor for an encoding block, the intra predictor 444 may use the directional predictor selection routine 2200 instead.

The directional predictor selection routine 2200 may obtain an encoded block at execution block 2202.

At start loop block 2204, the directional predictor selection routine 2200 may process each pixel of the obtained encoded block in turn. For example, the directional predictor selection routine 2200 may process the pixels of the encoded block from left to right and from top to bottom.

The directional predictor selection routine 2200 may select a prediction region to be used in selecting a predictor for the current pixel at execution block 2206. For example, the directional predictor selection routine 2200 may select an L-shaped prediction region, a stripe-shaped prediction region, or the like. The directional predictor selection routine 2200 may also choose to combine multiple prediction regions (for purposes of this example, assume that each encoding block has only two possible prediction regions: the L-shaped region and the stripe-shaped region described above). The directional predictor selection routine 2200 may select the same prediction region for each pixel of the current encoding block or may alternate between prediction regions.

At decision block 2208, if the directional predictor selection routine 2200 selected a combined prediction region, the directional predictor selection routine 2200 may proceed to execution block 2214, described below; otherwise, the directional predictor selection routine 2200 may proceed to execution block 2210.

The directional predictor selection routine 2200 may select a source pixel from the selected prediction region for the current pixel of the encoded block at execution block 2210. For example, the directional predictor selection routine 2200 may select a source pixel based on the diagonal vectors described above with respect to FIGS. 20A-20E.

The directional predictor selection routine 2200 may map the predicted value from the source pixel to the current pixel of the encoded block at execution block 2212. The directional predictor selection routine 2200 may then proceed to end loop block 2224.

Returning to decision block 2208, as described above, if the combined prediction region was selected by the directional predictor selection routine 2200, then at execution block 2214 the directional predictor selection routine 2200 may select a prediction control coefficient.

The directional predictor selection routine 2200 may select a source pixel for the current pixel of the encoded block from a first prediction region (e.g., an L-shaped prediction region) at execution block 2216.

The directional predictor selection routine 2200 may select a source pixel for the current pixel of the encoded block from a second prediction region (e.g., a stripe-shaped prediction region) at execution block 2218.

The directional predictor selection routine 2200 may then use the predicted values of the selected source pixels and the selected prediction control coefficient to calculate a combined predictor. For example, the directional predictor selection routine 2200 may calculate the combined predictor according to Equation 1 above.

The directional predictor selection routine 2200 may map the combined predictor to the current pixel of the encoded block at execution block 2222.

At end loop block 2224, the directional predictor selection routine 2200 may loop back to start loop block 2204 to process the next pixel (if any) of the encoded block.

The directional predictor selection routine 2200 may terminate at a return block 2299.
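Taken together, the per-pixel flow of the directional predictor selection routine 2200 might be summarized as in the following minimal sketch; the lookup callables, the use_combined predicate, the default coefficient, and the choice of the L-shaped region in the single-region branch are all illustrative assumptions, not details from the specification.

```python
# Minimal sketch of the per-pixel flow of routine 2200. The lookup callables
# (l_source, stripe_source) map a block pixel to the predicted value of its
# source pixel in the corresponding prediction region; they, the use_combined
# predicate, and the default coefficient are illustrative assumptions.

def directional_predictor_selection(block_pixels, l_source, stripe_source,
                                    use_combined, a=0.5):
    predictors = {}
    for px in block_pixels:                 # start loop block 2204
        if use_combined(px):                # decision block 2208
            p_l = l_source(px)              # execution block 2216
            p_b = stripe_source(px)         # execution block 2218
            predictors[px] = a * p_l + (1 - a) * p_b   # Equation 1
        else:
            predictors[px] = l_source(px)   # execution blocks 2210-2212
    return predictors
```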

Although specific embodiments have been illustrated and described herein, it will be appreciated by those of ordinary skill in the art that a variety of alternate and/or equivalent implementations may be substituted for the specific embodiments shown and described without departing from the scope of the present invention. This application is intended to cover any adaptations or variations of the embodiments discussed herein.
