Audio encoder

Document number: 1393447. Publication date: 2020-02-28.

Reading note: This technology, Audio encoder, was created by Sascha Disch, Christian Helmrich, Markus Multrus, Markus Schnell and Arthur Tritthart on 2014-01-28. Main content: The present disclosure relates to audio encoders. An audio encoder for providing encoded audio information based on input audio information comprises a bandwidth extension information provider configured to provide bandwidth extension information using a variable temporal resolution, and a detector configured to detect an onset of a fricative or affricate. The audio encoder is configured to adjust the temporal resolution used by the bandwidth extension information provider such that the bandwidth extension information is provided with an increased temporal resolution at least for a predetermined time period before the time at which the onset of the fricative or affricate is detected and for a predetermined time period after that time. Alternatively or additionally, the bandwidth extension information is provided with an increased temporal resolution in response to detecting an end segment of a fricative or affricate. Audio encoders and methods use corresponding concepts.

1. An audio encoder (100) for providing encoded audio information (112) based on input audio information (110), the audio encoder comprising:

a bandwidth extension information provider (130) configured to provide bandwidth extension information (132) using a variable time resolution;

a detector (120) configured to detect an onset of a fricative or affricate;

wherein the audio encoder is configured to adjust the temporal resolution used by the bandwidth extension information provider such that the bandwidth extension information is provided with an increased temporal resolution at least for a predetermined time period before a time (t_f) at which the onset of a fricative or affricate is detected and for a predetermined time period (630c) after the time at which the onset of the fricative or affricate is detected.

2. The audio encoder (100) of claim 1, wherein the audio encoder is configured to switch from a first temporal resolution for providing the bandwidth extension information to a second temporal resolution for providing the bandwidth extension information in response to the detection of the onset of a fricative or affricate,

wherein the second temporal resolution is higher than the first temporal resolution.

3. The audio encoder (100) of claim 1, wherein the bandwidth extension information provider is configured to provide the bandwidth extension information such that the bandwidth extension information is associated with temporally regular time intervals (620a, 620b, 620c, 620d, 620e, 620f; 720a-720f) of equal time length,

wherein the bandwidth extension information provider is configured to provide a single set of bandwidth extension information for a time interval (620a, 620b, 620c, 620d, 620f; 720a, 720b, 720c, 720f) of a given length of time if a first temporal resolution is used, and

wherein the bandwidth extension information provider is configured to provide a plurality of sets of bandwidth extension information associated with sub-time intervals (630a, 630b, 630c, 630d) for a time interval (620e; 720d, 720e) of the given length of time, if a second temporal resolution is used.

4. The audio encoder (100) of claim 3, wherein the audio encoder is configured to adjust the temporal resolution used by the bandwidth extension information provider such that at least one sub-time interval (630a; 730d) associated with one set of bandwidth extension information immediately precedes another sub-time interval (630b; 730e) which is associated with another set of bandwidth extension information and during which (630b; 730e) an onset of a fricative or affricate is detected,

such that the increased temporal resolution is used in at least one sub-time interval (630a; 730d) preceding the sub-time interval (630b; 730e) in which the onset of a fricative or affricate is detected.

5. The audio encoder (100) of claim 3, wherein the audio encoder is configured to subdivide a given time interval (620e; 720d, 720e) of a given length of time into four sub-time intervals (630a-630d; 730a-730h) of equal length, if the bandwidth extension information is provided using an increased temporal resolution for the given time interval (620e; 720d, 720e) of the given length of time,

such that four sets of bandwidth extension information are provided for the given time interval having the given length of time.

6. The audio encoder (100) of claim 1,

wherein the audio encoder is configured to selectively provide bandwidth extension information using an increased temporal resolution for a first time interval (720d) of a given length of time preceding a second time interval (720e) of a given length of time,

if a fricative or affricate onset is detected within the second time interval (720e) and if a time distance between the time at which the fricative or affricate onset is detected and a boundary between the first time interval (720d) and the second time interval (720e) is less than a predetermined time distance.

7. The audio encoder (100) of claim 6,

wherein the audio encoder is configured to perform a temporal look-ahead such that, in response to detecting an onset of a fricative or affricate within the second time interval (720e), bandwidth extension information is provided with an increased temporal resolution for a first time interval (720d) of a given length of time preceding the second time interval (720e) of the given length of time.

8. The audio encoder (100) of claim 1,

wherein the audio encoder is configured to adjust the temporal resolution used by the bandwidth extension information provider such that the bandwidth extension information is provided with the same increased temporal resolution at least for a predetermined time period (630a; 730d) before the time (t_f) at which the onset of a fricative or affricate is detected and for a predetermined time period (630c; 730f) after that time.

9. The audio encoder (100) of claim 1,

wherein the audio encoder is configured to adjust the temporal resolution used by the bandwidth extension information provider such that sets of bandwidth extension information are provided with the same increased temporal resolution for at least a first sub-time interval (630a; 730d), a second sub-time interval (630b; 730e) and a third sub-time interval (630c; 730f),

wherein the first sub-time interval immediately precedes the second sub-time interval;

wherein the onset of a fricative or affricate is detected within the second sub-time interval; and

wherein the third sub-time interval immediately follows the second sub-time interval.

10. The audio encoder (100) of claim 1,

wherein the detector is configured to detect an end segment of a fricative or affricate; and

wherein the audio encoder is configured to adjust a temporal resolution used by the bandwidth extension information provider such that bandwidth extension information is provided with an increased temporal resolution at least for a predetermined time period before a time at which an end segment of a fricative or affricate is detected and for a predetermined time period after the time at which the end segment of the fricative or affricate is detected.

Technical Field

Embodiments according to the present invention relate to an audio encoder for providing encoded audio information based on input audio information.

Other embodiments according to the present invention are directed to an audio decoder for providing decoded audio information based on encoded audio information.

Other embodiments according to the present invention are directed to a system comprising an audio encoder and an audio decoder.

Other embodiments according to the present invention are directed to a method of providing encoded audio information based on input audio information.

Other embodiments according to the present invention are directed to a method of providing decoded audio information based on encoded audio information.

Further embodiments according to the invention relate to a computer program for performing one of the methods.

Other embodiments according to the invention are directed to modeling the beginning or ending segments of fricatives or affricates in audio bandwidth extension for speech.

Background

In recent years, the demand for digital storage and transmission of audio signals, particularly voice signals, has increased. In some cases, like for example mobile communication applications, it is required to obtain a relatively low bit rate.

To obtain a good trade-off between bit rate and audio quality (or speech quality), some methods encode the low frequency part of an audio signal (e.g. the frequency part up to approximately 6 kHz) with relatively high precision and reconstruct the high frequency part of the audio content (e.g. the frequency part above approximately 6 kHz or 7 kHz) by means of a bandwidth extension. For example, a bandwidth extension may reconstruct the high frequency part of the audio content from relatively few parameters, which may describe the spectral envelope, e.g. in a coarse manner.

A well-known implementation of bandwidth extension is spectral band replication (SBR), which has been standardized by MPEG (Moving Picture Experts Group).

For example, some details regarding spectral band replication are described in International Standard ISO/IEC 14496-3:200X (E), Subpart 4, sections 4.6.18 and 4.6.19.

In addition, reference is made to patent application US 2011/0099018 A1, which describes an apparatus and method for calculating bandwidth extension data using spectral tilt controlled framing. Said patent application describes an apparatus for calculating bandwidth extension data of an audio signal in a bandwidth extension system, wherein a first bandwidth is encoded with a first number of bits and a second bandwidth, different from the first bandwidth, is encoded with a second number of bits, the second number of bits being smaller than the first number of bits. The apparatus has a controllable bandwidth extension parameter calculator that calculates bandwidth extension parameters of the second bandwidth in a frame-by-frame manner for a sequence of frames of the audio signal. Each frame has a controllable start time instant. The apparatus additionally comprises a spectral tilt detector that detects a spectral tilt in a temporal portion of the audio signal and signals a start time instant of an individual frame of the audio signal depending on the spectral tilt.

However, it has been found that in many known methods of bandwidth extension, the auditory effect obtained in the presence of fricatives or affricates is substantially degraded. For example, known bandwidth extension techniques may cause pre-echo and post-echo. Furthermore, fricatives or affricates may sound too sharp when using known bandwidth extension techniques.

In view of the above, there is a need to create a bandwidth extension concept that allows for improved audio quality.

Disclosure of Invention

Embodiments in accordance with the present invention create an audio encoder that provides encoded audio information based on input audio information. The audio encoder comprises a bandwidth extension information provider configured to provide bandwidth extension information using a variable time resolution. The audio encoder also includes a detector configured to detect an onset of a fricative or affricate. The audio encoder is configured to adjust a temporal resolution used by the bandwidth extension information provider such that the bandwidth extension information is provided with an increased temporal resolution at least for a predetermined time period before a time when the onset of the fricative or affricate is detected and for a predetermined time period after the time when the onset of the fricative or affricate is detected.

This embodiment according to the invention is based on the finding that a good auditory impression can be achieved if the bandwidth extension information is provided with a high temporal resolution for the entire temporal neighborhood of the time at which the onset of the fricative or affricate is detected. The entire onset of a fricative or affricate typically spans a certain period before the time at which it is detected and a certain period after that time. Encoding this entire span with a high temporal resolution (at least with respect to the bandwidth extension information) helps to avoid pre-echoes and unnatural auditory impressions.

In general, the onset of a fricative or affricate cannot be located very precisely, since its detection is often based on a threshold crossing, which does not occur right at the beginning of the onset. The onset is therefore actually detected shortly after it has begun. Providing the bandwidth extension information with an increased temporal resolution (compared to the "normal" temporal resolution) at least for a predetermined time period before the time of detection therefore allows details at the very beginning of the onset to be reproduced with good resolution; such details have been found to be important for a good auditory impression. This not only helps to avoid pre-echoes but also preserves the details of the onset. Similarly, providing the bandwidth extension information with an increased temporal resolution for a predetermined time period after the time of detection allows the details of the remainder of the onset to be reproduced, which are likewise important for the auditory impression.

The concept described herein thus enables the entire onset of a fricative or affricate to be reproduced with a high temporal resolution. This helps to avoid a degradation of the auditory impression that would be caused, for example, by a temporal resolution (of the bandwidth extension information) that is too coarse at the very beginning of the onset of the fricative or affricate or at the transition from the onset to the stationary signal part.

In a preferred embodiment, the audio encoder is configured to switch from a first temporal resolution for providing the bandwidth extension information to a second temporal resolution for providing the bandwidth extension information in response to detecting an onset of a fricative or affricate, wherein the second temporal resolution is higher than the first temporal resolution. Thus, a switch between two different time resolutions for providing bandwidth extension information is performed, wherein the switch is controlled by detecting an onset of a fricative or affricate. Thus, a simple control scheme is created, which can be easily implemented in an audio encoder or audio decoder.

In a preferred embodiment, the bandwidth extension information provider is configured to provide the bandwidth extension information such that the bandwidth extension information is associated with temporally regular time intervals (which may form a basic but sub-divisible time grid for providing the bandwidth extension information) of equal time length. The bandwidth extension information provider is configured to provide a single set of bandwidth extension information for a time interval having a given length in time when a first temporal resolution (e.g., a relatively lower temporal resolution) is used. Furthermore, the bandwidth extension information provider may be configured to provide a plurality of sets of bandwidth extension information associated with sub-time intervals for a time interval having a given length of time when using the second time resolution (e.g. a relatively higher time resolution).

An audio encoder can be implemented easily by using temporally regular time intervals (e.g. frames) of equal time length as a (basic) time grid for providing the bandwidth extension information. For example, the bandwidth extension information provider only needs to switch between two discrete temporal resolutions, which can be implemented without excessive effort: it provides either a single set of bandwidth extension information for a time interval of a given length of time, or multiple sets of bandwidth extension information for a predetermined (and fixed) number of equal-length sub-time intervals of that time interval. It may thus be sufficient for the bandwidth extension information provider to provide either a single set of bandwidth extension information for a time interval having a given length of time, or four sets of bandwidth extension information for four sub-time intervals, each having a length equal to one quarter of the given length of time. Furthermore, with such a concept, the overhead for signaling the time intervals for which bandwidth extension information is provided can be kept small, since it is only necessary to select between a "coarse resolution" (e.g. a single set of bandwidth extension information for a time interval having a given length of time) and a "fine resolution" (e.g. n sets of bandwidth extension information associated with n sub-time intervals of equal length). Thus, a particularly efficient concept for providing bandwidth extension information is obtained.
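The two-resolution time grid described above can be illustrated with a minimal sketch. It is not taken from the disclosure: the function name envelope_slots, the frame length of 1024 samples and the use of sample indices are assumptions chosen for illustration only. Each frame yields either one slot (coarse resolution) or n_sub equal-length slots (fine resolution), so a single flag per frame is enough to describe the grid.

```python
from typing import List, Tuple


def envelope_slots(frame_index: int, frame_len: int, fine: bool,
                   n_sub: int = 4) -> List[Tuple[int, int]]:
    """Return the (start, end) sample ranges (end exclusive) that each
    receive one set of bandwidth extension information.

    Hypothetical helper: coarse resolution gives one slot per frame,
    fine resolution gives n_sub equal-length sub-time intervals.
    """
    if frame_len % n_sub != 0:
        raise ValueError("frame_len must be divisible by n_sub")
    start = frame_index * frame_len
    if not fine:
        return [(start, start + frame_len)]
    sub_len = frame_len // n_sub
    return [(start + i * sub_len, start + (i + 1) * sub_len)
            for i in range(n_sub)]


# Assumed example: four frames of 1024 samples, only the third one is "fine".
fine_flags = [False, False, True, False]
grid = [envelope_slots(k, 1024, flag) for k, flag in enumerate(fine_flags)]
```

In this example the third frame contributes four sets of bandwidth extension information and every other frame contributes one, so the signaling reduces to the list of per-frame flags.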

In a preferred embodiment, the audio encoder is configured to adjust the temporal resolution used by the bandwidth extension information provider such that at least one sub-time interval associated with one set of bandwidth extension information immediately precedes another sub-time interval associated with another set of bandwidth extension information and during which the onset of the fricative or affricate is detected, such that the increased temporal resolution is used in the at least one sub-time interval preceding the sub-time interval in which the onset of the fricative or affricate is detected. It is thus possible to provide the bandwidth extension information at a high temporal resolution even just before the beginning of the start segment of the fricative or affricate, i.e. even before the start segment of the fricative or affricate can actually be detected.

In a preferred embodiment, the audio encoder is configured to subdivide a given time interval having a given length of time into four sub-time intervals of equal length if the bandwidth extension information is provided using an increased temporal resolution for the given time interval having the given length of time, such that four sets of bandwidth extension information (e.g. four sets of bandwidth extension parameters, each set being associated with one of the sub-time intervals) are provided for the given time interval having the given length of time. Thus, a high temporal resolution of the bandwidth extension information may be achieved, since the four sets of bandwidth extension information may independently describe the envelope of the high frequency signal portion of the audio content, e.g. for four sub-intervals. Thus, the difference in spectral envelopes of the high frequency signal portions of the four sub-time intervals may be considered, as each of the sets of bandwidth extension information may represent a frequency envelope (or spectral envelope) of the high frequency portion of one of the sub-time intervals.
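A small numerical sketch may help to show why four independent sets matter at an onset. It makes a strong simplifying assumption: the mean energy of a toy time-domain signal stands in for the spectral envelope of the high frequency part, and the signal itself is invented for illustration.

```python
from typing import List, Sequence


def energies(x: Sequence[float], n_parts: int) -> List[float]:
    """Mean energy of each of n_parts equal-length parts of x."""
    part = len(x) // n_parts
    return [sum(v * v for v in x[i * part:(i + 1) * part]) / part
            for i in range(n_parts)]


# Assumed high-band content of one frame: silence, then an abrupt onset
# at the midpoint of the frame.
frame = [0.0] * 8 + [1.0, -1.0] * 4

coarse = energies(frame, 1)  # one set for the whole frame
fine = energies(frame, 4)    # four sets, one per sub-time interval
```

With a single set, the average energy (0.5) would be applied to the whole frame, including the silent first half, which corresponds to a pre-echo. With four sets, the silent sub-time intervals keep an energy of 0 and the onset keeps its full energy.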

In a preferred embodiment, the audio encoder is configured to selectively provide the bandwidth extension information with an increased temporal resolution for a first time interval of a given length of time preceding a second time interval of the given length of time, if a fricative or affricate onset is detected within the second time interval and if a temporal distance between the time at which the onset is detected and the boundary between the first time interval and the second time interval is less than a predetermined temporal distance. Thus, even if the time at which the onset is detected lies within the subsequent second time interval (e.g. a second frame), the bandwidth extension information of the first time interval (e.g. the first frame) is provided with an increased temporal resolution (compared to the "normal" temporal resolution) whenever it can be assumed that the very beginning of the onset, which typically lies before the time of detection, falls within the first time interval. Consequently, a high temporal resolution is used for the entire onset of the fricative or affricate, possibly including a short time just before it, which results in good speech reproduction. Beyond merely avoiding pre-echoes, this allows the onset of fricatives or affricates to be reproduced accurately, without excessive sharpness or other substantial artifacts.

In a preferred embodiment, the audio encoder is configured to perform a temporal look-ahead such that, in response to detecting an onset of a fricative or affricate within the second time interval, the bandwidth extension information is provided with an increased temporal resolution for a first time interval of a given length of time preceding the second time interval of the given length of time. Thus, it is possible to provide bandwidth extension information with an increased temporal resolution for the entire onset of a fricative or affricate (and possibly even a short time period before the onset of a fricative or affricate), resulting in an improved audio quality.
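The look-ahead rule of the two preceding paragraphs can be sketched as follows. The function name plan_resolution, the representation of detections as per-frame sample offsets and the value of max_distance are assumptions for illustration. The sketch also assumes that the frame containing the detection is itself coded with the increased resolution.

```python
from typing import List, Optional


def plan_resolution(onset_offsets: List[Optional[int]],
                    max_distance: int) -> List[bool]:
    """Return one flag per frame: True = increased temporal resolution.

    onset_offsets[k] is the distance in samples between the start of
    frame k and the time at which an onset was detected in frame k,
    or None if no onset was detected in that frame (hypothetical input).
    """
    fine = [off is not None for off in onset_offsets]
    for k, off in enumerate(onset_offsets):
        # Look-ahead: frame k-1 can only be finalized after frame k has
        # been analysed, because a detection close to the boundary
        # switches the preceding frame to the increased resolution too.
        if k > 0 and off is not None and off < max_distance:
            fine[k - 1] = True
    return fine


near_boundary = plan_resolution([None, None, 40, None], max_distance=256)
far_from_boundary = plan_resolution([None, None, 700, None], max_distance=256)
```

In the first call the detection lies 40 samples after the frame boundary, so the preceding frame is also flagged. In the second call the detection lies well inside the frame and only that frame is flagged. The price of this rule is one frame of additional encoder delay.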

In a preferred embodiment, the audio encoder is configured to adjust the temporal resolution used by the bandwidth extension information provider such that the bandwidth extension information is provided with the same increased temporal resolution at least for a predetermined time period before the time when the onset of the fricative or affricate is detected and for a predetermined time period after the time when the onset of the fricative or affricate is detected. Using equal temporal resolutions simplifies the provision of bandwidth extension information compared to the case where different temporal resolutions are used before and after the time at which the onset of the fricative or affricate is detected. It also reduces the signaling overhead.

In a preferred embodiment, the audio encoder is configured to adjust the temporal resolution used by the bandwidth extension information provider such that sets of bandwidth extension information are provided with the same increased temporal resolution for at least a first sub-time interval, a second sub-time interval and a third sub-time interval, wherein the first sub-time interval immediately precedes the second sub-time interval, wherein an onset of a fricative or affricate is detected within the second sub-time interval, and wherein the third sub-time interval immediately follows the second sub-time interval. Thus, when providing the sets of bandwidth extension information, the first and third sub-time intervals, which enclose the second sub-time interval during which the onset of the fricative or affricate is detected, are processed with the same temporal resolution as the second sub-time interval. Consequently, a substantial part of the onset of a fricative or affricate, or even the entire onset, is treated with a high temporal resolution. Furthermore, by using the same (increased, or "high") temporal resolution for the first, second and third sub-time intervals, encoding and decoding become simple and the overhead for signaling the temporal resolution remains small.
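As a sketch of this three-interval window, the following hypothetical helper maps the index of the sub-time interval in which the onset is detected to the frames that must be coded with the increased resolution. The global numbering of sub-time intervals and the default of four sub-time intervals per frame are assumptions for illustration.

```python
from typing import Set


def fine_frames_for_onset(onset_sub: int, n_sub: int = 4) -> Set[int]:
    """Frames that must use the increased resolution so that the
    sub-time intervals onset_sub - 1, onset_sub and onset_sub + 1 are
    all covered. Sub-time intervals are numbered globally, n_sub per frame.
    """
    wanted = [s for s in (onset_sub - 1, onset_sub, onset_sub + 1) if s >= 0]
    return {s // n_sub for s in wanted}


# Onset detected in the second sub-time interval of frame 4:
# both neighbours lie in the same frame.
same_frame = fine_frames_for_onset(4 * 4 + 1)

# Onset detected in the first sub-time interval of frame 4:
# the preceding sub-time interval belongs to frame 3.
across_boundary = fine_frames_for_onset(4 * 4)
```

The first case needs only one finely resolved frame. The second case needs two, because the window straddles a frame boundary.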

In a preferred embodiment, the detector is configured to detect the end segment of a fricative or affricate. In this case, the audio encoder is configured to adjust the temporal resolution used by the bandwidth extension information provider such that the bandwidth extension information is provided with an increased temporal resolution at least for a predetermined time period before the time when the end segment of the fricative or affricate is detected and for a predetermined time period after the time when the end segment of the fricative or affricate is detected. This embodiment according to the invention is based on the finding that the bandwidth extension should also be performed with a high temporal resolution for the end segment of a fricative or affricate. It has been found that human hearing is sensitive to the end segment of fricatives or affricates, and it is therefore worthwhile to spend the bit rate overhead needed to encode the end segment with a high temporal resolution (with respect to the bandwidth extension information). Furthermore, it has been found that providing bandwidth extension information at a low temporal resolution during the end segment of a fricative or affricate often results in an unduly sharp auditory impression during the end segment, which is perceived as an artifact.

Furthermore, it should be noted that with respect to adjusting the temporal resolution used by the bandwidth extension information provider in response to the onset of a fricative or affricate, any of the above-mentioned concepts may also be advantageously applied in response to detecting the end of a fricative or affricate. In other words, the concepts described above may be applied in a similar manner, wherein "the terminating segment of the fricative or affricate" replaces "the initiating segment of the fricative or affricate".

In a preferred embodiment, the detector is configured to evaluate the zero crossing rate, and/or the energy ratio and/or the spectral tilt, in order to detect the onset of fricatives or affricates. It has been found that evaluation of one or more of the above mentioned quantities (zero-crossing rate, energy ratio, spectral tilt) enables reasonably accurate detection of the onset of fricatives or affricates. For example, one or more of the above-mentioned values, or a value derived from a combination of the above-mentioned quantities, may be compared to a threshold value in order to detect the presence of a fricative or affricate.
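A threshold-based detector of this kind might look like the following sketch. It uses two of the named quantities, the zero-crossing rate and a spectral tilt estimate. The lag-1 normalized autocorrelation as tilt measure, the threshold values, the block length and the 16 kHz test tones are all assumptions for illustration and are not taken from the disclosure.

```python
import math
from typing import List, Optional, Sequence


def zero_crossing_rate(x: Sequence[float]) -> float:
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(1 for a, b in zip(x, x[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(x) - 1)


def spectral_tilt(x: Sequence[float]) -> float:
    """Lag-1 normalized autocorrelation: positive when low frequencies
    dominate, negative when high frequencies dominate."""
    r0 = sum(a * a for a in x)
    r1 = sum(a * b for a, b in zip(x, x[1:]))
    return r1 / r0 if r0 > 0 else 0.0


def looks_fricative(x: Sequence[float], zcr_threshold: float = 0.3,
                    tilt_threshold: float = 0.0) -> bool:
    """Assumed decision rule: high zero-crossing rate and negative tilt."""
    return (zero_crossing_rate(x) > zcr_threshold
            and spectral_tilt(x) < tilt_threshold)


def detect_onset(blocks: Sequence[Sequence[float]]) -> Optional[int]:
    """Index of the first flagged block that does not follow a flagged block."""
    previous = False
    for i, block in enumerate(blocks):
        current = looks_fricative(block)
        if current and not previous:
            return i
        previous = current
    return None


# Toy input at an assumed 16 kHz sampling rate: a 200 Hz tone (vowel-like)
# followed by a 6 kHz tone (crude stand-in for fricative noise).
low: List[float] = [math.sin(2 * math.pi * 200 * n / 16000) for n in range(320)]
high: List[float] = [math.sin(2 * math.pi * 6000 * n / 16000 + 0.1) for n in range(320)]
onset_block = detect_onset([low, low, high, high])
```

Because the decision is taken per block after a threshold is crossed, the reported block index lags the true start of the onset, which is the reason the increased resolution is also applied before the time of detection.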

In a preferred embodiment, the encoder is configured to selectively adjust the temporal resolution used by the bandwidth extension information provider such that the bandwidth extension information is provided at an increased temporal resolution in response to detecting the onset of a fricative or affricate only for speech signal portions and not for music signal portions. This concept is based on the finding that fricatives or affricates are more important to the perception of speech than to the perception of music signal portions. Thus, for music signal portions, the bit rate overhead incurred by providing bandwidth extension information with an increased temporal resolution can be avoided, which helps to reduce the overall bit rate, or helps to focus on the coding of features that are perceptually more important for music signal portions.

In a preferred embodiment, the audio encoder is configured to selectively provide the bandwidth extension information with an increased temporal resolution for a plurality of subsequent time intervals that completely cover the detected onset of the fricative or affricate. Therefore, even when bandwidth extension is used, the onset of a fricative or affricate is encoded with high precision, so that the use of bandwidth extension does not substantially degrade the auditory impression.

According to another embodiment of the present invention an audio encoder for providing encoded audio information on the basis of input audio information is created. The audio encoder comprises a bandwidth extension information provider configured to provide bandwidth extension information using a variable time resolution. The audio encoder also includes a detector configured to detect an end segment of a fricative or affricate. The audio encoder is configured to adjust a temporal resolution used by the bandwidth extension information provider such that the bandwidth extension information is provided at an increased temporal resolution in response to detecting the end segment of the fricative or affricate.

This embodiment according to the invention is based on the finding that the end segment of a fricative or affricate is also important for the perception of the audio content and should therefore be encoded with a high temporal resolution. In particular, this embodiment according to the present invention is based on the finding that if the end segment of a fricative or affricate is encoded with insufficient temporal resolution of the bandwidth extension information, the end segment of the fricative or affricate is generally considered to be "too sharp". Thus, by increasing the temporal resolution used by the bandwidth extension information provider, the audio quality (e.g., of the speech signal) may be substantially improved.

In a preferred embodiment, the audio encoder is configured to adjust the temporal resolution used by the bandwidth extension information provider such that the bandwidth extension information is provided with an increased temporal resolution at least for a predetermined period of time before the time at which the end segment of the fricative or affricate is detected and for a predetermined period of time after the time at which the end segment of the fricative or affricate is detected. Thus, it is possible to encode the entire end segment of a fricative or affricate with an increased temporal resolution, even though the detector is usually only able to detect, for example, the center of the end segment.

According to another embodiment of the present invention an audio decoder is created that provides decoded audio information based on encoded audio information. The audio decoder is configured to perform bandwidth extension based on bandwidth extension information provided by the audio encoder such that the bandwidth extension is performed with an increased temporal resolution at least for a predetermined time period before a time when the onset of the fricative or affricate is detected and for a predetermined time period after the time when the onset of the fricative or affricate is detected. Thus, the audio decoder is able to reproduce a substantial part of the onset of a fricative or affricate, or even the entire onset of a fricative or affricate, with a high temporal resolution. Thus, the bandwidth extension performed by the audio decoder may be well adapted to the presence of fricatives or affricates, so that changes in the spectral envelope of the high frequency part of the audio content occurring during the onset of the fricatives or affricates may be reproduced with good perceptual quality. Thus, a good auditory sensation is achieved.

In a preferred embodiment, the audio decoder may comprise a detector configured to detect an onset of a fricative or affricate based on the decoded audio information representing a low frequency part of the audio content, and to decide itself about the adjustment of the temporal resolution for the bandwidth extension. Any of the criteria discussed herein with respect to the audio encoder for detecting the onset of a fricative or affricate may also be applied in the audio decoder (assuming the required information is available at the audio decoder side).

Alternatively, however, the audio decoder may be configured to adjust the temporal resolution for the bandwidth extension based on the side information of the encoded audio information.

This embodiment according to the invention is based on the idea that a good audio quality can be achieved by performing the bandwidth extension with an increased time resolution during the end segment of the fricative or affricate. Furthermore, embodiments are based on the idea that the end segment of a fricative or affricate is typically extended by a certain time period, wherein the time at which the end segment of a fricative or affricate is detected is typically located within said certain time period.

A further embodiment according to the invention creates a system comprising an audio encoder as described above and an audio decoder, wherein the audio decoder is configured to receive encoded audio information provided by the audio encoder and to provide decoded audio information based on the encoded audio information. The audio decoder is configured to perform bandwidth extension based on the bandwidth extension information provided by the audio encoder such that bandwidth extension is performed with an increased temporal resolution at least for a predetermined time period before a time at which an onset of a fricative or affricate is detected and for a predetermined time period after the time at which the onset of the fricative or affricate is detected, and/or such that bandwidth extension is performed with an increased temporal resolution at least for a predetermined time period before a time at which an end segment of a fricative or affricate is detected and for a predetermined time period after the time at which an end segment of a fricative or affricate is detected.

The system allows encoding and decoding of audio content, wherein a relatively low bit rate is achieved by using bandwidth extension, and wherein a good reproduction of fricatives or affricates is ensured by using an increased temporal resolution in the context of an onset segment of fricatives or affricates and/or in the context of an end segment of fricatives or affricates.

According to another embodiment of the invention a method of providing encoded audio information on the basis of input audio information is created. The method includes providing bandwidth extension information using variable time resolution and detecting an onset of a fricative or affricate. The time resolution for providing the bandwidth extension information is adjusted such that the bandwidth extension information is provided with an increased time resolution at least for a predetermined time period before the time at which the fricative or affricate onset is detected and for a predetermined time period after the time at which the fricative or affricate onset is detected. This approach is based on the same considerations as the audio encoder described above.

According to another embodiment of the invention a method of providing encoded audio information on the basis of input audio information is created. The method includes providing bandwidth extension information using variable time resolution and detecting a fricative or affricate end segment. The temporal resolution used to provide the bandwidth extension information is adjusted such that the bandwidth extension information is provided at an increased temporal resolution in response to detecting the end segment of a fricative or affricate. This approach is based on the same considerations as the audio encoder described above.

According to another embodiment of the invention a method of providing decoded audio information on the basis of encoded audio information is created. The method includes performing bandwidth extension based on bandwidth extension information provided by an audio encoder such that bandwidth extension is performed with increased temporal resolution at least for a predetermined time period before a time at which an onset of a fricative or affricate is detected and for a predetermined time period after the time at which the onset of the fricative or affricate is detected. This approach is based on the same considerations as the audio decoder described above.

According to another embodiment of the invention a method of providing decoded audio information on the basis of encoded audio information is created. The method includes performing bandwidth extension based on bandwidth extension information provided by an audio encoder such that bandwidth extension is performed with increased temporal resolution at least for a predetermined time period before a time at which an end segment of a fricative or affricate is detected and for a predetermined time period after the time at which the end segment of the fricative or affricate is detected. This approach is based on the same considerations as the audio decoder described above.

According to a further embodiment of the invention a computer program for performing one of the above described methods is created.

According to another embodiment of the present invention, an encoded audio signal is created comprising an encoded representation of a low frequency part of audio content and a plurality of sets of bandwidth extension parameters. The bandwidth extension parameter is provided with an increased time resolution at least for a predetermined time period before the time of the onset of the fricative or affricate present in the audio content and for a predetermined time period after the time of the onset of the fricative or affricate present in the audio content.

According to another embodiment of the present invention, an encoded audio signal is created comprising an encoded representation of a low frequency part of audio content and a plurality of sets of bandwidth extension parameters. The bandwidth extension parameter is provided with an increased temporal resolution at least for the part of the audio content where the end segment of the fricative or affricate is present.

The encoded audio signals are based on the same considerations as the audio encoder and the audio decoder described above.

Drawings

Embodiments according to the invention will be described below with reference to the accompanying drawings:

FIG. 1 shows a block schematic diagram of an audio encoder according to an embodiment of the invention;

FIG. 2 shows a spectrogram of an original speech signal with known bandwidth extension (BWE) framing and detected fricative or affricate boundaries;

FIG. 3 shows a spectrogram of an original speech signal with bandwidth extension (BWE) framing according to the present invention;

FIG. 4 shows a spectrogram of encoded speech with known bandwidth extension (BWE) framing;

FIG. 5 shows a spectrogram of encoded speech with bandwidth extension (BWE) framing according to the present invention;

FIG. 6 shows a schematic representation of time intervals and sub-time intervals for which a set of bandwidth extension information is provided according to an embodiment of the invention;

FIG. 7 shows a schematic representation of time intervals and sub-time intervals for which a set of bandwidth extension information is provided according to an embodiment of the invention;

FIG. 8 shows a block schematic diagram of an audio encoder according to another embodiment of the invention;

FIG. 9 shows a block schematic diagram of an audio decoder according to another embodiment of the invention;

FIG. 10 shows a block schematic diagram of an audio decoder according to another embodiment of the invention;

FIG. 11 shows a block schematic diagram of a system for audio encoding and audio decoding according to an embodiment of the invention;

FIG. 12 shows a flow diagram of a method of providing encoded audio information based on input audio information, according to an embodiment of the invention; and

FIG. 13 shows a flow diagram of a method of providing decoded audio information based on input audio information, according to an embodiment of the invention.

Detailed Description

1. Audio encoder according to FIG. 1

Fig. 1 shows a block schematic diagram of an audio encoder according to an embodiment of the invention.

The audio encoder 100 is configured to receive input audio information 110 and to provide encoded audio information 112 based on the input audio information 110.

The audio encoder 100 comprises a detector 120, which detector 120 may for example receive the input audio information 110. The detector 120 is configured to detect an onset of a fricative or affricate, for example, based on the input audio information 110. The detector 120 provides temporal resolution adjustment information 122.

The audio encoder 100 also comprises a bandwidth extension information provider 130, said bandwidth extension information provider 130 being configured to provide bandwidth extension information 132 using a variable time resolution. For example, the bandwidth extension information provider 130 may be configured to receive input audio information (and possibly additional pre-processed audio information). In addition, the bandwidth extension information provider 130 may also be configured to receive the temporal resolution adjustment information 122 from the detector 120.

The audio encoder 100 may also comprise a low frequency encoding means 140, which low frequency encoding means 140 may, for example, encode a low frequency portion of the audio content represented by the input audio information 110, thereby providing an encoded representation 142 of the low frequency portion of the audio content represented by the input audio information 110. Thus, the encoded audio information 112 may include the bandwidth extension information 132 and an encoded representation 142 of the low frequency portion of the audio content. However, details regarding the low frequency encoding apparatus are not important parts of the present invention.

The functionality of the audio encoder 100 will be described in more detail below.

The low frequency encoding device 140 may encode a low frequency portion of the audio content represented by the input audio information 110. For example, portions of the audio content having frequencies below approximately 6kHz or below approximately 7kHz (or below any other predetermined frequency limit) may be encoded using the low frequency encoding device 140. The low frequency encoding device 140 may, for example, use any of the well-known audio encoding techniques, like transform domain encoding or linear prediction domain encoding. In other words, the low frequency coder 140 may, for example, use an audio coding concept, which may be based on the well-known "advanced audio coding" (AAC) or may be based on the well-known "linear predictive coding". For example, the low frequency encoding device 140 may include (or use) a modified "advanced audio encoding" as described in the international standard ISO/IEC 23003-3. Alternatively or additionally, the low frequency encoding means 140 may comprise (or use) linear prediction encoding, for example as described in international standard ISO/IEC 23003-3. However, the low frequency encoding device 140 may also include a switch between (modified or unmodified) "advanced audio coding" and linear prediction domain audio coding. It should be noted, however, that in principle any concept known in the art of encoding of audio signals may be used for the low frequency encoding means 140 in order to provide an encoded representation 142 of the low frequency part of the audio content represented by the input audio information.

In contrast, the bandwidth extension information provider 130 may provide bandwidth extension information (e.g., in the form of bandwidth extension parameters) that enables reconstruction of a high frequency portion of the audio content represented by the input audio information 110, i.e., the portion that is not represented by the encoded representation 142 provided by the low frequency encoding device 140. For example, the bandwidth extension information provider 130 may be configured to provide some or all of the spectral band replication (SBR) parameters described in the international standard ISO/IEC 14496-3 (or in any other standard referencing ISO/IEC 14496-3).

For example, the bandwidth extension information provider may be configured to provide some or all of the parameters described in the "SBR tool" and/or "low delay SBR" sections of the international standard ISO/IEC 14496-3. For example, the bandwidth extension information provider 130 may be configured to provide some or all of the following syntax elements: "sbr_extension_data()", "sbr_header()", "sbr_data()", "sbr_single_channel_element()", "sbr_channel_pair_element()", or other bitstream components referenced therein, as defined in the international standard ISO/IEC 14496-3. In other words, the bandwidth extension information provider 130 may provide spectral band replication parameters which may, for example, roughly describe the spectral envelope of the high frequency portion of the audio content represented by the input audio information 110. However, the bandwidth extension information provider 130 may also provide parameters describing the noise in the high frequency part of the audio content represented by the input audio information 110 and/or parameters describing one or more sinusoidal signals comprised in the high frequency part of the audio content represented by the input audio information 110. In addition, the bandwidth extension information provider 130 may, for example, provide a number of configuration parameters, as also described in the international standard ISO/IEC 14496-3 with respect to the spectral band replication tool. For example, the bandwidth extension information provider 130 may provide one or more parameters representing the temporal resolution at which sets of bandwidth extension information are provided, e.g., the temporal resolution at which an updated set of parameters representing the spectral envelope of the high frequency portion of the audio content represented by the input audio information is provided. 
For example, the bandwidth extension information provider 130 may provide a control parameter indicating whether one or four sets of spectral envelope parameters are provided per audio frame. For example, the control parameters provided by the bandwidth extension information provider 130 may be similar to or even equal to the parameters provided in the "FIXFIX" case of the syntax element "sbr_grid()", as described in the international standard ISO/IEC 14496-3.

However, the bandwidth extension information provider 130 may alternatively be configured to provide control information similar to or even equal to the control information included in the bitstream component "sbr_ld_grid()" described in, for example, chapter 4.6.19.3.2 of the international standard ISO/IEC 14496-3.

For example, a 2-bit value may be used to encode how many sets of envelope shape parameters are provided by the bandwidth extension information provider 130 per audio frame (compare the bitstream component "bs_num_env" as described in chapter 4.6.19.3.2 of ISO/IEC 14496-3).

Preferably, the signaling may be performed as indicated for the "FIXFIX" case, which is described in chapter 4.6.19 "Low delay SBR" of ISO/IEC 14496-3.
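The 2-bit signaling of the number of envelope sets can be illustrated with a small sketch. It assumes, purely for illustration, that the 2-bit value holds the base-2 logarithm of the number of envelope sets, so that 1, 2, 4 or 8 sets per frame can be signaled. The function names are hypothetical, and the exact bitstream semantics should be taken from ISO/IEC 14496-3.

```python
def encode_num_envelopes(num_env):
    """Map an envelope count (1, 2, 4 or 8) to a 2-bit code (assumed log2 coding)."""
    if num_env not in (1, 2, 4, 8):
        raise ValueError("only 1, 2, 4 or 8 envelope sets fit the 2-bit code")
    return num_env.bit_length() - 1  # 1 -> 0, 2 -> 1, 4 -> 2, 8 -> 3


def decode_num_envelopes(code):
    """Inverse mapping: 2-bit code back to the number of envelope sets."""
    if not 0 <= code <= 3:
        raise ValueError("code must fit into 2 bits")
    return 1 << code


# Normal resolution: one set per frame; increased resolution: four sets per frame.
normal_code = encode_num_envelopes(1)
increased_code = encode_num_envelopes(4)
```

With this assumed coding, switching between the normal and the increased temporal resolution only changes the 2-bit value transmitted per frame.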

In conclusion, the bandwidth extension information provider 130 provides the bandwidth extension information 132, wherein the temporal resolution (e.g. the time period between update parameters representing the spectral envelope of the high frequency part of the audio content represented by the input audio information 110) is adjusted in dependence on the temporal resolution adjustment information 122, which temporal resolution adjustment information 122 is provided by the detector 120. Thus, the temporal resolution used by the bandwidth extension information provider 130 (e.g., for providing an updated set of parameters describing the spectral envelope of the high frequency part of the audio content represented by the input audio information 110) is adapted to the input audio information 110.

For example, the audio encoder 100 is configured such that, in response to the detector 120 detecting the onset of a fricative or affricate, the temporal resolution used by the bandwidth extension information provider 130 is increased (compared to the normal temporal resolution). Moreover, the temporal resolution used by the bandwidth extension information provider is increased such that the bandwidth extension information (e.g., the spectral envelope parameters of the bandwidth extension information) is provided at an increased temporal resolution at least for a predetermined time period before the time at which the onset of the fricative or affricate is detected and for a predetermined time period after the time at which the onset of the fricative or affricate is detected. Thus, the "entire" onset of the fricative or affricate (or at least a sufficient portion of the onset) is encoded at an increased temporal resolution of the bandwidth extension information. Consequently, the onset of a fricative or affricate can be encoded (and decoded) with sufficient accuracy, so that audible artifacts and a degradation of audio quality are avoided.
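The following minimal sketch illustrates the idea of a time window around the detection time. The function name, the millisecond units and the concrete durations are assumptions made for illustration; the source only states that predetermined time periods before and after the detection time are covered.

```python
def frames_needing_high_resolution(t_detect, pre, post, frame_len, num_frames):
    """Return indices of frames overlapping [t_detect - pre, t_detect + post].

    All times are integers in the same unit (here: milliseconds, an assumption).
    A window end that falls exactly on a frame boundary also selects the frame
    starting there, which errs on the side of higher resolution.
    """
    start = max(0, t_detect - pre)
    end = t_detect + post
    first = start // frame_len
    last = min(num_frames - 1, end // frame_len)
    return list(range(first, last + 1))


# Hypothetical numbers: 20 ms frames, onset detected at 50 ms,
# 10 ms covered before and 40 ms covered after the detection time.
selected = frames_needing_high_resolution(50, pre=10, post=40, frame_len=20, num_frames=10)
```

In this example the frames with indices 2, 3 and 4 are selected, i.e., the frame in which the onset is detected and the two frames following it.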

Thus, the encoded audio information 112 comprising the bandwidth extension information 132 and typically also the encoded representation 142 of the low frequency part of the audio content represented by the input audio information 110 allows decoding the audio content represented by the input audio information 110 with good quality, while the required bitrate can be maintained reasonably small.

Further, it should be noted that any of the other features and functionalities described herein may also be implemented in the audio encoder 100. In particular, the audio encoder 100 may additionally be configured to adjust the temporal resolution used by the bandwidth extension information provider such that the bandwidth extension information is provided at an increased temporal resolution in response to detecting the end segment of a fricative or affricate (wherein the detector 120 may also be configured to detect the end segment of the fricative or affricate).

Some additional details regarding the functionality of the audio encoder 100 will be described below with reference to fig. 2-7.

FIG. 2 shows a spectral plot of an original speech signal with a known bandwidth extension frame and detected fricative or affricate boundaries.

The abscissa 210 describes time (in terms of time slots) and the ordinate 212 specifies QMF subbands. Thus, the representation 200 according to FIG. 2 represents the distribution of the audio signal energy over time and over the different QMF subbands.

As shown, the magenta vertical dashed lines designate the time boundaries 220a,220b, … of known bandwidth extension frames. In addition, the black vertical dashed lines designate detected fricative or affricate boundaries 230a,230b,230c,230d, …. Detected fricative or affricate boundaries 230a,230b,230c,230d, … may be detected using a tilt-based detector. As shown, time intervals of equal length (which may be considered as bandwidth extension frames or generally as frames) are bounded by boundaries 220a, …,220u of (known) bandwidth extension frames. In other words, in the known concept according to document D1, the bandwidth extension information may be associated with temporally regular time intervals (separated by boundaries of known bandwidth extension frames) of equal time length.
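The passage above only names a tilt-based detector without giving details. The following sketch shows one possible way such a detector could work; it is an assumption for illustration and not the detector of the embodiment. It uses the first normalized autocorrelation coefficient of a frame as a simple proxy for spectral tilt (close to +1 for low-frequency dominated, vowel-like frames; negative for high-frequency dominated, fricative-like frames) and reports a boundary whenever the classification changes. The threshold and frame length are hypothetical.

```python
import math


def spectral_tilt(frame):
    """First normalized autocorrelation coefficient r1/r0 as a tilt proxy."""
    r0 = sum(x * x for x in frame)
    if r0 == 0.0:
        return 0.0
    r1 = sum(a * b for a, b in zip(frame, frame[1:]))
    return r1 / r0


def detect_boundaries(samples, frame_len, threshold=0.0):
    """Return (frame_index, kind) whenever the tilt crosses the threshold."""
    events = []
    prev_fricative = False
    for i in range(0, len(samples) - frame_len + 1, frame_len):
        is_fricative = spectral_tilt(samples[i:i + frame_len]) < threshold
        if is_fricative != prev_fricative:
            events.append((i // frame_len, "onset" if is_fricative else "end"))
        prev_fricative = is_fricative
    return events


# Synthetic input: vowel-like low-frequency sine, then a fricative-like
# signal that alternates sign every sample, then the sine again.
low = [math.sin(2 * math.pi * 0.01 * n) for n in range(256)]
high = [(-1.0) ** n for n in range(128)]
events = detect_boundaries(low + high + low, frame_len=64)
```

On this synthetic input the sketch reports an onset at frame 4 and an end at frame 6. A practical detector would likely combine several features and smooth its decisions over time.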

As shown, the detected fricative or affricate boundary may be located somewhere within the time interval defined by two subsequent boundaries of a known bandwidth extension frame.

However, the known bandwidth extension frame scheme shown in fig. 2 does not allow for a particularly good reproduction of the high frequency part of the audio content, as will be described later.

FIG. 3 shows a spectrogram of an original speech signal with the inventive bandwidth extension framing (where the inventive bandwidth extension frames are indicated by black vertical solid lines). The abscissa 310 describes the time in terms of time slots and the ordinate 312 describes the frequency in terms of QMF subbands. The spectrogram 300 of FIG. 3 shows the distribution of the energy (or, in general, the intensity) of the audio content (or audio signal) over frequency (or over QMF subbands) and over time. As shown, there are still regular (basic) frames, indicated by vertical lines 330a-330u, where the frames between two subsequent frame boundaries (e.g., between frame boundaries 330a and 330b, or between frame boundaries 330b and 330c) can be considered as time intervals of equal length. It should be noted, however, that the temporal resolution is increased in response to detecting the onset of a fricative or affricate and in response to detecting the end of a fricative or affricate. For example, the detection of the onset of a fricative or affricate in the time interval between frame boundaries 330b and 330c has the following effect: the frame (or time interval) between frame boundaries 330b and 330c is subdivided into four sub-frames (or sub-time intervals) 340a, 340b, 340c and 340d. Furthermore, it should be noted that in response to detecting the onset of a fricative or affricate between frame boundaries 330b and 330c, the temporal resolution is increased not only in the frame between frame boundaries 330b and 330c, but also in the two subsequent frames bounded by frame boundaries 330c and 330d and by frame boundaries 330d and 330e. 
Thus, in response to detecting the onset of a fricative or affricate in a single frame (or time interval), i.e., in the time interval bounded by frame boundaries 330b and 330c, the increased temporal resolution is also applied to two additional frames (i.e., the frames bounded by frame boundaries 330c and 330d and by frame boundaries 330d and 330e). Thus, it may be ensured that the bandwidth extension information (or bandwidth extension parameters) is provided using an increased temporal resolution (compared to the standard temporal resolution) for the duration of the entire onset of the fricative or affricate (or at least a large part of the onset). Accordingly, the bandwidth extension at the decoder side may be performed at an increased temporal resolution during the entire onset of a fricative or affricate, as an individual set of bandwidth extension parameters (e.g., parameters describing the envelope of the high frequency portion of the audio content) may be provided for each of the sub-time intervals (e.g., each of the sub-time intervals 340a-340d). In addition, it can be seen that in response to detecting an end segment of a fricative or affricate in the frame between frame boundaries 330e and 330f, the increased temporal resolution is applied to three subsequent frames, i.e., the frames bounded by frame boundaries 330e and 330f, frame boundaries 330f and 330g, and frame boundaries 330g and 330h. Stated differently, each of the frames between frame boundaries 330e and 330h is subdivided into four independent sub-frames (or sub-time intervals), with a separate set of bandwidth extension parameters provided for each of the sub-frames (or sub-time intervals). Thus, the bandwidth extension parameters may be provided with an increased temporal resolution for the entire end segment of the fricative or affricate detected in the time interval bounded by frame boundaries 330e and 330f.

However, between frame boundaries 330h and 330p, a "normal" temporal resolution (rather than an "increased" temporal resolution) is used. In addition, in response to detecting an onset of a fricative or affricate in the frame (or time interval) bounded by frame boundaries 330p and 330q, bandwidth extension information is provided with increased temporal resolution for the frames between frame boundaries 330p and 330s.

Similarly, in response to detecting an end segment of a fricative or affricate in the frame (or time interval) between frame boundaries 330t and 330u, bandwidth extension information is provided with increased temporal resolution for the frames (or time intervals) between frame boundaries 330t and 330w.
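The frame-wise behavior described for FIG. 3 can be sketched as follows: a detection in one frame switches that frame and the two following frames to the increased temporal resolution. The function name and the indexing scheme (frame 0 lies between boundaries 330a and 330b) are assumptions introduced for this illustration.

```python
import string

HOLD_FRAMES = 3  # the frame with the detection plus two following frames, as in FIG. 3


def high_resolution_frames(detected_frames, num_frames, hold=HOLD_FRAMES):
    """Return one boolean flag per frame: True means increased temporal resolution."""
    flags = [False] * num_frames
    for k in detected_frames:
        for j in range(k, min(k + hold, num_frames)):
            flags[j] = True
    return flags


# Boundaries 330a..330w delimit 22 frames.
labels = string.ascii_lowercase[:23]
# Detections in the frames starting at boundaries 330b, 330e, 330p and 330t.
detections = [labels.index(c) for c in "bept"]
flags = high_resolution_frames(detections, num_frames=22)
high = [f"330{labels[i]}-330{labels[i + 1]}" for i, f in enumerate(flags) if f]
```

Under these assumptions, the list `high` contains the twelve frames from 330b to 330h, from 330p to 330s and from 330t to 330w, while the remaining frames keep the normal temporal resolution.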

In conclusion, the bandwidth extension information is provided in the audio encoder 100 using uniform (base) frames, wherein the bandwidth extension information is associated with temporally regular frames (time intervals) having an equal time length.

However, the bandwidth extension information provider is configured to provide a single set of bandwidth extension information for a frame (i.e., a time interval having a given length of time) when using the first ("normal") temporal resolution. For example, a single set of bandwidth extension information is provided for the frame between frame boundaries 330a and 330b, and a single set of bandwidth extension information is provided for each of the eight frames between frame boundaries 330h and 330p. However, the bandwidth extension information provider is also configured to provide, for a frame (time interval) of the given time length, multiple sets of bandwidth extension information associated with sub-time intervals when using the second ("increased") temporal resolution. For example, four sets of bandwidth extension information are provided for each of the six frames between frame boundary 330b and frame boundary 330h, for each of the three frames between frame boundaries 330p and 330s, and for each of the three frames between frame boundaries 330t and 330w. As shown, each of the frames for which bandwidth extension information is provided at the high temporal resolution is subdivided into four sub-frames (or sub-time intervals) of equal length (e.g., sub-time intervals 340a-340d), with one set of bandwidth extension parameters provided for each of the sub-time intervals. Furthermore, it should be noted that immediately before the sub-time interval during which an onset of a fricative or affricate is detected, or before the sub-time interval during which an end segment of a fricative or affricate is detected, there is typically at least one sub-time interval for which one set of bandwidth extension parameters is provided. 
For example, if it is assumed that a fricative or affricate is detected in the second half of the frame between frame boundaries 330b and 330c, then there are at least two sub-time intervals (located in the first half of the frame between frame boundaries 330b and 330c) immediately preceding the sub-time interval during which the fricative or affricate is detected. Thus, the bandwidth extension parameters are provided with an increased temporal resolution even before the time at which the onset of a fricative or affricate is actually detected or the time at which the end of a fricative or affricate is actually detected. Thus, the "entire" onset of a fricative or affricate or the "entire" end segment of a fricative or affricate can be processed with a high temporal resolution (wherein the bandwidth extension parameters are provided with a high temporal resolution). Thus, a good reproduction is possible at the side of an audio decoder receiving the encoded audio information provided by the audio encoder 100.
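As a rough illustration of the two resolutions, the sketch below derives the sub-time intervals of each frame and counts the parameter sets for the frame layout described for FIG. 3. The frame length of 16 time slots and the function name are assumptions; the source only specifies one set per frame at normal resolution and four equal-length sub-time intervals at increased resolution.

```python
def envelope_borders(frame_start, frame_len, high_resolution, subdivisions=4):
    """Return (start, stop) pairs of the intervals that each get one parameter set."""
    n = subdivisions if high_resolution else 1
    if frame_len % n != 0:
        raise ValueError("frame length must be divisible by the number of sub-intervals")
    step = frame_len // n
    return [(frame_start + i * step, frame_start + (i + 1) * step) for i in range(n)]


FRAME_LEN = 16  # time slots per frame (assumed value)

# Frame 0 lies between 330a and 330b; increased resolution in 330b-330h,
# 330p-330s and 330t-330w, as described for FIG. 3.
flags = [1 <= i < 7 or 15 <= i < 18 or 19 <= i < 22 for i in range(22)]
grid = [envelope_borders(i * FRAME_LEN, FRAME_LEN, f) for i, f in enumerate(flags)]
total_sets = sum(len(intervals) for intervals in grid)
```

For this layout, 12 frames carry four sets each and 10 frames carry one set each, giving 58 sets in total instead of 22 at uniformly normal resolution. This indicates the bitrate cost of the increased resolution, which is limited to the frames around detected onsets and end segments.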

Referring now to fig. 4 and 5, certain advantages of the audio encoder 100 over known audio encoders will be described.

FIG. 4 shows a spectral plot of encoded speech with a known bandwidth extension frame. An abscissa 410 describes time and an ordinate 412 describes frequency. Furthermore, the yellow ellipse indicates typical artifacts caused by known bandwidth extension frames. Thus, the spectral plot 400 of FIG. 4 depicts the distribution of the energy of a speech signal with frequency and with time.

The first ellipse 430 indicates a pre-echo caused by the known bandwidth extension framing. Further, the known bandwidth extension framing has the effect that the onset shown in ellipse 430 is perceived as a very strong onset.

In addition, the second ellipse 440 indicates a post-echo, which is also caused by the known bandwidth extension framing. Furthermore, the end segment in the area indicated by the ellipse 440 is generally perceived as a very strong end segment and may sound very unnatural.

The ellipse 450 shows the vowel leakage from the baseband, which is also caused by the known bandwidth extension frame.

Thus, as shown, the known bandwidth extension framing (e.g., the bandwidth extension framing shown in FIG. 2) produces a number of artifacts.

FIG. 5 shows a spectrogram of encoded speech with the inventive bandwidth extension framing (compare the spectrogram of FIG. 4). Again, an abscissa 510 describes time and an ordinate 512 describes frequency, such that the spectrogram 500 represents the energy of an encoded speech signal (or of a decoded speech signal derived from an encoded speech signal) as a function of frequency and as a function of time. As shown, the problem areas highlighted by ellipses 430, 440 and 450 in FIG. 4 are substantially improved. In other words, using a high temporal resolution to provide the bandwidth extension information helps to reduce or even avoid pre-echoes, an unduly strong perception of the onset of a fricative or affricate, post-echoes, and an unduly strong perception of the end segment of a fricative or affricate. Furthermore, the use of the increased temporal resolution according to the present invention also helps to avoid the vowel leakage from the baseband shown at ellipse 450 in FIG. 4.

Some details regarding providing bandwidth extension information will be explained below with reference to fig. 6 and 7.

Fig. 6 shows a schematic representation of time intervals and sub-time intervals for providing bandwidth extension information.

The time axis is designated 610. As shown, time (represented by time axis 610) is subdivided into time intervals 620a, 620b, 620c, 620d, 620e, and 620f, which may, for example, have equal lengths. The time intervals may be considered as frames. In addition, the time at which the onset (or end segment) of the fricative or affricate is detected is designated t_f. Time t_f lies within time interval (or frame) 620e. It should be noted that the time at which the onset (or the end segment) of the fricative or affricate is detected may be determined, for example, by the detector 120, and that this time may typically be located shortly after the actual beginning of the onset of the fricative or affricate or shortly after the actual beginning of the end segment of the fricative or affricate.

As shown in fig. 6, bandwidth extension information is provided at a "normal" (relatively low) resolution for time intervals 620a to 620d and 620f. For example, one set of bandwidth extension information is provided for each of time intervals 620a to 620d and 620f. In other words, for each of these time intervals a common spectral shape (or spectral shaping) is represented by a single set of bandwidth extension parameters, such that the bandwidth extension information does not represent changes in spectral shape (or spectral shaping) within a single one of time intervals 620a to 620d and 620f. In contrast, the audio encoder 100 is configured to adjust the temporal resolution used by the bandwidth extension information provider such that the bandwidth extension information is provided at an increased temporal resolution in the time interval (or frame) 620e. Thus, in response to detecting the onset (or the end) of the fricative or affricate at time t_f within time interval 620e, the bandwidth extension information provider 130 may subdivide time interval 620e into four sub-time intervals 630a to 630d and provide one set of bandwidth extension information for each of the sub-time intervals 630a to 630d.

Thus, a first set of bandwidth extension information (e.g., parameters) provided for sub-time interval 630a may describe the spectral shape (or spectral shaping) of the bandwidth extension to be applied to sub-time interval 630a, a second set may describe the spectral shape or spectral shaping to be applied to sub-time interval 630b, a third set may describe the spectral shape or spectral shaping to be applied to sub-time interval 630c, and a fourth set may describe the spectral shape or spectral shaping to be applied to sub-time interval 630d. Accordingly, the bandwidth extension information provider 130 provides individual sets of bandwidth extension information (or bandwidth extension parameters), such that the spectral shape or spectral shaping applied by the bandwidth extension in sub-time intervals 630a to 630d is signaled independently. Thus, in response to detecting the onset or the end of a fricative or affricate within time interval 620e, the spectral shape or spectral shaping is encoded at an increased temporal resolution (higher than the "normal" or "low" temporal resolution) for time interval 620e.

It should be noted that sub-time intervals 630a to 630d may be of equal length (e.g., in terms of time or in terms of number of samples). Furthermore, it should be noted that the increased temporal resolution is already used in sub-time interval 630a, i.e. before the time t_f at which the onset or the end of the fricative or affricate is detected. In addition, the increased temporal resolution is also used in sub-time interval 630c, i.e. after the sub-time interval 630b during which the onset or the end of the fricative or affricate is detected. Therefore, the onset or the end of the fricative or affricate can be encoded with good audio quality.
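
The subdivision described for fig. 6 can be illustrated by a minimal sketch. The function name `parameter_set_borders`, the sample-based units and the default of four sub-time intervals of equal length are assumptions made for illustration only; the description does not prescribe a particular implementation.

```python
def parameter_set_borders(frame_start, frame_len, high_resolution, n_sub=4):
    """Return the (start, end) sample ranges of one frame, one range per
    set of bandwidth extension parameters (hypothetical helper)."""
    if not high_resolution:
        # "Normal" resolution: one parameter set covers the whole frame.
        return [(frame_start, frame_start + frame_len)]
    if frame_len % n_sub:
        raise ValueError("frame_len must be divisible by n_sub")
    sub_len = frame_len // n_sub
    # Increased resolution: n_sub equally long sub-time intervals
    # (630a to 630d for n_sub = 4).
    return [(frame_start + k * sub_len, frame_start + (k + 1) * sub_len)
            for k in range(n_sub)]


# Frame 620e (here assumed to be the fifth frame of 1024 samples) in high resolution.
borders_620e = parameter_set_borders(4 * 1024, 1024, high_resolution=True)
```

Each returned range would then receive its own set of bandwidth extension parameters, so that the spectral shape can change from one sub-time interval to the next.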

Fig. 7 shows another schematic representation of the temporal resolution for providing bandwidth extension information. The time axis is designated 710. As shown, there are time intervals 720a to 720f. As further shown, the time at which the onset (or the end) of a fricative or affricate is detected is designated t_f and lies within the first quarter of time interval 720e. As shown, bandwidth extension information (e.g., one set of bandwidth extension information or one set of bandwidth extension parameters per time interval) is provided at a "normal" or "low" temporal resolution for time intervals 720a, 720b, 720c, and 720f. However, in response to detecting the onset of a fricative or affricate at time t_f, the audio encoder 100 adjusts the temporal resolution used by the bandwidth extension information provider such that an "increased" (or "high") temporal resolution is used during time intervals 720d and 720e. Thus, a separate set of bandwidth extension information (or bandwidth extension parameters) is provided for each of the four sub-intervals of time interval 720d and for each of the four sub-intervals of time interval 720e. The spectral envelope or spectral envelope shaping to be used for bandwidth extension (at the side of the audio decoder) is thus represented with increased temporal resolution during time intervals 720d and 720e.

For example, one respective set of bandwidth extension parameters may be provided for each sub-interval of time intervals 720d and 720 e.

It should be noted, however, that the increased temporal resolution is also used for time interval 720d, which immediately precedes time interval 720e, although the time at which the onset (or the end) of the fricative or affricate is detected lies within time interval 720e. Since it is desired, in accordance with the present invention, that at least one other time interval (or sub-time interval) preceding the time interval (or sub-time interval) in which the onset (or the end) of the fricative or affricate is detected is also encoded at an increased temporal resolution, the audio encoder 100 selects the increased temporal resolution to provide (and encode) the bandwidth extension information of time interval 720d. Thus, since the time at which the onset of the fricative or affricate is detected lies within the first sub-interval of time interval 720e, the audio encoder determines that the (previous) time interval 720d should also be processed with a high temporal resolution, such that the high temporal resolution is already applied to the time interval (or sub-interval) preceding the sub-interval in which the onset (or the end) of the fricative or affricate is detected.

Conversely, if the onset (or the end) of a fricative or affricate is detected only in the second sub-interval of time interval 720e, the audio encoder may select a low temporal resolution for providing the bandwidth extension information of time interval 720d (which corresponds to the situation shown in fig. 6). Thus, as can be appreciated from fig. 7, a certain "temporal look ahead" is performed, because the increased temporal resolution may be selected for a frame even though no onset or end is detected within that frame itself.
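
The decision described for fig. 6 and fig. 7 can be sketched as follows. The function name `high_resolution_frames` and the sample-based time index are hypothetical; the sketch only covers the rule stated above (the preceding frame is included when t_f falls into the first sub-interval) and does not cover other conceivable cases, such as a detection in the last sub-interval of a frame.

```python
def high_resolution_frames(t_f, frame_len, n_sub=4):
    """Return the indices of the frames to be processed with increased
    temporal resolution for a detection at sample position t_f."""
    frame = t_f // frame_len                         # frame containing t_f
    sub = (t_f % frame_len) // (frame_len // n_sub)  # sub-interval containing t_f
    frames = {frame}
    if sub == 0 and frame > 0:
        # No earlier sub-interval exists within this frame, so the preceding
        # frame is processed with increased resolution as well (fig. 7).
        frames.add(frame - 1)
    return sorted(frames)
```

With frames of 1024 samples, a detection at sample 4 * 1024 + 100 (first sub-interval of the fifth frame) selects the fourth and fifth frames, whereas a detection at sample 4 * 1024 + 300 (second sub-interval) selects the fifth frame only. Because the decision for a frame depends on the following frame, an encoder using such a rule needs a look ahead of at least one frame.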

Thus, even the start of the onset of the fricative or affricate is processed with a high temporal resolution, wherein the start of the onset of the fricative or affricate is usually located before the time at which the detector 120 actually detects the onset of the fricative or affricate. Thus, an audio reproduction with good perceptual quality and without major artifacts can be achieved.

To summarize, fig. 3, 5, 6 and 7 illustrate operational concepts that can be applied in the audio encoder 100 according to the present invention. However, different framing concepts may also be used, as long as it is ensured that the bandwidth extension information is provided at an increased temporal resolution (compared to the normal temporal resolution) at least for a predetermined period of time before the time at which the onset of the fricative or affricate (or the end of the fricative or affricate) is detected and for a predetermined period of time after the time at which the onset of the fricative or affricate (or the end of the fricative or affricate) is detected.

It should be noted that fig. 6 and 7, for example, represent the structure of an encoded audio signal. For example, the encoded audio signal may comprise an encoded representation of a low frequency portion of the audio content. Furthermore, the encoded audio representation may comprise a plurality of sets of bandwidth extension parameters.

For example, one set of bandwidth extension parameters may be provided for each of frames 620a to 620d and 620f. Likewise, one set of bandwidth extension information may be provided for each of frames 720a, 720b, 720c, and 720f. However, at least for a predetermined time period before the time at which the onset of the fricative or affricate is detected and for a predetermined time period after that time, the sets of bandwidth extension parameters may be provided with an increased temporal resolution. For example, for frame 620e, a total of four sets of bandwidth extension parameters may be provided, such that the temporal resolution is already increased in subframe 630a, which precedes subframe 630b in which the onset or the end of the fricative or affricate is detected. The two further sets of bandwidth extension parameters are provided for subframes 630c and 630d.

A similar concept can be appreciated from fig. 7, wherein sets of bandwidth extension parameters are provided for frames 720d and 720e at an increased temporal resolution.
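
As a small worked example, the number of sets of bandwidth extension parameters per frame in the two figures can be counted as follows. The helper name `parameter_sets_per_frame` and the zero-based frame indices (frame "a" is index 0, frame "f" is index 5) are assumptions introduced here for illustration.

```python
def parameter_sets_per_frame(n_frames, high_res_frames, n_sub=4):
    """Number of bandwidth extension parameter sets carried by each frame."""
    return [n_sub if i in high_res_frames else 1 for i in range(n_frames)]


fig6 = parameter_sets_per_frame(6, {4})     # only frame 620e is subdivided
fig7 = parameter_sets_per_frame(6, {3, 4})  # frames 720d and 720e are subdivided
total_fig6 = sum(fig6)
total_fig7 = sum(fig7)
```

Under these assumptions, the six frames of fig. 6 carry nine parameter sets in total and the six frames of fig. 7 carry twelve, compared with six sets when every frame uses the normal temporal resolution.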

In conclusion, the bandwidth extension parameters may be provided with an increased temporal resolution at least for a predetermined time period before the time at which the onset of the fricative or affricate is detected and for a predetermined time period after that time. In addition, bandwidth extension parameters may also be provided with increased temporal resolution for portions of the audio content in which the end of a fricative or affricate is detected.

2. Audio encoder according to FIG. 8

Fig. 8 shows a block schematic of an audio encoder according to an embodiment of the invention.

The audio encoder 800 is configured to receive input audio information 810 and to provide encoded audio information 812 based on the input audio information 810.

The audio encoder 800 comprises a detector 820, said detector 820 being configured to detect the end segment of a fricative or affricate. The detector 820 provides, for example, temporal resolution adjustment information 822. Furthermore, the audio encoder 800 comprises a bandwidth extension information provider 830, said bandwidth extension information provider 830 being configured to provide bandwidth extension information 832 using a variable time resolution. The audio encoder is configured to adjust the temporal resolution used by the bandwidth extension information provider 830 such that the bandwidth extension information 832 is provided at an increased temporal resolution (as compared to the "normal" temporal resolution) in response to detecting the end segment of the fricative or affricate. In other words, if the detector 820 detects the end segment of the fricative or affricate, the time resolution used by the bandwidth extension information provider 830 is increased such that the end segment of the fricative or affricate is encoded at a relatively high (higher than normal) time resolution of the bandwidth extension information (or bandwidth extension parameter) 832. Furthermore, the audio encoder 800 comprises a low frequency encoding means 840, which low frequency encoding means 840 may provide an encoded representation 842 of a low frequency portion of the audio content represented by the input audio information 810.

Furthermore, it should be noted that the detector 820 may be similar to the detector 120 described above, and the bandwidth extension information provider 830 may be similar to (or even identical to) the bandwidth extension information provider 130 described above. Furthermore, the low frequency encoding means 840 may be similar or even identical to the low frequency encoding means 140 described above.

In addition, the audio encoder 800 is configured to adjust the temporal resolution used by the bandwidth extension information provider 830 such that the bandwidth extension information 832 is provided at an increased temporal resolution in response to detecting the end segment of the fricative or affricate. Therefore, the end segment of the fricatives or affricates is encoded with a high temporal resolution (of at least the bandwidth extension information), which helps to avoid artifacts and creates a natural auditory sensation.

It should be noted, however, that the audio encoder 800 may optionally be supplemented with any of the other features described above with respect to the audio encoder 100 and also with respect to fig. 3, 5, 6 and 7. In addition, the advantage of using increased temporal resolution in response to detecting the end of a fricative or affricate can be seen in fig. 5.

Furthermore, it should be noted that the concepts according to fig. 6 and 7 may be applied both in response to detecting an onset segment of a fricative or affricate and in response to detecting an end segment of a fricative or affricate, and may thus also be applied to the audio encoder according to fig. 8.

3. Audio decoder according to FIG. 9

Fig. 9 shows a block schematic diagram of an audio decoder according to an embodiment of the invention. The audio decoder 900 is configured to receive encoded audio information 910 and to provide decoded audio information 912 based on the encoded audio information 910. The audio decoder comprises a low frequency decoding means 920, which is configured to provide a decoded representation of a low frequency portion of the audio content represented by the encoded audio information 910. For example, the low frequency decoding means 920 may comprise a general audio decoder, for example as described in the international standard ISO/IEC 14496-3. In other words, the low frequency decoding means 920 may, for example, comprise the well-known MPEG-2 "Advanced Audio Coding" (AAC), and may, for example, decode the low frequency portion of the audio content up to a frequency of approximately 6 kHz or 7 kHz. However, the low frequency decoding means 920 may use any other decoding concept, such as, for example, the well-known CELP decoding concept or the well-known transform coded excitation (TCX) decoding. In general, the low frequency decoding means 920 may use any general audio decoding concept or any speech decoding concept.

The audio decoder 900 also comprises a bandwidth extension means 930, which is configured to perform bandwidth extension based on bandwidth extension information 932 provided by the audio encoder and typically comprised in the encoded audio information 910. The bandwidth extension means 930 may generally use information provided by the low frequency decoding means 920. For example, the bandwidth extension means 930 may be configured to perform spectral band replication (SBR) based on the decoded low frequency portion of the audio content, wherein the decoded low frequency portion of the audio content is provided by the low frequency decoding means 920. For example, the bandwidth extension means 930 may perform the functionality of a so-called "SBR tool" or a so-called "low delay SBR", as described, for example, in the international standard ISO/IEC 14496-3.

However, the audio decoder 900 may be configured to perform bandwidth extension with increased temporal resolution at least for a predetermined time period before the time when the onset of the fricative or affricate is detected and for a predetermined time period after the time when the onset of the fricative or affricate is detected. Thus, good audio quality can be achieved even for the beginning segment of a fricative or affricate or the ending segment of a fricative or affricate.

It should be noted that the temporal resolution for bandwidth extension may be signaled using side information included in the bandwidth extension information 932. For example, the signaling may be performed as described in international standard ISO/IEC 14496-3, chapter 4.6.19. In particular, the signaling of the temporal resolution may be performed as described in ISO/IEC 14496-3, subsection 4, chapter 4.6.19.3.2. Thus, the bandwidth extension means 930 may evaluate the signaling to decide which temporal resolution should be used for bandwidth extension.

Alternatively, however, the audio decoder may be configured to detect a start segment of a fricative or affricate or an end segment of a fricative or affricate based on the decoded low frequency portion of the audio content that may be provided by the low frequency decoding device 920. Thus, the audio decoder 900 may decide on the temporal resolution to use for bandwidth extension in a manner similar to the audio encoder described above. In such cases, it may not even be necessary to use any additional side information to signal the temporal resolution to be used for bandwidth extension, which helps to reduce the bit rate.

Regarding the functionality of the audio decoder 900, it should be noted that the functionality corresponds to the functionality of the audio encoder 100 according to fig. 1 and the audio encoder 800 according to fig. 8. In other words, bandwidth extension is performed at a "normal" or relatively "low" temporal resolution in the absence of an onset segment of a fricative or affricate or an end segment of a fricative or affricate, and at an "increased" or relatively "high" temporal resolution in the presence of an onset segment of a fricative or affricate or an end segment of a fricative or affricate. However, the bandwidth extension may also be performed with an increased temporal resolution at least for a predetermined time period before the time when the onset of the fricative or affricate is detected and for a predetermined time period after the time when the onset of the fricative or affricate is detected, such that the entire onset of the fricative or affricate is processed with a high temporal resolution of the bandwidth extension. Therefore, artifacts can be avoided.

4. Audio decoder according to FIG. 10

Fig. 10 shows a block schematic diagram of an audio decoder according to another embodiment of the present invention.

The audio decoder 1000 is configured to receive encoded audio information 1010 and to provide decoded audio information 1012 based on the encoded audio information 1010. The audio decoder comprises a low frequency decoding device 1020, which low frequency decoding device 1020 may be substantially identical to the low frequency decoding device 920 described above. The audio decoder 1000 comprises a bandwidth extension means 1030, which bandwidth extension means 1030 may be substantially identical to the bandwidth extension means 930 described above. However, the audio decoder 1000 is configured to perform bandwidth extension based on the bandwidth extension information 1032 provided by the audio encoder such that the bandwidth extension is performed with an increased temporal resolution at least for a predetermined time period before the time when the end segment of the fricative or affricate is detected and for a predetermined time period after the time when the end segment of the fricative or affricate is detected. Thus, the audio decoder 1000 provides decoded audio information representing the end segment of the fricative or affricate with good accuracy. Thus, artifacts are avoided.

Furthermore, it should be noted that the explanations provided above with respect to the audio decoder 900 also apply to the audio decoder 1000. Additionally, it should be noted that the audio decoder 1000 may be supplemented with any of the features and functionalities described in relation to the audio decoder 900. Furthermore, the audio decoder 1000 (as well as the audio decoder 900) may be supplemented with any of the features and functionalities described herein with respect to the audio decoder, as the audio decoding corresponds to the audio encoding described above.

5. System according to FIG. 11

FIG. 11 shows a block schematic of a system according to an embodiment of the invention. The system 1100 comprises an audio encoder 1120, the audio encoder 1120 being configured to receive input audio information 1110 and to provide encoded audio information 1130 to an audio decoder 1140 based on the input audio information 1110. The audio decoder 1140 is configured to provide decoded audio information 1150 based on the encoded audio information 1130.

It should be noted, however, that the audio encoder 1120 may be identical to the audio encoder 100 described with respect to fig. 1 or to the audio encoder 800 described with respect to fig. 8. Further, the audio decoder 1140 may be identical to the audio decoder 900 described with respect to fig. 9 or to the audio decoder 1000 described with respect to fig. 10. Thus, the audio decoder may be configured to receive the encoded audio information provided by the audio encoder and to provide the decoded audio information 1150 on the basis of the encoded audio information such that the bandwidth extension is performed with an increased time resolution at least for a predetermined time period before the time when the start segment of the fricative or affricate is detected and for a predetermined time period after the time when the start segment of the fricative or affricate is detected and/or such that the bandwidth extension is performed with an increased time resolution at least for a predetermined time period before the time when the end segment of the fricative or affricate is detected and for a predetermined time period after the time when the end segment of the fricative or affricate is detected. Thus, a good quality reproduction of fricatives or affricates can be achieved.

It should be noted that the system may be supplemented with any of the features and functionalities described above with respect to the audio encoder and audio decoder.

6. Method for providing encoded audio information based on input audio information according to fig. 12

Fig. 12 shows a flow chart of a method of providing encoded audio information based on input audio information. The method 1200 according to fig. 12 includes detecting an onset segment of a fricative or affricate and/or a termination segment of a fricative or affricate (step 1210). The method also includes providing 1220 bandwidth extension information using variable time resolution. The temporal resolution for providing the bandwidth extension information may, for example, be adjusted such that the bandwidth extension information is provided with an increased temporal resolution at least for a predetermined time period before the time at which the onset of the fricative or affricate is detected and for a predetermined time period after the time at which the onset of the fricative or affricate is detected. Alternatively, the temporal resolution used to provide the bandwidth extension information may be adjusted such that the bandwidth extension information is provided at an increased temporal resolution in response to detecting the end segment of the fricative or affricate.
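
The two steps of method 1200 can be sketched as follows, under simplifying assumptions that are not taken from the description: the result of the detection step 1210 is assumed to be available as one flag per frame, the increased resolution is widened by one frame before and after each flagged frame, and the mean energy of the high band samples stands in for a full set of bandwidth extension parameters. The function name `provide_bwe_information` is hypothetical.

```python
def provide_bwe_information(high_band_frames, boundary_flags, n_sub=4):
    """Step 1220 (sketch): return, per frame, a list of parameter sets.

    high_band_frames: list of frames, each a list of high band samples.
    boundary_flags:   per-frame result of step 1210 (True if an onset or
                      an end of a fricative or affricate was detected).
    """
    info = []
    for i, frame in enumerate(high_band_frames):
        # Increased resolution for the flagged frame and its neighbours, so
        # that a period before and after the detection time is covered.
        neighbours = boundary_flags[max(i - 1, 0):i + 2]
        parts = n_sub if any(neighbours) else 1
        step = len(frame) // parts
        # One mean-energy value per (sub-)interval as a stand-in parameter set.
        info.append([sum(x * x for x in frame[k * step:(k + 1) * step]) / step
                     for k in range(parts)])
    return info
```

For five frames with a detection flagged in the fourth frame, this sketch yields one parameter set for each of the first two frames and four parameter sets for each of the last three frames.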

The method 1200 according to fig. 12 is based on the same considerations as the audio encoder described above. Furthermore, the method 1200 may be supplemented by any of the features and functionalities described herein with respect to the audio encoder (and also with respect to the audio decoder).

7. Method of providing decoded audio information according to FIG. 13

Fig. 13 illustrates a flowchart of a method of providing decoded audio information according to an embodiment of the present invention. The method 1300 includes decoding 1310 a low frequency portion of the audio information, although this is not an essential step of the method.

The method 1300 also includes performing 1320 bandwidth extension based on bandwidth extension information provided by the audio encoder, such that bandwidth extension is performed with increased temporal resolution at least for a predetermined time period before a time at which the onset of the fricative or affricate is detected and for a predetermined time period after the time at which the onset of the fricative or affricate is detected, and/or such that bandwidth extension is performed with increased temporal resolution at least for a predetermined time period before a time at which the end of the fricative or affricate is detected and for a predetermined time period after the time at which the end of the fricative or affricate is detected.
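
The effect of the increased temporal resolution in step 1320 can be illustrated with a deliberately simplified time-domain sketch. An actual SBR decoder shapes the envelope of QMF subband samples; here, as an assumption for illustration only, each parameter set is reduced to a single gain that is applied to its share of an already regenerated high band frame. The function name `apply_parameter_sets` is hypothetical.

```python
def apply_parameter_sets(high_band_frame, gains):
    """Apply one gain per (sub-)interval to a regenerated high band frame.

    A single gain corresponds to the normal temporal resolution; four gains
    correspond to the fourfold increased temporal resolution.
    """
    parts = len(gains)
    if len(high_band_frame) % parts:
        raise ValueError("frame length must be divisible by the number of gains")
    step = len(high_band_frame) // parts
    shaped = []
    for k, gain in enumerate(gains):
        shaped.extend(gain * x for x in high_band_frame[k * step:(k + 1) * step])
    return shaped
```

With a single gain, the level of the high band is constant over the whole frame; with four gains, the level can follow the rise of a fricative or affricate onset within the frame, which is the behaviour the method aims at.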

The method 1300 is based on the same considerations as the audio encoder and the audio decoder described above. Furthermore, it should be noted that method 1300 may be supplemented with any of the features and functionalities described herein with respect to an audio decoder. Furthermore, it should be noted that method 1300 may also be supplemented with any of the features and functionalities described with respect to the audio encoder, taking into account that the decoding process is substantially the inverse of the encoding process.

8. Conclusion

From the above explanation, it can be concluded that embodiments according to the present invention relate to speech coding, and in particular to speech coding using bandwidth extension (BWE) techniques. Embodiments in accordance with the present invention aim to enhance the perceptual quality of a decoded signal by detecting fricatives or affricates within the speech signal and adapting the temporal resolution of the bandwidth extension parameter driven post-processing accordingly (e.g. by adapting the temporal resolution used to provide the sets of bandwidth extension information). Embodiments according to the present disclosure include detecting the onset and the end of a fricative or affricate signal portion of a speech signal and providing temporally fine-grained bandwidth extension post-processing during the entire onset and the entire end of the fricative or affricate signal portion (where the bandwidth extension processing may, for example, include providing the bandwidth extension information at the side of an audio encoder, and may include performing bandwidth extension at the side of an audio decoder). Thereby, the likelihood of pre-echo and post-echo artifacts is reduced, and the onset and the end of the fricative or affricate signal portions can be modeled sufficiently smoothly with fine-grained bandwidth extension parameters. Thereby, an unpleasant perceived sharpness of fricatives or affricates and the occurrence of annoying pre-echoes and post-echoes in the coded signal are avoided.

Embodiments according to the invention are advantageous over known solutions. For example, [1] proposes to align the start time instant of the bandwidth extension parameter frame with the time point of a spectral tilt change. The spectral tilt change may represent an onset or an end of a fricative or affricate signal portion. The alignment technique proposed in [1] prevents the occurrence of pre-echoes of fricatives or affricates within the bandwidth extension method. However, only the onset of the fricative or affricate is detected, and the end is missed. Additionally, the above-mentioned technique does not allow for a fine-grained modeling of the spectro-temporal characteristics of the onset and the end of individual fricatives or affricates. Therefore, the onsets and ends of these fricatives or affricates may sound harsh and relatively sharp.

Certain embodiments and aspects in accordance with the present disclosure are described below.

For example, the bandwidth extension encoder of the present invention includes a fricative or affricate detector and a bandwidth extension spectrum time resolution switch.

The fricative or affricate detector is preferably capable of detecting both the onset and the end of a fricative or affricate. Suitable low computational complexity implementations of such detectors may be based, for example, on zero crossing rate (ZCR) and energy ratio evaluations (see, e.g., references [2] and [3]). The detector may additionally be connected to a speech/music discriminator in order to limit the subsequent inventive processing to speech signals only.
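
A detector based on the zero crossing rate and an energy ratio could, for instance, look like the following minimal sketch. It is not the method of references [2] and [3], whose details are not reproduced here: the first-difference signal is used as a crude stand-in for a high-pass filter, and the function names and both thresholds are hypothetical values chosen for illustration.

```python
import math


def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)


def high_band_energy_ratio(frame):
    """Energy of the first-difference signal relative to the frame energy.

    The first difference emphasises high frequencies, so the ratio is large
    for noise-like, high-frequency content (assumed stand-in measure).
    """
    diff = [b - a for a, b in zip(frame, frame[1:])]
    e_high = sum(d * d for d in diff)
    e_total = sum(x * x for x in frame) + 1e-12  # avoid division by zero
    return e_high / e_total


def looks_like_fricative(frame, zcr_threshold=0.3, ratio_threshold=1.0):
    """Frame-wise decision; both thresholds are illustrative assumptions."""
    return (zero_crossing_rate(frame) > zcr_threshold
            and high_band_energy_ratio(frame) > ratio_threshold)


# Test tones at an assumed sampling rate of 16 kHz.
low_tone = [math.sin(2 * math.pi * 100 * n / 16000 + 0.3) for n in range(512)]
high_tone = [math.sin(2 * math.pi * 6000 * n / 16000 + 0.3) for n in range(512)]
```

An onset would then correspond to a change of the frame-wise decision from false to true, and an end to a change from true to false; in the sketch, `looks_like_fricative(high_tone)` holds while `looks_like_fricative(low_tone)` does not.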

In some embodiments, a certain temporal look ahead of the detector is desirable or even required, so that the bandwidth extension resolution can be switched in time and the fine-grained temporal resolution is used in the bandwidth extension parameter estimation/synthesis during the entire length of the onset and end signal portions. The duration of the onset or end signal portion may be measured adaptively or may be assumed to be fixed at an empirically determined value. For example, the number of time intervals or sub-time intervals processed at high temporal resolution in response to detecting the onset or the end of a fricative or affricate may be predetermined or may be adjusted depending on signal characteristics. For example, a detected fricative or affricate may trigger a four times higher temporal resolution during a group of several consecutive signal frames (e.g., two or three frames) that completely covers the detected onset or end of the fricative or affricate. Preferably, but not necessarily, the group of high temporal resolution signal frames is approximately centered around the detected onset or end, covering the entire duration of the onset or end. In the case of transient-adaptive bandwidth extension framing, the higher temporal resolution replaces the transient-adaptive framing during the entire group of signal frames triggered by the fricative or affricate detection.
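
The selection of a group of consecutive high temporal resolution frames that is approximately centered around the detection can be sketched as follows. The function name `high_resolution_group` and the choice to place the extra frame of an even-sized group before the detection are assumptions made for illustration.

```python
def high_resolution_group(detected_frame, group_len=3):
    """Return the indices of group_len consecutive frames, approximately
    centered around the frame in which the onset or end was detected."""
    # For an even group length, the extra frame is placed before the
    # detection, since the actual onset usually precedes its detection.
    first = max(detected_frame - group_len // 2, 0)
    return list(range(first, first + group_len))
```

For a detection in frame 5, a group of three frames covers frames 4, 5 and 6, and a group of two frames covers frames 4 and 5. Each frame of the group would then be processed with the four times higher temporal resolution mentioned above.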

Some details regarding the figures will be discussed below.

Fig. 2 shows a spectrogram of an original speech signal, with magenta vertical dashed bars depicting the known bandwidth extension framing. The black dashed lines indicate fricative or affricate boundaries.

Fig. 3 shows a spectrogram of an original speech signal with the inventive bandwidth extension framing adapted to fricative or affricate boundaries, which are indicated by solid black vertical lines. At the point in time at which a fricative or affricate boundary (onset or end) has been detected, the resolution of the bandwidth extension post-processing is refined by switching to a four times higher resolution during a group of three consecutive frames.

Fig. 4 depicts the resulting spectrogram of the same speech signal coded using the known bandwidth extension framing. The yellow ellipses indicate the artifacts caused by the known bandwidth extension framing (from left to right): a: pre-echo and strong onset; b: post-echo and strong end; c: energy leakage from the previous vowel into the modeled fricative or affricate due to the too coarse framing.

Fig. 5 depicts the resulting spectrogram of the same speech signal coded using the bandwidth extension framing of the present invention. The problem areas indicated in fig. 4 are substantially improved.

In conclusion, the spectrograms discussed herein indicate that audio quality can be substantially improved by applying the concept according to the present invention.

To further conclude, embodiments according to the invention create an audio encoder, or an audio encoding method, or an associated computer program, as described above.

Further embodiments according to the invention create an audio decoder, or an audio decoding method, or an associated computer program, as described above.

Furthermore, an encoded audio signal or a storage medium having an encoded audio signal stored thereon is created according to an embodiment of the present invention, as described above.

9. Implementation alternatives

Although some aspects have been described in the context of a device, it should be clear that these aspects also represent a description of the corresponding method, wherein a block or means corresponds to a method step or a feature of a method step. Similarly, aspects described in the context of a method step also represent a description of a corresponding block or item or feature of a corresponding device. Some or all of the method steps may be performed by (or using) a hardware device, such as a microprocessor, a programmable computer or an electronic circuit. In some embodiments, one or more of the most important method steps may be performed by such a device.

The encoded audio signals of the present invention may be stored on a digital storage medium or may be transmitted over a transmission medium, such as a wireless transmission medium or a wired transmission medium, such as the internet.

Embodiments of the invention may be implemented in hardware or in software, depending on the particular implementation requirements. The implementation may be performed using a digital storage medium, such as a floppy disk, a DVD, a Blu-ray disc, a CD, a ROM, a PROM, an EPROM, an EEPROM, or a FLASH memory, having electronically readable control signals stored thereon which cooperate (or are capable of cooperating) with a programmable computer system such that the respective method is performed. Accordingly, the digital storage medium may be computer readable.

Some embodiments according to the invention comprise a data carrier with electronically readable control signals capable of cooperating with a programmable computer system in order to perform one of the methods described herein.

In general, embodiments of the invention can be implemented as a computer program product with program code operable to perform one of the above-described methods when the computer program product is run on a computer. The program code may for example be stored on a machine readable carrier.

Other embodiments include a computer program stored on a machine-readable carrier for performing one of the methods described herein.

In other words, an embodiment of the inventive methods is therefore a computer program with a program code for performing one of the methods described herein, when the computer program runs on a computer.

Thus, another embodiment of the inventive method is a data carrier (or digital storage medium or computer readable medium) comprising a computer program recorded thereon for performing one of the methods described herein. The data carrier, digital storage medium or recording medium is typically tangible and/or non-volatile.

Thus, another embodiment of the inventive method is a data stream or signal sequence representing a computer program for performing one of the methods described herein. The data stream or signal sequence may for example be arranged to be communicated via a communication connection, for example via the internet.

Another embodiment comprises a processing means, such as a computer or programmable logic device, configured or adapted to perform one of the methods described herein.

Another embodiment comprises a computer having a computer program installed thereon for performing one of the methods described herein.

Another embodiment according to the present invention comprises an apparatus or system configured to transfer (e.g., electronically or optically) a computer program for performing one of the methods described herein to a receiver. The receiver may be, for example, a computer, a mobile device, a memory device, or the like. The device or system may for example comprise a file server for delivering the computer program to the receiver.

In certain embodiments, a programmable logic device (e.g., a field programmable gate array) may be used to perform some or all of the functionality of the methods described herein. In certain embodiments, a field programmable gate array may cooperate with a microprocessor in order to perform one of the methods described herein. In general, the methods are preferably performed by any hardware apparatus.

The devices described herein may be implemented using hardware devices or using a computer or using a combination of hardware devices and a computer.

The methods described herein may be performed using a hardware device or using a computer or using a combination of a hardware device and a computer.

The embodiments described above are merely illustrative of the principles of the invention. It is understood that modifications and variations of the arrangements and details described herein will be apparent to others skilled in the art. Therefore, it is intended that the invention be limited only by the scope of the pending patent claims and not by the specific details presented by way of description and explanation of the embodiments herein.

1. An audio encoder (100) for providing an encoded audio information (112) based on an input audio information (110), the audio encoder comprising:

a bandwidth extension information provider (130) configured to provide bandwidth extension information (132) using a variable time resolution;

a detector (120) configured to detect an onset of a fricative or affricate;

wherein the audio encoder is configured to adjust a temporal resolution used by the bandwidth extension information provider such that bandwidth extension information is provided with an increased temporal resolution at least for a predetermined time period (630a) before a time (tf) at which an onset of a fricative or affricate is detected and for a predetermined time period (630c) after the time at which the onset of the fricative or affricate is detected.

2. The audio encoder (100) according to embodiment 1, wherein the audio encoder is configured to switch from a first temporal resolution for providing the bandwidth extension information to a second temporal resolution for providing the bandwidth extension information in response to the detection of the onset of a fricative or affricate,

wherein the second temporal resolution is higher than the first temporal resolution.

3. The audio encoder (100) according to embodiment 1 or 2, wherein the bandwidth extension information provider is configured to provide the bandwidth extension information such that the bandwidth extension information is associated with temporally regular time intervals (620a, 620b, 620c, 620d, 620e, 620f; 720a-720f) of equal time length,

4. The audio encoder (100) according to embodiment 3, wherein the audio encoder is configured to adjust the temporal resolution used by the bandwidth extension information provider such that at least one sub-time interval (630a; 730d) associated with one set of bandwidth extension information immediately precedes another sub-time interval (630b; 730e) that is associated with another set of bandwidth extension information and during which the onset of a fricative or affricate is detected,

5. The audio encoder (100) according to embodiment 3 or 4, wherein the audio encoder is configured to subdivide a given time interval (620e; 720d, 720e) having the given time length into four sub-time intervals (630a-630d; 730a-730h) having equal lengths if the bandwidth extension information is provided with an increased temporal resolution for the given time interval (620e; 720d, 720e) having the given time length,

such that four sets of bandwidth extension information are provided for the given time interval having the given length of time.

6. The audio encoder (100) of any of embodiments 1 to 5,

7. The audio encoder (100) of any of embodiments 1 to 6,

wherein the audio encoder is configured to perform a temporal look-ahead such that, in response to detecting an onset of a fricative or affricate within a second time interval (720e) of a given time length, bandwidth extension information is provided with an increased temporal resolution for a first time interval (720d) of the given time length preceding the second time interval (720e).

8. The audio encoder (100) of any of embodiments 1 to 7,

wherein the audio encoder is configured to adjust the temporal resolution used by the bandwidth extension information provider such that bandwidth extension information is provided with the same increased temporal resolution at least for a predetermined time period (630a; 730d) preceding a time (tf) at which an onset of a fricative or affricate is detected and for a predetermined time period (630c; 730f) following the time at which the onset of the fricative or affricate is detected.

9. The audio encoder (100) of any of embodiments 1 to 8,

wherein the first sub-time interval immediately precedes the second sub-time interval;

wherein the onset of the fricative or affricate is detected in the second sub-time interval; and

wherein the third sub-time interval immediately follows the second sub-time interval.

10. The audio encoder (100) of any of embodiments 1 to 9,

wherein the detector is configured to detect an end segment of a fricative or affricate; and

11. The audio encoder (100) according to any of embodiments 1 to 10, wherein the detector is configured to evaluate a zero crossing rate, and/or an energy ratio, and/or a spectral tilt, in order to detect an onset of a fricative or affricate.

12. The audio encoder (100) according to any of embodiments 1 to 11, wherein the detector is configured to evaluate a zero crossing rate, and/or an energy ratio, and/or a spectral tilt, in order to detect an end segment of a fricative or affricate.

13. The audio encoder (100) of any of embodiments 1 to 12, wherein the audio encoder is configured to selectively adjust a temporal resolution used by the bandwidth extension information provider such that the bandwidth extension information is provided with an increased temporal resolution in response to detecting an onset of a fricative or affricate only for speech signal portions and not for music signal portions.

14. The audio encoder (100) according to any of embodiments 1 to 13, wherein the audio encoder is configured to selectively provide the bandwidth extension information with an increased temporal resolution for a plurality of subsequent time intervals covering a time at which the onset of the fricative or affricate is detected, in response to detecting the onset of the fricative or affricate or in response to detecting the end segment of the fricative or affricate.

15. The audio encoder (100) according to embodiment 14, wherein the audio encoder is configured to selectively provide the bandwidth extension information with an increased temporal resolution for a plurality of subsequent time intervals that completely cover an onset of the detected fricative or affricate.

16. An audio encoder (800) for providing encoded audio information (812) based on input audio information (810), the audio encoder comprising:

a bandwidth extension information provider (830) configured to provide bandwidth extension information (832) using a variable time resolution;

a detector (820) configured to detect an end segment of a fricative or affricate;

17. The audio encoder (800) of embodiment 16,

18. An audio decoder (900) for providing a decoded audio information (912) on the basis of an encoded audio information (910),

wherein the audio decoder (900) is configured to perform bandwidth extension based on bandwidth extension information (932) provided by an audio encoder,

such that the bandwidth extension is performed with an increased temporal resolution at least for a predetermined time period before a time at which an onset of a fricative or affricate is detected and for a predetermined time period after the time at which the onset of the fricative or affricate is detected.

19. An audio decoder (1000) for providing a decoded audio information (1012) on the basis of an encoded audio information (1010),

wherein the audio decoder is configured to perform bandwidth extension (1030) based on bandwidth extension information (1032) provided by the audio encoder,

such that the bandwidth extension is performed with an increased temporal resolution at least for a predetermined time period before a time at which an end segment of a fricative or affricate is detected and for a predetermined time period after the time at which the end segment of the fricative or affricate is detected.

20. A system (1100), comprising:

an audio encoder (1120) according to one of embodiments 1 to 17; and

an audio decoder (1140) configured to receive the encoded audio information (1130) provided by the audio encoder and to provide decoded audio information (1150) based on the encoded audio information,

wherein the audio decoder is configured to perform bandwidth extension based on the bandwidth extension information provided by the audio encoder,

such that said bandwidth extension is performed with an increased time resolution at least for a predetermined time period before a time at which an onset of a fricative or affricate is detected and for a predetermined time period after said time at which said onset of said fricative or affricate is detected, or

such that the bandwidth extension is performed with an increased temporal resolution at least for a predetermined time period before a time at which an end segment of a fricative or affricate is detected and for a predetermined time period after the time at which the end segment of the fricative or affricate is detected.

21. A method (1200) of providing encoded audio information based on input audio information, the method comprising:

providing (1220) bandwidth extension information using variable time resolution; and

detecting (1210) an onset of a fricative or affricate;

wherein a temporal resolution for providing the bandwidth extension information is adjusted such that bandwidth extension information is provided with an increased temporal resolution at least for a predetermined time period before a time at which an onset of a fricative or affricate is detected and for a predetermined time period after the time at which the onset of the fricative or affricate is detected.

22. A method (1200) of providing encoded audio information based on input audio information, the method comprising:

providing (1220) bandwidth extension information using variable time resolution; and

detecting (1210) an end segment of a fricative or affricate;

wherein the temporal resolution for providing the bandwidth extension information is adjusted such that in response to detecting a fricative or affricate end segment, bandwidth extension information is provided at an increased temporal resolution.

23. A method (1300) of providing decoded audio information based on encoded audio information,

wherein the method comprises performing (1320) bandwidth extension based on bandwidth extension information provided by an audio encoder,

24. A method (1300) of providing decoded audio information based on encoded audio information,

wherein the method comprises performing (1320) bandwidth extension based on bandwidth extension information provided by an audio encoder,

25. A computer program for performing the method according to one of embodiments 21 to 24 when the computer program runs on a computer.

26. An encoded audio signal comprising:

an encoded representation of a low frequency portion of the audio content; and

a plurality of sets of bandwidth extension parameters;

wherein the bandwidth extension parameters are provided with an increased temporal resolution at least for a predetermined time period before a time of an onset of a fricative or affricate present in the audio content and for a predetermined time period after the time of the onset of the fricative or affricate present in the audio content.

27. An encoded audio signal comprising:

an encoded representation of a low frequency portion of the audio content; and

a plurality of sets of bandwidth extension parameters;

wherein the bandwidth extension parameters are provided with an increased temporal resolution in a time portion in which an end segment of a fricative or affricate is present in the audio content.
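
The interplay between the detector and the temporal resolution adjustment in the encoder embodiments can be illustrated with a minimal sketch. It is not the implementation of the embodiments. It assumes that the detector evaluates only a zero crossing rate (one of the features named in embodiment 11), that an onset is a frame in which a fricative-like frame follows a non-fricative-like frame, and that increased temporal resolution means four sets of bandwidth extension information per frame (embodiment 5) for the onset frame and, via a one-frame look-ahead, the preceding frame (embodiment 7). All function names, the threshold value, and the frame layout are hypothetical.

```python
from typing import List


def zero_crossing_rate(frame: List[float]) -> float:
    """Fraction of adjacent sample pairs whose signs differ."""
    if len(frame) < 2:
        return 0.0
    crossings = sum(
        1 for a, b in zip(frame, frame[1:]) if (a >= 0.0) != (b >= 0.0)
    )
    return crossings / (len(frame) - 1)


def detect_onsets(frames: List[List[float]], zcr_threshold: float = 0.3) -> List[int]:
    """Indices of frames where a fricative-like frame follows a non-fricative-like one.

    Assumption: a frame counts as fricative-like when its zero crossing rate
    exceeds zcr_threshold. The threshold value is illustrative only.
    """
    onsets = []
    previous = False
    for index, frame in enumerate(frames):
        current = zero_crossing_rate(frame) > zcr_threshold
        if current and not previous:
            onsets.append(index)
        previous = current
    return onsets


def parameter_sets_per_frame(
    num_frames: int, onsets: List[int], fine: int = 4, coarse: int = 1
) -> List[int]:
    """Number of bandwidth extension parameter sets to provide for each frame.

    The onset frame and the frame before it (look-ahead) are subdivided into
    `fine` sub-time intervals; every other frame keeps `coarse` sets.
    """
    plan = [coarse] * num_frames
    for onset in onsets:
        for index in (onset - 1, onset):
            if 0 <= index < num_frames:
                plan[index] = fine
    return plan


# Two quiet frames, two noise-like frames with many sign changes, one quiet frame.
frames = [[1.0] * 8, [1.0] * 8, [1.0, -1.0] * 4, [1.0, -1.0] * 4, [1.0] * 8]
onsets = detect_onsets(frames)
print(onsets, parameter_sets_per_frame(len(frames), onsets))  # prints [2] [1, 4, 4, 1, 1]
```

In this sketch the schedule can only be written once the following frame has been analysed, which is why the look-ahead costs one frame of delay. A detector for the end segment could reuse the same structure by flagging a non-fricative-like frame that follows a fricative-like one.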
