Audio encoder
Reading note: This technology, Audio encoder, was designed by Sascha Disch (萨沙·迪施), Christian Helmrich (克里斯蒂安·赫尔姆里希), Markus Multrus (马库斯·穆赖特鲁斯), Markus Schnell (马库斯·施内尔) and Arthur Tritthart (阿瑟·特里特哈特) on 2014-01-28. Abstract: The present disclosure relates to audio encoders. An audio encoder for providing encoded audio information based on input audio information comprises a bandwidth extension information provider configured to provide bandwidth extension information using a variable temporal resolution, and a detector configured to detect an onset of a fricative or affricate. The audio encoder is configured to adjust the temporal resolution used by the bandwidth extension information provider such that the bandwidth extension information is provided with an increased temporal resolution at least for a predetermined time period before the time at which the onset of the fricative or affricate is detected and for a predetermined time period after that time. Alternatively or additionally, bandwidth extension information is provided with an increased temporal resolution in response to detecting an end segment of a fricative or affricate. Audio encoders and methods use corresponding concepts.
1. An audio encoder (100) for providing encoded audio information (112) based on input audio information (110), the audio encoder comprising:
a bandwidth extension information provider (130) configured to provide bandwidth extension information (132) using a variable time resolution;
a detector (120) configured to detect an onset of a fricative or affricate;
wherein the audio encoder is configured to adjust the temporal resolution used by the bandwidth extension information provider such that the bandwidth extension information is provided with an increased temporal resolution at least for a predetermined time period (630a) before a time (t_f) at which the onset of a fricative or affricate is detected and for a predetermined time period (630c) after the time at which the onset of the fricative or affricate is detected.
2. The audio encoder (100) of claim 1, wherein the audio encoder is configured to switch from a first temporal resolution for providing the bandwidth extension information to a second temporal resolution for providing the bandwidth extension information in response to the detection of the onset of a fricative or affricate,
wherein the second temporal resolution is higher than the first temporal resolution.
3. The audio encoder (100) of claim 1, wherein the bandwidth extension information provider is configured to provide the bandwidth extension information such that the bandwidth extension information is associated with temporally regular time intervals (620a, 620b, 620c, 620d, 620e, 620f; 720a-720f) of equal time length,
wherein the bandwidth extension information provider is configured to provide a single set of bandwidth extension information for a time interval (620a, 620b, 620c, 620d, 620f; 720a, 720b, 720c, 720f) of a given length of time if a first temporal resolution is used, and
wherein the bandwidth extension information provider is configured to provide a plurality of sets of bandwidth extension information associated with sub-time intervals (630a, 630b, 630c, 630d) for a time interval (620e; 720d, 720e) of the given length of time if a second temporal resolution is used.
4. The audio encoder (100) of claim 3, wherein the audio encoder is configured to adjust the temporal resolution used by the bandwidth extension information provider such that at least one sub-time interval (630a; 730d) associated with one set of bandwidth extension information immediately precedes another sub-time interval (630b; 730e) which is associated with another set of bandwidth extension information and during which the onset of a fricative or affricate is detected,
such that the increased temporal resolution is used in at least one sub-time interval (630a; 730d) preceding the sub-time interval (630b; 730e) in which the onset of a fricative or affricate is detected.
5. The audio encoder (100) of claim 3, wherein the audio encoder is configured to subdivide a given time interval (620e; 720d, 720e) of a given length of time into four sub-time intervals (630a-630d; 730a-730h) of equal length, if the bandwidth extension information is provided using an increased temporal resolution for the given time interval (620e; 720d, 720e) of the given length of time,
such that four sets of bandwidth extension information are provided for the given time interval having the given length of time.
6. The audio encoder (100) of claim 1,
wherein the audio encoder is configured to selectively provide bandwidth extension information using an increased temporal resolution for a first time interval (720d) of a given length of time preceding a second time interval (720e) of a given length of time,
if a fricative or affricate onset is detected within the second time interval (720e) and if a temporal distance between the time at which the fricative or affricate onset is detected and a boundary between the first time interval (720d) and the second time interval (720e) is less than a predetermined temporal distance.
7. The audio encoder (100) of claim 6,
wherein the audio encoder is configured to perform a temporal look-ahead such that, in response to detecting the onset of a fricative or affricate within the second time interval (720e), bandwidth extension information is provided with an increased temporal resolution for a first time interval (720d) of a given length of time preceding the second time interval (720e) of the given length of time.
8. The audio encoder (100) of claim 1,
wherein the audio encoder is configured to adjust the temporal resolution used by the bandwidth extension information provider such that the bandwidth extension information is provided with the same increased temporal resolution at least for the predetermined time period (630a; 730d) before the time (t_f) at which the onset of a fricative or affricate is detected and for the predetermined time period (630c; 730f) after that time.
9. The audio encoder (100) of claim 1,
wherein the audio encoder is configured to adjust the temporal resolution used by the bandwidth extension information provider such that a set of bandwidth extension information is provided with the same increased temporal resolution for at least a first sub-time interval (630a; 730d), a second sub-time interval (630b; 730e) and a third sub-time interval (630c; 730f),
wherein the first sub-time interval immediately precedes the second sub-time interval;
wherein the onset of a fricative or affricate is detected in the second sub-time interval; and
wherein the third sub-time interval immediately follows the second sub-time interval.
10. The audio encoder (100) of claim 1,
wherein the detector is configured to detect an end segment of a fricative or affricate; and
wherein the audio encoder is configured to adjust a temporal resolution used by the bandwidth extension information provider such that bandwidth extension information is provided with an increased temporal resolution at least for a predetermined time period before a time at which an end segment of a fricative or affricate is detected and for a predetermined time period after the time at which the end segment of the fricative or affricate is detected.
Technical Field
Embodiments according to the present invention relate to an audio encoder for providing encoded audio information based on input audio information.
Other embodiments according to the present invention are directed to an audio decoder for providing decoded audio information based on encoded audio information.
Other embodiments according to the present invention are directed to a system comprising an audio encoder and an audio decoder.
Other embodiments according to the present invention are directed to a method of providing encoded audio information based on input audio information.
Other embodiments according to the present invention are directed to a method of providing decoded audio information based on encoded audio information.
Further embodiments according to the invention relate to a computer program for performing one of the methods.
Other embodiments according to the invention are directed to modeling the onsets or end segments of fricatives or affricates in audio bandwidth extension for speech.
Background
In recent years, the demand for digital storage and transmission of audio signals, particularly speech signals, has increased. In some cases, such as mobile communication applications, a relatively low bit rate is required.
To obtain a good trade-off between bit rate and audio quality (or speech quality), there are methods that encode the low frequency part of an audio signal (e.g. the frequency part up to approximately 6 kHz) with relatively high precision and reconstruct the high frequency part of the audio content (e.g. the frequency part above approximately 6 kHz or 7 kHz) by means of a bandwidth extension. For example, the bandwidth extension may reconstruct the high frequency part of the audio content using relatively few parameters, wherein the parameters may describe the spectral envelope, e.g. in a coarse manner.
A well-known implementation of bandwidth extension is spectral band replication (SBR), which has been standardized by MPEG (Moving Picture Experts Group).
For example, some details regarding spectral band replication are described in International Standard ISO/IEC 14496-3:200X (E), subpart 4, in sections 4.6.18 and 4.6.19.
In addition, reference is made to patent application No. US 2011/0099018 A1, which describes an apparatus and method for calculating bandwidth extension data using spectral tilt controlled framing. Said patent application describes an apparatus for calculating bandwidth extension data of an audio signal in a bandwidth extension system, wherein a first bandwidth is encoded with a first number of bits and a second bandwidth, different from the first bandwidth, is encoded with a second number of bits, the second number of bits being smaller than the first number of bits. The apparatus has a controllable bandwidth extension parameter calculator that calculates bandwidth extension parameters of the second bandwidth in a frame-by-frame manner for a first sequence of frames of the audio signal. Each frame has a controllable start time instant. The apparatus additionally comprises a spectral tilt detector that detects a spectral tilt in a temporal portion of the audio signal and signals a start time instant of an individual frame of the audio signal depending on the spectral tilt.
However, it has been found that with many known bandwidth extension methods, the perceived audio quality is substantially degraded in the presence of fricatives or affricates. For example, known bandwidth extension techniques may cause pre-echoes and post-echoes. Furthermore, fricatives or affricates may sound too sharp when known bandwidth extension techniques are used.
In view of the above, there is a need to create a bandwidth extension concept that allows for improved audio quality.
Disclosure of Invention
Embodiments in accordance with the present invention create an audio encoder that provides encoded audio information based on input audio information. The audio encoder comprises a bandwidth extension information provider configured to provide bandwidth extension information using a variable time resolution. The audio encoder also includes a detector configured to detect an onset of a fricative or affricate. The audio encoder is configured to adjust a temporal resolution used by the bandwidth extension information provider such that the bandwidth extension information is provided with an increased temporal resolution at least for a predetermined time period before a time when the onset of the fricative or affricate is detected and for a predetermined time period after the time when the onset of the fricative or affricate is detected.
This embodiment according to the invention is based on the finding that good audio quality can be achieved if the bandwidth extension information is provided with a high temporal resolution for the entire temporal neighborhood of the time at which the onset of the fricative or affricate is detected. Thus, the entire onset of a fricative or affricate is encoded with a high temporal resolution (at least with respect to the bandwidth extension information). The onset typically extends over a certain period before the time at which it is detected and over a certain period after that time, so covering both periods helps to avoid pre-echoes and also helps to avoid an unnatural auditory impression. In general, the onset of a fricative or affricate cannot be detected very precisely, since the detection is often based on detecting a threshold crossing, which naturally does not occur right at the very beginning of the onset. Consequently, the onset is (actually) detected at a time somewhat after it has begun. By ensuring that the bandwidth extension information is provided with an increased temporal resolution (compared to the "normal" temporal resolution) at least for a predetermined time period preceding the time at which the onset is (actually) detected, details at the very beginning of the onset can also be reproduced with good resolution. It has been found that even such details at the very beginning of the onset of a fricative or affricate are important for a good auditory impression.
Thus, providing bandwidth extension information with an increased temporal resolution at least for a predetermined time period before the time at which the onset of the fricative or affricate is detected not only helps to avoid pre-echoes, but also enables the details of the beginning of the onset to be reproduced. Similarly, providing bandwidth extension information with an increased temporal resolution for a predetermined time period after the time at which the onset of the fricative or affricate is detected enables the details of the remainder of the onset to be reproduced, such details being important for the auditory impression.
The concept described herein thus enables the entire onset of a fricative or affricate to be reproduced with a high temporal resolution. This helps to avoid a degradation of the auditory impression that would be caused, for example, by a temporal resolution (of the bandwidth extension information) that is too coarse at the very beginning of the onset of the fricative or affricate or at the transition from the onset of the fricative or affricate to the stationary signal part.
In a preferred embodiment, the audio encoder is configured to switch from a first temporal resolution for providing the bandwidth extension information to a second temporal resolution for providing the bandwidth extension information in response to detecting an onset of a fricative or affricate, wherein the second temporal resolution is higher than the first temporal resolution. Thus, a switch between two different time resolutions for providing bandwidth extension information is performed, wherein the switch is controlled by detecting an onset of a fricative or affricate. Thus, a simple control scheme is created, which can be easily implemented in an audio encoder or audio decoder.
In a preferred embodiment, the bandwidth extension information provider is configured to provide the bandwidth extension information such that the bandwidth extension information is associated with temporally regular time intervals of equal time length (which may form a basic, but sub-divisible, time grid for providing the bandwidth extension information). The bandwidth extension information provider is configured to provide a single set of bandwidth extension information for a time interval having a given length of time when a first temporal resolution (e.g., a relatively lower temporal resolution) is used. Furthermore, the bandwidth extension information provider may be configured to provide a plurality of sets of bandwidth extension information associated with sub-time intervals for a time interval having the given length of time when the second temporal resolution (e.g. a relatively higher temporal resolution) is used.
An audio encoder may be easily implemented by using temporally regular time intervals (e.g. frames) of equal time length as a (basic) time grid for providing bandwidth extension information. For example, the bandwidth extension information provider only needs to switch between two discrete temporal resolutions, which can be implemented without excessive effort. For example, the bandwidth extension information provider may only need to be implemented to provide a single set of bandwidth extension information based on a time interval of a given length of time, and to provide multiple sets of bandwidth extension information based on a predetermined (and fixed) number of equal-length sub-time intervals of the time interval of the given length of time. Thus, the following may be sufficient, for example: the bandwidth extension information provider is configured to provide either a single set of bandwidth extension information based on a time interval having a given length of time, or four sets of bandwidth extension information based on four sub-time intervals, each of the sub-time intervals having a length equal to one quarter of the given length of time. Furthermore, by using such a concept, the signaling overhead required to indicate the time intervals for which bandwidth extension information is provided may be kept small, since it is only necessary to select between a "coarse resolution" (e.g., a single set of bandwidth extension information for a time interval having a given length of time) and a "fine resolution" (e.g., n sets of bandwidth extension information associated with n sub-time intervals of equal length). Thus, a particularly efficient concept for providing bandwidth extension information is obtained.
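As an illustration of the coarse/fine time grid described above, the following Python sketch lists the time spans for which sets of bandwidth extension information would be provided. The function name `bwe_time_grid`, the per-frame flag list and the frame length used in the usage note are assumptions made for this example and are not taken from the disclosure.

```python
from typing import List, Tuple


def bwe_time_grid(frame_len: float,
                  fine_flags: List[bool],
                  n_sub: int = 4) -> List[Tuple[float, float]]:
    """Return one (start, end) span per set of bandwidth extension information.

    frame_len  -- length of one regular time interval (frame), any time unit
    fine_flags -- hypothetical per-frame flag: True selects the fine
                  resolution (n_sub sets), False the coarse one (one set)
    n_sub      -- number of equal-length sub-time intervals per fine frame
    """
    if n_sub < 1:
        raise ValueError("n_sub must be at least 1")
    spans = []
    for k, fine in enumerate(fine_flags):
        start = k * frame_len
        parts = n_sub if fine else 1
        step = frame_len / parts
        for i in range(parts):
            spans.append((start + i * step, start + (i + 1) * step))
    return spans
```

For instance, `bwe_time_grid(20.0, [False, True])` yields one span covering the first frame and four spans of length 5 covering the second frame. Under the assumptions of this sketch, a single flag per frame would be enough to tell a decoder which grid was used.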
In a preferred embodiment, the audio encoder is configured to adjust the temporal resolution used by the bandwidth extension information provider such that at least one sub-time interval associated with one set of bandwidth extension information immediately precedes another sub-time interval which is associated with another set of bandwidth extension information and during which the onset of the fricative or affricate is detected, such that the increased temporal resolution is used in the at least one sub-time interval preceding the sub-time interval in which the onset of the fricative or affricate is detected. It is thus possible to provide the bandwidth extension information with a high temporal resolution even at the very beginning of the onset of the fricative or affricate, i.e. even before the onset of the fricative or affricate can actually be detected.
In a preferred embodiment, the audio encoder is configured to subdivide a given time interval having a given length of time into four sub-time intervals of equal length if the bandwidth extension information is provided using an increased temporal resolution for the given time interval having the given length of time, such that four sets of bandwidth extension information (e.g. four sets of bandwidth extension parameters, each set being associated with one of the sub-time intervals) are provided for the given time interval having the given length of time. Thus, a high temporal resolution of the bandwidth extension information may be achieved, since the four sets of bandwidth extension information may independently describe the envelope of the high frequency signal portion of the audio content, e.g. for four sub-intervals. Thus, the difference in spectral envelopes of the high frequency signal portions of the four sub-time intervals may be considered, as each of the sets of bandwidth extension information may represent a frequency envelope (or spectral envelope) of the high frequency portion of one of the sub-time intervals.
In a preferred embodiment, the audio encoder is configured to selectively provide the bandwidth extension information with an increased temporal resolution for a first time interval of a given length of time preceding a second time interval of a given length of time if a fricative or affricate onset is detected within the second time interval and if a temporal distance between the time at which the fricative or affricate onset is detected and a boundary between the first time interval and the second time interval is less than a predetermined temporal distance. Thus, even if the time at which the onset of the fricative or affricate is detected lies within the subsequent second time interval (e.g. a subsequent second frame), the bandwidth extension information of the first time interval (e.g. the first frame) is provided with an increased temporal resolution (compared to the "normal" temporal resolution) whenever it can be assumed that the very beginning of the onset (which typically lies before the time at which the onset is actually detected) is located within the first time interval. Thus, a high temporal resolution is used in providing the bandwidth extension information for the entire onset of the fricative or affricate, including its very beginning and possibly even a short time before it, resulting in good speech reproduction. Beyond merely avoiding pre-echoes, the onset of fricatives or affricates can be accurately reproduced without excessive sharpness or other substantial artifacts.
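The boundary-distance rule above can be sketched as follows. This is a minimal illustration under stated assumptions: the function name `frames_needing_fine_resolution` and its parameters are hypothetical, detection times are assumed to be already available, and marking the frame that contains the detection itself as "fine" is an assumption of this sketch that follows the general concept rather than this specific embodiment.

```python
from typing import List, Sequence


def frames_needing_fine_resolution(num_frames: int,
                                   frame_len: float,
                                   onset_times: Sequence[float],
                                   max_distance: float) -> List[bool]:
    """Decide per frame whether the increased temporal resolution is used.

    onset_times  -- times at which an onset was detected (same unit as frame_len)
    max_distance -- the "predetermined temporal distance" to the frame boundary
    """
    fine = [False] * num_frames
    for t in onset_times:
        k = int(t // frame_len)  # index of the frame containing the detection
        if not 0 <= k < num_frames:
            continue
        fine[k] = True  # assumption: the detection frame itself is refined
        # Detection close to the start of frame k: the very beginning of the
        # onset probably lies in frame k-1, so refine that frame as well.
        if k > 0 and (t - k * frame_len) < max_distance:
            fine[k - 1] = True
    return fine
```

With four frames of length 20 and a predetermined distance of 5, a detection at time 42 refines frames 1 and 2, whereas a detection at time 55 refines frame 2 only. Refining the preceding frame requires that the encoder has not yet emitted it, which is why a look-ahead of one frame is needed in such a scheme.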
In a preferred embodiment, the audio encoder is configured to perform a temporal look-ahead such that, in response to detecting an onset of a fricative or affricate within the second time interval, the bandwidth extension information is provided with an increased temporal resolution for a first time interval of a given length of time preceding the second time interval of the given length of time. Thus, it is possible to provide bandwidth extension information with an increased temporal resolution for the entire onset of a fricative or affricate (and possibly even for a short time period before the onset of the fricative or affricate), resulting in an improved audio quality.
In a preferred embodiment, the audio encoder is configured to adjust the temporal resolution used by the bandwidth extension information provider such that the bandwidth extension information is provided with the same increased temporal resolution at least for a predetermined time period before the time at which the onset of the fricative or affricate is detected and for a predetermined time period after that time. By using equal temporal resolutions, the provision of bandwidth extension information is simplified compared to the case where different temporal resolutions are used before and after the time at which the onset of the fricative or affricate is detected. Furthermore, using the same increased temporal resolution for both predetermined time periods reduces the signaling overhead.
In a preferred embodiment, the audio encoder is configured to adjust the temporal resolution used by the bandwidth extension information provider such that the sets of bandwidth extension information are provided with the same increased temporal resolution for at least a first sub-time interval, a second sub-time interval and a third sub-time interval, wherein the first sub-time interval immediately precedes the second sub-time interval, wherein the onset of a fricative or affricate is detected within the second sub-time interval, and wherein the third sub-time interval immediately follows the second sub-time interval. Thus, when providing the sets of bandwidth extension information, the first and third sub-time intervals, which "embed" the second sub-time interval during which the onset of the fricative or affricate is detected, are processed with the same temporal resolution as the second sub-time interval. Thus, when providing bandwidth extension information, a substantial part of the onset of a fricative or affricate, or even the entire onset, is treated with a high temporal resolution. Furthermore, by using the same (increased, or "high") temporal resolution for the first, second and third sub-time intervals, encoding and decoding become simple and the signaling overhead (for signaling the temporal resolution) becomes small.
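The "embedding" of the detection sub-time interval between its two neighbors can be illustrated with a short Python sketch. The function names, the fixed subdivision into four sub-time intervals per frame and the numeric values in the usage note are assumptions chosen for illustration only.

```python
from typing import List


def embedded_subintervals(t_detect: float, sub_len: float) -> List[int]:
    """Indices of the first, second and third sub-time intervals.

    The second one contains the detection time; the first and third are its
    immediate neighbors. Indices before the start of the signal are dropped.
    """
    i = int(t_detect // sub_len)
    return [j for j in (i - 1, i, i + 1) if j >= 0]


def frames_to_refine(t_detect: float, frame_len: float, n_sub: int = 4) -> List[int]:
    """Frames that must use the fine grid so that all three sub-time
    intervals around the detection exist with the same resolution."""
    sub_len = frame_len / n_sub
    return sorted({j // n_sub for j in embedded_subintervals(t_detect, sub_len)})
```

With a frame length of 20 and four sub-time intervals per frame, a detection at time 42 falls into sub-time interval 8, the first one of frame 2, so its predecessor (index 7) lies in frame 1 and both frames are refined. A detection at time 47 falls into index 9, and all three sub-time intervals lie in frame 2.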
In a preferred embodiment, the detector is configured to detect the end segment of a fricative or affricate. In this case, the audio encoder is configured to adjust the temporal resolution used by the bandwidth extension information provider such that the bandwidth extension information is provided with an increased temporal resolution at least for a predetermined time period before the time at which the end segment of the fricative or affricate is detected and for a predetermined time period after that time. This embodiment according to the invention is based on the finding that the bandwidth extension should be performed with a high temporal resolution for the end segment of a fricative or affricate as well. It has been found that human hearing is in fact sensitive to the end segment of fricatives or affricates, and that it is therefore worthwhile to spend the bitrate overhead needed to encode the end segment of fricatives or affricates with a high temporal resolution (with respect to the bandwidth extension information). Furthermore, it has been found that providing bandwidth extension information with a low temporal resolution during the end segment of a fricative or affricate often results in an unduly sharp auditory impression during that end segment, which is perceived as an artifact.
Furthermore, it should be noted that any of the above-mentioned concepts for adjusting the temporal resolution used by the bandwidth extension information provider in response to the onset of a fricative or affricate may also be advantageously applied in response to detecting the end segment of a fricative or affricate. In other words, the concepts described above may be applied in a similar manner, wherein "the end segment of the fricative or affricate" replaces "the onset of the fricative or affricate".
In a preferred embodiment, the detector is configured to evaluate the zero crossing rate, and/or the energy ratio and/or the spectral tilt, in order to detect the onset of fricatives or affricates. It has been found that evaluation of one or more of the above mentioned quantities (zero-crossing rate, energy ratio, spectral tilt) enables reasonably accurate detection of the onset of fricatives or affricates. For example, one or more of the above-mentioned values, or a value derived from a combination of the above-mentioned quantities, may be compared to a threshold value in order to detect the presence of a fricative or affricate.
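A threshold-based detector of this kind might look like the following sketch. It is not the detector of the disclosure: the function names are hypothetical, the thresholds are arbitrary placeholders, the spectral tilt is approximated by the normalized first autocorrelation coefficient (a common but here assumed choice), and the energy ratio is left out because the text does not define it.

```python
from typing import List, Sequence


def zero_crossing_rate(x: Sequence[float]) -> float:
    """Fraction of adjacent sample pairs whose signs differ."""
    if len(x) < 2:
        return 0.0
    crossings = sum(1 for a, b in zip(x, x[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(x) - 1)


def spectral_tilt(x: Sequence[float]) -> float:
    """Normalized first autocorrelation coefficient r(1)/r(0).

    Close to +1 for low-frequency dominated frames, negative for frames
    dominated by high frequencies (as is typical of fricative noise).
    """
    r0 = sum(v * v for v in x)
    if r0 == 0:
        return 0.0
    r1 = sum(a * b for a, b in zip(x, x[1:]))
    return r1 / r0


def looks_like_fricative(frame: Sequence[float],
                         zcr_min: float = 0.35,
                         tilt_max: float = 0.0) -> bool:
    """Compare both quantities to (placeholder) thresholds."""
    return zero_crossing_rate(frame) > zcr_min and spectral_tilt(frame) < tilt_max


def detect_onsets(frames: Sequence[Sequence[float]]) -> List[int]:
    """Indices of frames where the fricative decision switches from off to on."""
    flags = [looks_like_fricative(f) for f in frames]
    return [k for k, on in enumerate(flags) if on and (k == 0 or not flags[k - 1])]
```

Because the decision only switches once the thresholds are crossed, the reported index tends to lag the true beginning of the onset, which is the reason the preceding time period is also covered with the increased temporal resolution.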
In a preferred embodiment, the encoder is configured to selectively adjust the temporal resolution used by the bandwidth extension information provider such that the bandwidth extension information is provided with an increased temporal resolution in response to detecting the onset of a fricative or affricate only for speech signal portions and not for music signal portions. This concept is based on the finding that fricatives or affricates are more important to the perception of speech than to the perception of music signal portions. Thus, for music signal portions, the bitrate overhead that would be incurred by providing bandwidth extension information with an increased temporal resolution can be avoided, which helps to reduce the overall bitrate, or helps to focus on the coding of features that are perceptually more important for music signal portions.
In a preferred embodiment, the audio encoder is configured to selectively provide the bandwidth extension information with an increased temporal resolution for a plurality of consecutive time intervals that completely cover the detected onset of the fricative or affricate. Therefore, even when bandwidth extension is used, the onset of a fricative or affricate is encoded with high precision, so that the auditory impression is not substantially degraded by the use of bandwidth extension.
According to another embodiment of the present invention an audio encoder for providing encoded audio information on the basis of input audio information is created. The audio encoder comprises a bandwidth extension information provider configured to provide bandwidth extension information using a variable time resolution. The audio encoder also includes a detector configured to detect an end segment of a fricative or affricate. The audio encoder is configured to adjust a temporal resolution used by the bandwidth extension information provider such that the bandwidth extension information is provided at an increased temporal resolution in response to detecting the end segment of the fricative or affricate.
This embodiment according to the invention is based on the finding that the end segment of a fricative or affricate is also important for the perception of the audio content and should therefore be encoded with a high temporal resolution. In particular, this embodiment according to the present invention is based on the finding that if the end segment of a fricative or affricate is encoded with insufficient temporal resolution of the bandwidth extension information, the end segment of the fricative or affricate is generally perceived as "too sharp". Thus, by increasing the temporal resolution used by the bandwidth extension information provider, the audio quality (e.g., of a speech signal) may be substantially improved.
In a preferred embodiment, the audio encoder is configured to adjust the temporal resolution used by the bandwidth extension information provider such that the bandwidth extension information is provided with an increased temporal resolution at least for a predetermined time period before the time at which the end segment of the fricative or affricate is detected and for a predetermined time period after that time. Thus, it is possible to encode the entire end segment of a fricative or affricate with an increased temporal resolution, even though the detector is usually only able to detect the center of the end segment of a fricative or affricate, or the like.
According to another embodiment of the present invention an audio decoder is created that provides decoded audio information based on encoded audio information. The audio decoder is configured to perform bandwidth extension based on bandwidth extension information provided by the audio encoder such that the bandwidth extension is performed with an increased temporal resolution at least for a predetermined time period before a time when the onset of the fricative or affricate is detected and for a predetermined time period after the time when the onset of the fricative or affricate is detected. Thus, the audio decoder is able to reproduce a substantial part of the onset of a fricative or affricate, or even the entire onset of a fricative or affricate, with a high temporal resolution. Thus, the bandwidth extension performed by the audio decoder may be well adapted to the presence of fricatives or affricates, so that changes in the spectral envelope of the high frequency part of the audio content occurring during the onset of the fricatives or affricates may be reproduced with good perceptual quality. Thus, a good auditory sensation is achieved.
In a preferred embodiment, the audio decoder may comprise a detector configured to detect an onset of a fricative or affricate based on decoded audio information representing a low frequency part of the audio content, and to decide by itself on the adjustment of the temporal resolution for the bandwidth extension. Any of the criteria discussed herein with respect to the audio encoder for detecting the onset of a fricative or affricate may also be applied to the audio decoder (assuming the required information is available at the side of the audio decoder).
Alternatively, however, the audio decoder may be configured to adjust the temporal resolution for the bandwidth extension based on the side information of the encoded audio information.
According to another embodiment of the present invention an audio decoder is created that provides decoded audio information based on encoded audio information. The audio decoder is configured to perform bandwidth extension based on the bandwidth extension information provided by the audio encoder such that the bandwidth extension is performed with an increased temporal resolution at least for a predetermined time period before the time when the end segment of the fricative or affricate is detected and for a predetermined time period after the time when the end segment of the fricative or affricate is detected.
This embodiment according to the invention is based on the idea that a good audio quality can be achieved by performing the bandwidth extension with an increased temporal resolution during the end segment of the fricative or affricate. Furthermore, the embodiment is based on the idea that the end segment of a fricative or affricate typically extends over a certain time period, wherein the time at which the end segment of a fricative or affricate is detected is typically located within said time period.
A further embodiment according to the invention creates a system comprising an audio encoder as described above and an audio decoder, wherein the audio decoder is configured to receive encoded audio information provided by the audio encoder and to provide decoded audio information based on the encoded audio information. The audio decoder is configured to perform bandwidth extension based on the bandwidth extension information provided by the audio encoder such that bandwidth extension is performed with an increased temporal resolution at least for a predetermined time period before a time at which an onset of a fricative or affricate is detected and for a predetermined time period after the time at which the onset of the fricative or affricate is detected, and/or such that bandwidth extension is performed with an increased temporal resolution at least for a predetermined time period before a time at which an end segment of a fricative or affricate is detected and for a predetermined time period after the time at which an end segment of a fricative or affricate is detected.
The system allows encoding and decoding of audio content, wherein a relatively low bit rate is achieved by using bandwidth extension, and wherein a good reproduction of fricatives or affricates is ensured by using an increased temporal resolution in the context of an onset segment of fricatives or affricates and/or in the context of an end segment of fricatives or affricates.
According to another embodiment of the invention a method of providing encoded audio information on the basis of input audio information is created. The method includes providing bandwidth extension information using variable time resolution and detecting an onset of a fricative or affricate. The time resolution for providing the bandwidth extension information is adjusted such that the bandwidth extension information is provided with an increased time resolution at least for a predetermined time period before the time at which the fricative or affricate onset is detected and for a predetermined time period after the time at which the fricative or affricate onset is detected. This approach is based on the same considerations as the audio encoder described above.
According to another embodiment of the invention a method of providing encoded audio information on the basis of input audio information is created. The method includes providing bandwidth extension information using variable time resolution and detecting a fricative or affricate end segment. The temporal resolution used to provide the bandwidth extension information is adjusted such that the bandwidth extension information is provided at an increased temporal resolution in response to detecting the end segment of a fricative or affricate. This approach is based on the same considerations as the audio encoder described above.
According to another embodiment of the invention a method of providing decoded audio information on the basis of encoded audio information is created. The method includes performing bandwidth extension based on bandwidth extension information provided by an audio encoder such that bandwidth extension is performed with increased temporal resolution at least for a predetermined time period before a time at which an onset of a fricative or affricate is detected and for a predetermined time period after the time at which the onset of the fricative or affricate is detected. This approach is based on the same considerations as the audio decoder described above.
According to another embodiment of the invention a method of providing decoded audio information on the basis of encoded audio information is created. The method includes performing bandwidth extension based on bandwidth extension information provided by an audio encoder such that bandwidth extension is performed with increased temporal resolution at least for a predetermined time period before a time at which an end segment of a fricative or affricate is detected and for a predetermined time period after the time at which the end segment of the fricative or affricate is detected. This approach is based on the same considerations as the audio decoder described above.
According to a further embodiment of the invention a computer program for performing one of the above described methods is created.
According to another embodiment of the present invention, an encoded audio signal is created comprising an encoded representation of a low frequency part of audio content and a plurality of sets of bandwidth extension parameters. The bandwidth extension parameters are provided with an increased temporal resolution at least for a predetermined time period before the time of the onset of a fricative or affricate present in the audio content and for a predetermined time period after the time of the onset of the fricative or affricate present in the audio content.
According to another embodiment of the present invention, an encoded audio signal is created comprising an encoded representation of a low frequency part of audio content and a plurality of sets of bandwidth extension parameters. The bandwidth extension parameters are provided with an increased temporal resolution at least for the part of the audio content in which the end segment of a fricative or affricate is present.
The encoded audio signals are based on the same considerations as the audio encoder and the audio decoder described above.
Drawings
Embodiments according to the invention will be described below with reference to the accompanying drawings:
FIG. 1 shows a block schematic diagram of an audio encoder according to an embodiment of the invention;
FIG. 2 shows a spectrogram of an original speech signal with known bandwidth extension (BWE) framing and detected fricative or affricate boundaries;
FIG. 3 shows a spectrogram of an original speech signal with bandwidth extension (BWE) framing according to the present invention;
FIG. 4 shows a spectrogram of encoded speech with known bandwidth extension (BWE) framing;
FIG. 5 shows a spectrogram of encoded speech with bandwidth extension (BWE) framing according to the present invention;
FIG. 6 shows a schematic representation of time intervals and sub-time intervals for which a set of bandwidth extension information is provided according to an embodiment of the invention;
FIG. 7 shows a schematic representation of time intervals and sub-time intervals for which a set of bandwidth extension information is provided according to an embodiment of the invention;
FIG. 8 shows a block schematic diagram of an audio encoder according to another embodiment of the invention;
FIG. 9 shows a block schematic diagram of an audio decoder according to another embodiment of the invention;
FIG. 10 shows a block schematic diagram of an audio decoder according to another embodiment of the invention;
FIG. 11 shows a block schematic diagram of a system for audio encoding and audio decoding according to an embodiment of the invention;
FIG. 12 shows a flow diagram of a method of providing encoded audio information based on input audio information, according to an embodiment of the invention; and
FIG. 13 shows a flowchart of a method of providing decoded audio information based on encoded audio information according to an embodiment of the present invention.
Detailed Description
1. Audio encoder according to FIG. 1
Fig. 1 shows a block schematic diagram of an audio encoder according to an embodiment of the invention.
The
The
The
The
The functionality of the
The low
However, the bandwidth
For example, the bandwidth extension information provider may be configured to provide some or all of the parameters described in the "SBR tool" and/or "low latency SBR" sections of the international standard ISO/IEC 14496-3. For example, the bandwidth
However, the
For example, a 2-bit value may be used to encode how many sets of envelope shape parameters are provided by the bandwidth
Preferably, the signaling may be performed as indicated for the "FIXFIX" case, which is described in chapter 4.6.19 "Low delay SBR" of ISO/IEC 14496-3.
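The 2-bit signaling mentioned above can be illustrated with a short sketch. It assumes, in the manner of the "FIXFIX" frame class, that the 2-bit value is the base-2 logarithm of the number of sets of envelope shape parameters per frame, so that 1, 2, 4 or 8 sets can be signaled. The function names are hypothetical, and the sketch does not reproduce the actual bitstream syntax of ISO/IEC 14496-3.

```python
def encode_num_envelope_sets(num_sets: int) -> int:
    """Map a number of envelope sets per frame (1, 2, 4 or 8) to a 2-bit code.

    Assumption: the code is log2(num_sets), i.e. only powers of two are allowed.
    """
    codes = {1: 0, 2: 1, 4: 2, 8: 3}
    if num_sets not in codes:
        raise ValueError("number of envelope sets must be 1, 2, 4 or 8")
    return codes[num_sets]


def decode_num_envelope_sets(code: int) -> int:
    """Recover the number of envelope sets from the 2-bit code."""
    if not 0 <= code <= 3:
        raise ValueError("code must fit into 2 bits")
    return 1 << code


# A frame encoded with "normal" resolution carries one set (code 0);
# a frame encoded with fourfold resolution carries four sets (code 2).
normal_code = encode_num_envelope_sets(1)
fine_code = encode_num_envelope_sets(4)
```

With such a scheme, switching between the first and the second temporal resolution only changes the signaled count from one frame to the next.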
In conclusion, the bandwidth
For example, the
Thus, the encoded
Further, it should be noted that any of the other features and functionalities described herein may also be implemented to the
Some additional details regarding the functionality of the
FIG. 2 shows a spectrogram of an original speech signal with known bandwidth extension framing and detected fricative or affricate boundaries.
The abscissa 210 describes time (in terms of time slots) and the ordinate 212 specifies QMF subbands. Thus, the representation 200 according to FIG. 2 represents the distribution of the audio signal energy over time across different QMF subbands.
As shown, the magenta vertical dashed lines designate the time boundaries 220a,220b, … of known bandwidth extension frames. In addition, the black vertical dashed lines designate detected fricative or affricate boundaries 230a,230b,230c,230d, …. Detected fricative or affricate boundaries 230a,230b,230c,230d, … may be detected using a tilt-based detector. As shown, time intervals of equal length (which may be considered as bandwidth extension frames or generally as frames) are bounded by boundaries 220a, …,220u of (known) bandwidth extension frames. In other words, in the known concept according to document D1, the bandwidth extension information may be associated with temporally regular time intervals (separated by boundaries of known bandwidth extension frames) of equal time length.
As shown, the detected fricative or affricate boundary may be located somewhere within the time interval defined by two subsequent boundaries of a known bandwidth extension frame.
However, the known bandwidth extension frame scheme shown in fig. 2 does not allow for a particularly good reproduction of the high frequency part of the audio content, as will be described later.
Fig. 3 shows a spectral diagram of an original speech signal with an inventive bandwidth extension frame (where the inventive bandwidth extension frame is indicated by a black vertical solid line). The
However, between
Similarly, in response to detecting an end segment of a fricative or affricate in a frame (or time interval) between
In conclusion, the bandwidth extension information is provided in the
However, the bandwidth extension information provider is configured to provide a single set of bandwidth extension information for a frame (i.e., a time interval having a given length of time) when using the first ("normal") time resolution. For example, a single set of bandwidth extension information is provided for frames between
Referring now to fig. 4 and 5, certain advantages of the
FIG. 4 shows a spectral plot of encoded speech with a known bandwidth extension frame. An
The
In addition, the
The
Thus, as shown, the known bandwidth extension framing (e.g., the framing shown in FIG. 2) produces a number of artifacts.
Fig. 5 shows a spectral plot of encoded speech with the inventive bandwidth extension frame (compare to the spectral plot of fig. 4). Also, an
Some details regarding providing bandwidth extension information will be explained below with reference to fig. 6 and 7.
Fig. 6 shows a schematic representation of time intervals and sub-time intervals for providing bandwidth extension information.
The time axis is designated 610. As shown, time (represented by time axis 610) is subdivided into
As shown in fig. 6, bandwidth extension information is provided at a "normal" (relatively low) resolution for
Fig. 7 shows another schematic representation of the temporal resolution for providing bandwidth extension information. The time axis is designated 710. As shown, there are
For example, one respective set of bandwidth extension parameters may be provided for each sub-time interval of time intervals 720d and 720e.
It should be noted, however, that the increased temporal resolution is also used for time interval 720d (immediately) preceding time interval 720e, and that the time at which the onset (or the end) of the fricative or affricate is detected is within time interval 720e. However, as desired, in accordance with the present invention, at least one other time interval (or sub-time interval) preceding the time interval (or sub-time interval) in which the start segment (or end segment) of the fricative or affricate was detected is encoded at an increased time resolution, the
Conversely, if the onset (or the end) of a fricative or affricate is detected only in the second sub-time interval of time interval 720e, the audio encoder may select the low temporal resolution for providing the bandwidth extension information for time interval 720d (as in the situation shown in FIG. 6). Thus, as can be appreciated from FIG. 7, a certain "temporal look-ahead" is performed, because the increased temporal resolution is selected for providing bandwidth extension information even for a frame that would not by itself require an increased temporal resolution.
Thus, even with a high temporal resolution, the start of the onset of the fricative or affricate is processed, wherein the start of the onset of the fricative or affricate is usually located before the time when the
The summary is as follows: fig. 3, 5, 6 and 7 illustrate operational concepts that can be applied to the
It should be noted that fig. 6 and 7, for example, represent the structure of an encoded audio signal. For example, the encoded audio signal may comprise an encoded representation of a low frequency portion of the audio content. Furthermore, the encoded audio representation may comprise a plurality of sets of bandwidth extension parameters.
For example, one set of bandwidth extension parameters may be provided for each of frames 620a-620d and 620f. Further, for one of the
A similar concept can be appreciated from fig. 7, wherein a set of bandwidth extension parameters are provided for
In conclusion, the bandwidth extension parameters may be provided with an increased temporal resolution at least for a predetermined time period before the time at which the onset of a fricative or affricate is detected and for a predetermined time period after the time at which the onset of the fricative or affricate is detected. In addition, bandwidth extension parameters may also be provided with an increased temporal resolution for portions of the audio content in which an end segment of a fricative or affricate is detected.
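The resolution selection discussed with reference to FIG. 6 and FIG. 7 can be sketched as follows. This is a minimal illustration, not the encoder's actual implementation. It assumes frames of equal length indexed from zero, four sub-time intervals per high-resolution frame, and a look-ahead rule in which the preceding frame is also refined when the detection time lies closer than a hypothetical threshold `max_distance` to the start boundary of its frame. All function and parameter names are illustrative.

```python
def sets_per_frame(num_frames, frame_len, detection_times, max_distance, subdivisions=4):
    """Return, for each frame, the number of bandwidth extension parameter sets."""
    counts = [1] * num_frames              # "normal" resolution: one set per frame
    for t in detection_times:
        idx = int(t // frame_len)          # frame in which the onset/end is detected
        if not 0 <= idx < num_frames:
            continue
        counts[idx] = subdivisions         # refine the frame containing the detection
        # Look-ahead: a detection close to the frame's start boundary also
        # refines the immediately preceding frame (situation of FIG. 7).
        if idx > 0 and (t - idx * frame_len) < max_distance:
            counts[idx - 1] = subdivisions
    return counts


def parameter_set_intervals(counts, frame_len):
    """List the (start, end) interval covered by each parameter set."""
    intervals = []
    for idx, n in enumerate(counts):
        step = frame_len / n
        start = idx * frame_len
        for k in range(n):
            intervals.append((start + k * step, start + (k + 1) * step))
    return intervals


# Detection in the first sub-time interval of the fifth frame: two frames are refined.
fig7_like = sets_per_frame(6, 1.0, [4.1], max_distance=0.25)
# Detection in the second sub-time interval: only the fifth frame is refined.
fig6_like = sets_per_frame(6, 1.0, [4.3], max_distance=0.25)
```

In both cases at least one sub-time interval before the detection time is covered at the increased resolution, either within the same frame or in the preceding frame.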
2. Audio encoder according to FIG. 8
Fig. 8 shows a block schematic of an audio encoder according to an embodiment of the invention.
The
The
Furthermore, it should be noted that the
In addition, the
It should be noted, however, that the
Furthermore, it should be noted that the concepts according to fig. 6 and 7 may be applied both in response to detecting an onset segment of a fricative or affricate and in response to detecting an end segment of a fricative or affricate, and may thus also be applied to the audio encoder according to fig. 8.
3. Audio decoder according to FIG. 9
Fig. 9 shows a block schematic diagram of an audio decoder according to an embodiment of the invention. The
However, the
It should be noted that the temporal resolution for bandwidth extension may be conveyed using the side information signal included in the
Alternatively, however, the audio decoder may be configured to detect a start segment of a fricative or affricate or an end segment of a fricative or affricate based on the decoded low frequency portion of the audio content that may be provided by the low
Regarding the functionality of the
4. Audio decoder according to FIG. 10
Fig. 10 shows a block schematic diagram of an audio decoder according to another embodiment of the present invention.
The
Furthermore, it should be noted that the explanations provided above with respect to the
5. System according to FIG. 11
FIG. 11 shows a block schematic of a system according to an embodiment of the invention. The
It should be noted, however, that the
It should be noted that the system may be supplemented with any of the features and functionalities described above with respect to the audio encoder and audio decoder.
6. Method of providing encoded audio information based on input audio information according to FIG. 12
Fig. 12 shows a flow chart of a method of providing encoded audio information based on input audio information. The
The
7. Method of providing decoded audio information according to FIG. 13
Fig. 13 illustrates a flowchart of a method of providing decoded audio information according to an embodiment of the present invention. The
The
The
8. Conclusion
From the above explanation, it can be concluded that embodiments according to the present invention relate to speech coding, and in particular to speech coding using bandwidth extension (BWE) techniques. Embodiments according to the present invention aim to enhance the perceptual quality of a decoded signal by detecting fricatives or affricates within the speech signal and adapting the temporal resolution of the bandwidth extension parameter driven post-processing accordingly (e.g., by adapting the temporal resolution used to provide the sets of bandwidth extension information). Embodiments according to the present invention include detecting the start segments and end segments of fricative or affricate signal portions of a speech signal and providing temporally fine-grained bandwidth extension post-processing during the entire start segment and end segment of each fricative or affricate signal portion (where the bandwidth extension processing may, for example, include providing the bandwidth extension information at the side of an audio encoder, and may include performing the bandwidth extension at the side of an audio decoder). Thereby, the likelihood of pre-echo and post-echo artifacts is reduced, and the start and end segments of the fricative or affricate signal portions can be modeled sufficiently smoothly with fine-grained bandwidth extension parameters. As a result, an unpleasant perceived sharpness of fricatives or affricates and the occurrence of annoying pre-echoes and post-echoes in the encoded signal are avoided.
Embodiments according to the invention are advantageous over known solutions. For example, [1] proposes to align the start time instant of a bandwidth extension parameter frame with the time point of a spectral tilt change. The spectral tilt change may represent a start segment or an abrupt end segment of a fricative or affricate signal portion. The alignment technique proposed in [1] prevents the occurrence of pre-echoes of fricatives or affricates within the bandwidth extension method. However, only the start segment of a fricative or affricate is detected, and the end segment is missed. Additionally, the above-mentioned technique does not allow for a fine-grained modeling of the spectro-temporal characteristics of the start and end segments of individual fricatives or affricates. Therefore, the start and end segments of these fricatives or affricates may sound harsh and relatively sharp.
Certain embodiments and aspects in accordance with the present disclosure are described below.
For example, the bandwidth extension encoder of the present invention includes a fricative or affricate detector and a bandwidth extension spectrum time resolution switch.
The fricative or affricate detector is preferably capable of detecting both the start segments and the end segments of fricatives or affricates. Suitable low computational complexity implementations of such detectors may be based, for example, on Zero Crossing Rate (ZCR) and energy ratio evaluations (see, e.g., references [2] and [3]). The detector may additionally be connected to a speech/music discriminator in order to limit the subsequent inventive processing to speech signals only.
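A low-complexity detector of the kind mentioned above could be sketched as follows. The sketch is not the method of references [2] or [3]. It assumes a frame-wise decision in which a frame is flagged as fricative-like when both its zero crossing rate and a crude high-frequency energy ratio (energy of the first difference divided by the signal energy) exceed hypothetical thresholds, and it reports start and end segments as transitions of that flag. All names and threshold values are illustrative assumptions.

```python
import math


def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / max(len(frame) - 1, 1)


def high_frequency_energy_ratio(frame):
    """Energy of the first difference relative to the signal energy.

    The first difference acts as a crude high-pass filter (an assumption of
    this sketch), so the ratio is large for noise-like, high-frequency frames.
    """
    energy = sum(x * x for x in frame)
    if energy == 0.0:
        return 0.0
    diff_energy = sum((b - a) ** 2 for a, b in zip(frame, frame[1:]))
    return diff_energy / energy


def detect_boundaries(frames, zcr_threshold=0.3, ratio_threshold=1.0):
    """Return (onsets, ends): frame indices where the fricative-like flag turns on or off."""
    onsets, ends = [], []
    previous = False
    for idx, frame in enumerate(frames):
        current = (zero_crossing_rate(frame) > zcr_threshold
                   and high_frequency_energy_ratio(frame) > ratio_threshold)
        if current and not previous:
            onsets.append(idx)
        elif previous and not current:
            ends.append(idx)
        previous = current
    return onsets, ends


# Synthetic test signal: a slow sine stands in for a vowel, an alternating
# sequence stands in for a high-frequency, fricative-like frame.
vowel_like = [math.sin(2 * math.pi * k / 64) for k in range(64)]
fricative_like = [(-1.0) ** k for k in range(64)]
onsets, ends = detect_boundaries(
    [vowel_like, vowel_like, fricative_like, fricative_like, vowel_like])
```

The returned frame indices could then be used to trigger the switch to the increased temporal resolution; a speech/music discriminator, if present, would simply suppress detections in non-speech frames.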
In some embodiments, a certain temporal look-ahead of the detector is desirable or even required, so that the bandwidth extension resolution can be switched in time and a fine-grained temporal resolution is used in the bandwidth extension parameter estimation/synthesis during the entire length of the start segment and end segment signal portions. The duration of a start segment or end segment signal portion may be measured adaptively or assumed to be fixed at an empirically determined value. For example, the number of time intervals or sub-time intervals processed at high temporal resolution in response to detecting a fricative or affricate start segment or end segment may be predetermined or adjusted depending on signal characteristics. For example, a detected fricative or affricate may trigger a four times higher temporal resolution during a group of several consecutive signal frames (e.g., two or three frames) that completely covers the detected fricative or affricate start segment or end segment. Preferably, but not necessarily, the group of high temporal resolution signal frames is approximately centered around the detected fricative or affricate start segment or end segment, covering its entire duration. In the case of transient-adaptive bandwidth extension framing, the higher temporal resolution replaces the transient-adaptive framing during the entire group of signal frames triggered by the fricative or affricate detection.
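The grouping of high-resolution frames described above can be sketched as follows. This is a minimal illustration under stated assumptions: detections are given as frame indices, the group size is fixed (three frames by default) and approximately centered on the detected frame, and the resolution factor is four. The optional per-frame speech flag models a speech/music discriminator. All names are hypothetical.

```python
def resolution_factors(num_frames, detected_frames, group_size=3, factor=4, is_speech=None):
    """Return the temporal resolution factor (1 = normal) for every frame."""
    factors = [1] * num_frames
    before = (group_size - 1) // 2         # frames refined before the detection
    after = group_size - 1 - before        # frames refined after the detection
    for d in detected_frames:
        if not 0 <= d < num_frames:
            continue
        # Restrict the processing to speech if a discriminator result is given.
        if is_speech is not None and not is_speech[d]:
            continue
        first = max(0, d - before)
        last = min(num_frames - 1, d + after)
        for idx in range(first, last + 1):
            factors[idx] = factor
    return factors


# A boundary detected in frame 3 refines the group of frames 2, 3 and 4.
centered_group = resolution_factors(8, [3])
```

Because refining frames before the detected frame requires knowing about the detection in advance, an encoder using this scheme needs a look-ahead of at least `before` frames.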
Some details regarding the figures will be discussed below.
FIG. 2 shows a spectral plot of an original speech signal, with magenta vertical dashed bars depicting known bandwidth extension frames. The black dashed lines indicate fricative or affricate boundaries.
FIG. 3 shows a spectrogram of an original speech signal with the inventive bandwidth extension framing, which is adapted to the fricative or affricate boundaries and indicated by solid black vertical lines. At each point in time at which a fricative or affricate boundary (start segment or end segment) has been detected, the resolution of the bandwidth extension post-processing is refined by switching to a four times higher resolution during a group of three consecutive frames.
Fig. 4 depicts the resulting spectral plot of the same speech signal encoded using a known bandwidth extension frame. The yellow ellipse indicates the artifact caused by the known bandwidth extension frame (from left to right): a: pre-echo and strong initial segment; b: post-echo and strong termination segments; c: energy leakage from the previous vowel to the modeled fricative or affricate due to too coarse frames.
Fig. 5 depicts the resulting spectral plot of the same speech signal encoded using the bandwidth extended frame of the present invention. The problem area indicated in fig. 4 is substantially improved.
In conclusion, the spectrogram discussed herein indicates that audio quality can be substantially improved by applying the concept according to the present invention.
Further concluding, an audio encoder, or an audio encoding method, or an associated computer program is created according to embodiments of the invention, as described above.
Further embodiments according to the invention create an audio decoder, or an audio decoding method, or an associated computer program, as described above.
Furthermore, an encoded audio signal or a storage medium having an encoded audio signal stored thereon is created according to an embodiment of the present invention, as described above.
9. Implementation alternatives
Although some aspects have been described in the context of a device, it should be clear that these aspects also represent a description of the corresponding method, wherein a block or means corresponds to a method step or a feature of a method step. Similarly, aspects described in the context of a method step also represent features of the corresponding block or item or the corresponding device. Some or all of the method steps may be performed by (or using) a hardware device, such as a microprocessor, programmable computer or electronic circuitry. In some embodiments, one or more of the most important method steps may be performed by such an apparatus.
The encoded audio signals of the present invention may be stored on a digital storage medium or may be transmitted over a transmission medium, such as a wireless transmission medium or a wired transmission medium, such as the internet.
Embodiments of the invention may be implemented in hardware or software, depending on the particular implementation requirements. Implementations may be performed using a digital storage medium, such as a floppy disk, a DVD, a Blu-ray disc, a CD, a ROM, a PROM, an EPROM, an EEPROM, or a FLASH memory, having electronically readable control signals stored thereon which cooperate (or are capable of cooperating) with a programmable computer system such that the respective method is performed. Accordingly, the digital storage medium may be computer readable.
Some embodiments according to the invention comprise a data carrier with electronically readable control signals capable of cooperating with a programmable computer system in order to perform one of the methods described herein.
In general, embodiments of the invention can be implemented as a computer program product with program code operable to perform one of the above-described methods when the computer program product is run on a computer. The program code may for example be stored on a machine readable carrier.
Other embodiments include a computer program stored on a machine-readable carrier for performing one of the methods described herein.
In other words, an embodiment of the inventive methods is therefore a computer program with a program code for performing one of the methods described herein, when the computer program runs on a computer.
Thus, another embodiment of the inventive method is a data carrier (or digital storage medium or computer readable medium) comprising a computer program recorded thereon for performing one of the methods described herein. The data carrier, digital storage medium or recording medium is typically tangible and/or non-volatile.
Thus, another embodiment of the inventive method is a data stream or signal sequence representing a computer program for performing one of the methods described herein. The data stream or signal sequence may for example be arranged to be communicated via a communication connection, for example via the internet.
Another embodiment comprises a processing means, such as a computer or programmable logic device, configured or adapted to perform one of the methods described herein.
Another embodiment comprises a computer having a computer program installed thereon for performing one of the methods described herein.
Another embodiment according to the present invention comprises an apparatus or system configured to transfer (e.g., electronically or optically) a computer program for performing one of the methods described herein to a receiver. The receiver may be, for example, a computer, a mobile device, a memory device, or the like. The device or system may for example comprise a file server for delivering the computer program to the receiver.
In certain embodiments, a programmable logic device (e.g., a field programmable gate array) may be used to perform some or all of the functionality of the methods described herein. In certain embodiments, a field programmable gate array may cooperate with a microprocessor in order to perform one of the methods described herein. In general, the method is preferably performed by any hardware device.
The devices described herein may be implemented using hardware devices or using a computer or using a combination of hardware devices and a computer.
The methods described herein may be performed using a hardware device or using a computer or using a combination of a hardware device and a computer.
The embodiments described above are merely illustrative of the principles of the invention. It is understood that modifications and variations of the arrangements and details described herein will be apparent to others skilled in the art. Therefore, it is intended that the invention be limited only by the scope of the pending patent claims and not by the specific details presented by way of description and explanation of the embodiments herein.
1. An audio encoder (100) for providing an encoded audio information (112) based on an input audio information (110), the audio encoder comprising:
a bandwidth extension information provider (130) configured to provide bandwidth extension information (132) using a variable time resolution;
a detector (120) configured to detect an onset of a fricative or affricate;
wherein the audio encoder is configured to adjust a temporal resolution used by the bandwidth extension information provider such that bandwidth extension information is provided with an increased temporal resolution at least for a predetermined time period (630a) before a time (tf) at which an onset of a fricative or affricate is detected and for a predetermined time period (630c) after the time at which the onset of the fricative or affricate is detected.
2. The audio encoder (100) according to embodiment 1, wherein the audio encoder is configured to switch from a first temporal resolution for providing the bandwidth extension information to a second temporal resolution for providing the bandwidth extension information in response to the detection of the onset of a fricative or affricate,
wherein the second temporal resolution is higher than the first temporal resolution.
3. The audio encoder (100) according to embodiment 1 or 2, wherein the bandwidth extension information provider is configured to provide the bandwidth extension information such that the bandwidth extension information is associated with temporally regular time intervals (620a, 620b, 620c, 620d, 620e, 620f; 720a-720f) of equal time length,
wherein the bandwidth extension information provider is configured to provide a single set of bandwidth extension information for a time interval (620a, 620b, 620c, 620d, 620f; 720a, 720b, 720c, 720f) of a given time length if a first temporal resolution is used, and
wherein the bandwidth extension information provider is configured to provide a plurality of sets of bandwidth extension information associated with sub-time intervals (630a, 630b, 630c, 630d) for a time interval (620e; 720d, 720e) of the given time length if a second temporal resolution is used.
4. The audio encoder (100) according to embodiment 3, wherein the audio encoder is configured to adjust the temporal resolution used by the bandwidth extension information provider such that at least one sub-time interval (630a; 730d) associated with one set of bandwidth extension information immediately precedes another sub-time interval (630b; 730e) that is associated with another set of bandwidth extension information and during which the onset of a fricative or affricate is detected,
such that the increased temporal resolution is used in at least one sub-time interval (630a; 730d) preceding the sub-time interval (630b; 730e) in which the onset of the fricative or affricate is detected.
5. The audio encoder (100) according to embodiment 3 or 4, wherein the audio encoder is configured to subdivide a given time interval (620e; 720d, 720e) having the given time length into four sub-time intervals (630a-630d; 730a-730h) of equal length if the bandwidth extension information is provided with an increased temporal resolution for the given time interval (620e; 720d, 720e),
such that four sets of bandwidth extension information are provided for the given time interval having the given length of time.
6. The audio encoder (100) of any of embodiments 1 to 5,
wherein the audio encoder is configured to selectively provide bandwidth extension information using an increased temporal resolution for a first time interval (720d) of a given length of time preceding a second time interval (720e) of the given length of time,
if an onset of a fricative or affricate is detected within the second time interval (720e) and if a time distance between the time of the onset and a boundary between the first time interval (720d) and the second time interval (720e) is less than a predetermined time distance.
7. The audio encoder (100) of any of embodiments 1 to 6,
wherein the audio encoder is configured to perform a temporal look-ahead such that, in response to detecting an onset of a fricative or affricate within a second time interval (720e) of a given time length, bandwidth extension information is provided with an increased temporal resolution for a first time interval (720d) of the given time length preceding the second time interval (720e).
8. The audio encoder (100) of any of embodiments 1 to 7,
wherein the audio encoder is configured to adjust the temporal resolution used by the bandwidth extension information provider such that bandwidth extension information is provided with the same increased temporal resolution at least for a predetermined time period (630a; 730d) preceding a time (tf) at which an onset of a fricative or affricate is detected and for a predetermined time period (630c; 730f) following the time at which the onset of the fricative or affricate is detected.
9. The audio encoder (100) of any of embodiments 1 to 8,
wherein the audio encoder is configured to adjust the temporal resolution used by the bandwidth extension information provider such that sets of bandwidth extension information are provided with the same increased temporal resolution for at least a first sub-time interval (630a; 730d), a second sub-time interval (630b; 730e) and a third sub-time interval (630c; 730f),
wherein the first sub-time interval immediately precedes the second sub-time interval;
wherein the onset of the fricative or affricate is detected in the second sub-time interval; and
wherein the third sub-time interval immediately follows the second sub-time interval.
10. The audio encoder (100) of any of embodiments 1 to 9,
wherein the detector is configured to detect an end segment of a fricative or affricate; and
wherein the audio encoder is configured to adjust a temporal resolution used by the bandwidth extension information provider such that bandwidth extension information is provided with an increased temporal resolution at least for a predetermined time period before a time at which an end segment of a fricative or affricate is detected and for a predetermined time period after the time at which the end segment of the fricative or affricate is detected.
11. The audio encoder (100) according to any of embodiments 1 to 10, wherein the detector is configured to evaluate a zero crossing rate, and/or an energy ratio, and/or a spectral tilt, in order to detect an onset of a fricative or affricate.
12. The audio encoder (100) according to any of embodiments 1 to 11, wherein the detector is configured to evaluate a zero crossing rate, and/or an energy ratio, and/or a spectral tilt, in order to detect an end segment of a fricative or affricate.
13. The audio encoder (100) of any of embodiments 1 to 12, wherein the audio encoder is configured to selectively adjust a temporal resolution used by the bandwidth extension information provider such that the bandwidth extension information is provided with an increased temporal resolution in response to detecting an onset of a fricative or affricate only for speech signal portions and not for music signal portions.
14. The audio encoder (100) according to any of embodiments 1 to 13, wherein the audio encoder is configured to selectively provide the bandwidth extension information with an increased temporal resolution for a plurality of subsequent time intervals covering a time at which the onset of the fricative or affricate is detected, in response to detecting the onset of the fricative or affricate or in response to detecting the end segment of the fricative or affricate.
15. The audio encoder (100) according to embodiment 14, wherein the audio encoder is configured to selectively provide the bandwidth extension information with an increased temporal resolution for a plurality of subsequent time intervals that completely cover an onset of the detected fricative or affricate.
16. An audio encoder (800) for providing encoded audio information (812) based on input audio information (810), the audio encoder comprising:
a bandwidth extension information provider (830) configured to provide bandwidth extension information (832) using a variable time resolution;
a detector (820) configured to detect an end segment of a fricative or affricate;
wherein the audio encoder is configured to adjust a temporal resolution used by the bandwidth extension information provider such that bandwidth extension information is provided at an increased temporal resolution in response to detecting an end segment of a fricative or affricate.
17. The audio encoder (800) of embodiment 16,
wherein the audio encoder is configured to adjust a temporal resolution used by the bandwidth extension information provider such that bandwidth extension information is provided with an increased temporal resolution at least for a predetermined time period before a time at which an end segment of a fricative or affricate is detected and for a predetermined time period after the time at which the end segment of the fricative or affricate is detected.
18. An audio decoder (900) for providing a decoded audio information (912) on the basis of an encoded audio information (910),
wherein the audio decoder (900) is configured to perform bandwidth extension based on bandwidth extension information (932) provided by an audio encoder,
such that the bandwidth extension is performed with an increased temporal resolution at least for a predetermined time period before a time at which an onset of a fricative or affricate is detected and for a predetermined time period after the time at which the onset of the fricative or affricate is detected.
19. An audio decoder (1000) for providing a decoded audio information (1012) on the basis of an encoded audio information (1010),
wherein the audio decoder is configured to perform bandwidth extension (1030) based on bandwidth extension information (1032) provided by an audio encoder,
such that the bandwidth extension is performed with an increased temporal resolution at least for a predetermined time period before a time at which an end segment of a fricative or affricate is detected and for a predetermined time period after the time at which the end segment of the fricative or affricate is detected.
20. A system (1100), comprising:
an audio encoder (1120) according to any of embodiments 1 to 17; and
an audio decoder (1140) configured to receive the encoded audio information (1130) provided by the audio encoder and to provide decoded audio information (1150) based on the encoded audio information,
wherein the audio decoder is configured to perform bandwidth extension based on the bandwidth extension information provided by the audio encoder,
such that the bandwidth extension is performed with an increased temporal resolution at least for a predetermined time period before a time at which an onset of a fricative or affricate is detected and for a predetermined time period after the time at which the onset of the fricative or affricate is detected, or
such that the bandwidth extension is performed with an increased temporal resolution at least for a predetermined time period before a time at which an end segment of a fricative or affricate is detected and for a predetermined time period after the time at which the end segment of the fricative or affricate is detected.
21. A method (1200) of providing encoded audio information based on input audio information, the method comprising:
providing (1220) bandwidth extension information using variable time resolution; and
detecting (1210) an onset of a fricative or affricate;
wherein a temporal resolution for providing the bandwidth extension information is adjusted such that bandwidth extension information is provided with an increased temporal resolution at least for a predetermined time period before a time at which an onset of a fricative or affricate is detected and for a predetermined time period after the time at which the onset of the fricative or affricate is detected.
22. A method (1200) of providing encoded audio information based on input audio information, the method comprising:
providing (1220) bandwidth extension information using variable time resolution; and
detecting (1210) an end segment of a fricative or affricate;
wherein the temporal resolution for providing the bandwidth extension information is adjusted such that, in response to detecting an end segment of a fricative or affricate, bandwidth extension information is provided with an increased temporal resolution.
23. A method (1300) of providing decoded audio information based on encoded audio information,
wherein the method comprises performing (1320) bandwidth extension based on bandwidth extension information provided by an audio encoder,
such that the bandwidth extension is performed with an increased temporal resolution at least for a predetermined time period before a time at which an onset of a fricative or affricate is detected and for a predetermined time period after the time at which the onset of the fricative or affricate is detected.
24. A method (1300) of providing decoded audio information based on encoded audio information,
wherein the method comprises performing (1320) bandwidth extension based on bandwidth extension information provided by an audio encoder,
such that the bandwidth extension is performed with an increased temporal resolution at least for a predetermined time period before a time at which an end segment of a fricative or affricate is detected and for a predetermined time period after the time at which the end segment of the fricative or affricate is detected.
25. A computer program for performing the method according to one of embodiments 21 to 24 when the computer program runs on a computer.
26. An encoded audio signal comprising:
an encoded representation of a low frequency portion of the audio content; and
a plurality of sets of bandwidth extension parameters;
wherein the bandwidth extension parameters are provided with an increased temporal resolution at least for a predetermined time period before a time of an onset of a fricative or affricate present in the audio content and for a predetermined time period after the time of the onset of the fricative or affricate present in the audio content.
27. An encoded audio signal comprising:
an encoded representation of a low frequency portion of the audio content; and
a plurality of sets of bandwidth extension parameters;
wherein the bandwidth extension parameters are provided with an increased temporal resolution for a time portion of the audio content in which an end segment of a fricative or affricate is present.
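The frame-level behaviour of embodiments 3, 5 and 6 can be illustrated with a minimal Python sketch. It is not the implementation described in this disclosure: the function names, the sample-based time axis and the `lookahead_margin` parameter are assumptions introduced only for illustration. The sketch assigns one set of bandwidth extension information to each regular frame, switches a frame to four sets when an onset falls inside it, and also raises the resolution of the preceding frame when the onset lies close to the frame boundary. The zero crossing rate helper shows only one of the features named in embodiment 11; how a detector would threshold or combine such features is left open here.

```python
from typing import List, Sequence

# Embodiment 5: a high-resolution frame is split into four equal sub-time intervals.
SUBDIVISIONS = 4


def zero_crossing_rate(samples: Sequence[float]) -> float:
    """Fraction of adjacent sample pairs whose signs differ (one feature from embodiment 11)."""
    if len(samples) < 2:
        return 0.0
    crossings = sum(
        1 for a, b in zip(samples, samples[1:]) if (a >= 0) != (b >= 0)
    )
    return crossings / (len(samples) - 1)


def plan_sets_per_frame(
    num_frames: int,
    frame_len: int,
    onset_times: Sequence[int],
    lookahead_margin: int,
) -> List[int]:
    """Return how many sets of bandwidth extension information each frame gets.

    Hypothetical helper: times are sample indices, frames have equal length,
    and onset_times is assumed to come from some external detector.
    """
    sets = [1] * num_frames  # first (low) temporal resolution: one set per frame
    for t in onset_times:
        frame = t // frame_len
        if not 0 <= frame < num_frames:
            continue  # onset outside the planned range
        sets[frame] = SUBDIVISIONS
        # Embodiment 6: if the onset is close to the boundary with the preceding
        # frame, the preceding frame is also coded with increased resolution.
        offset_in_frame = t - frame * frame_len
        if frame > 0 and offset_in_frame < lookahead_margin:
            sets[frame - 1] = SUBDIVISIONS
    return sets


if __name__ == "__main__":
    print(plan_sets_per_frame(5, 1024, [2100, 4500], 256))
```

With frames of 1024 samples and a margin of 256 samples, the onset at sample 2100 lies 52 samples into frame 2, so frames 1 and 2 both receive four sets; the onset at sample 4500 lies 404 samples into frame 4, so only frame 4 is refined, and the script prints `[1, 4, 4, 1, 4]`. The sketch handles only the preceding-frame case of embodiment 6; an onset near the end of a frame would need analogous treatment of the following frame.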
Reference documents:
[1] US patent application publication US 2011/0099018, "Apparatus and method for calculating bandwidth extension data using a spectral tilt controlled framing".
[2] D. Ruinskiy, N. Dadush, and Y. Lavner, "Spectral and textural feature-based system for automatic detection of fricatives and affricates", IEEE 26th Convention of Electrical and Electronics Engineers in Israel (IEEEI), Israel, pp. 771-775, 2010.
[3] H. Fujihara and M. Goto, "Three techniques for improving automatic synchronization between music and lyrics: fricative detection, filler model, and novel feature vectors for vocal activity detection", IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Chicago, USA, 2008.