Decoder device for decoding a bitstream to generate an audio output signal from the bitstream
Reading note: this technique, "Decoder device for decoding a bitstream to generate an audio output signal from the bitstream", was designed by 罗伯特·布莱特 on 2014-01-27. Abstract: There is provided a decoder device for decoding a bitstream to generate an audio output signal from the bitstream, the bitstream comprising audio data and optionally loudness metadata comprising a reference loudness value, the decoder device comprising: an audio decoder device that reconstructs an audio signal from the audio data; and a signal processor that generates the audio output signal based on the audio signal; wherein the signal processor comprises a gain control device for adjusting the level of the audio output signal; wherein the gain control device comprises a reference loudness decoder that generates a loudness value, wherein the loudness value is the reference loudness value in case the reference loudness value is present in the bitstream; wherein the gain control device comprises a gain calculator that calculates a gain value based on the loudness value and based on a volume control value provided by an external user interface allowing a user to control the volume control value; wherein the gain control device comprises a loudness processor that controls the loudness of the audio output signal based on the gain value.
1. A decoder device for decoding a bitstream (1) to generate an audio output signal (42) from the bitstream, the bitstream (1) comprising audio data (2) and optionally loudness metadata (3) comprising a reference loudness value (4), the decoder device (41) comprising:
an audio decoder device (9) configured to reconstruct an audio signal (8) from the audio data (2); and
a signal processor (27) configured to generate the audio output signal (42) based on the audio signal (8),
wherein the signal processor (27) comprises a gain control device (10, 15, 28) configured to adjust a loudness level of the audio output signal (42),
wherein the gain control device (10, 15, 28) comprises a reference loudness decoder (10) configured to generate a loudness value (37), wherein the loudness value (37) is the reference loudness value (4) in case the reference loudness value (4) is present in the bitstream (1),
wherein the gain control device (10, 15, 28) comprises a gain calculator (28) configured to calculate a gain value (33) based on the loudness value (37) and based on a volume control value (20) provided by a user interface allowing a user to control the volume control value (20),
wherein the gain control device (10, 15, 28) comprises a loudness processor (15) configured to control the loudness level of the audio output signal (42) based on the gain value (33).
2. Decoder device according to the preceding claim, wherein the loudness value (37) is a preset loudness value in case the reference loudness value (4) is not present in the bitstream (1).
3. Decoder device according to the preceding claim, wherein the preset loudness value is set to a value between -4 dB and -10 dB, in particular between -6 dB and -8 dB, referenced to the full-scale amplitude.
4. Decoder device according to one of the preceding claims, wherein the signal processor (27) comprises a dynamic range control device (12, 13, 14) configured to adjust the dynamic range of the audio output signal (42),
wherein the dynamic range control device (12, 13, 14) comprises a dynamic range control switch (12) configured to derive at least one dynamic range control value (6, 7) from the loudness metadata (3) and to output alternatively one of the derived dynamic range control values (6, 7) or a preset dynamic range control value (43),
wherein the dynamic range control device (12, 13, 14) comprises a dynamic range calculator (14) configured to calculate a dynamic range value (44) based on the dynamic range control value (6, 7, 43) output by the dynamic range control switch (12) and based on a compression control value (25), the compression control value (25) being provided by a user interface allowing a user to control the compression control value,
wherein the dynamic range control device (12, 13, 14) comprises a dynamic range processor (13) configured to control the dynamic range of the audio output signal (42) based on the dynamic range value (44).
5. Decoder device according to one of the preceding claims, wherein the signal processor (27) comprises a limiter device (30) configured to limit the amplitude of the audio output signal (42), wherein the limiter device (30) comprises a limiter component (62) with a limiter (51) and a control component (63) configured to control the limiter component (62), wherein a processed audio signal (35) is input to the limiter component (62), which processed audio signal is derived from the audio signal (8) by being processed by at least the gain control device (10, 15, 28), and wherein the audio output signal (42) is output from the limiter component (62).
6. Decoder device according to the preceding claim, wherein the control component (63) is configured to control the limiter component (62) in dependence on the bit rate of the bitstream (1).
7. Decoder device according to claim 5 or 6, wherein the control component (63) is configured to control the limiter component (62) in dependence of a compression efficiency of the audio decoder device (9).
8. Decoder device according to one of the claims 5 to 7, wherein the control component (63) is configured to control the limiter component (62) in dependence of a true peak value (36) which is transmitted in the loudness metadata (3) of the bitstream (1) and which indicates a maximum peak level of an audio source converted by an external encoder into the bitstream (1).
9. Decoder device according to one of the claims 5 to 8, wherein the control component (63) is configured to control the limiter component (62) in dependence of the gain value (33) of the gain control device (10, 15, 28).
10. Decoder device according to one of claims 5 to 9, wherein the control component (63) is configured to control the limiter component (62) in accordance with a volume limit (57) set by the user or manufacturer to prevent hearing impairment.
Technical Field
The present invention relates to the control of loudness of audio, video and multimedia content played in digital form on electronic reproduction devices, and in particular, but not exclusively, to the control of playback loudness that commonly occurs on new media devices, where the content is made with and without embedded loudness metadata.
Background
In generating and transmitting music, video and other multimedia content, loudness normalization is performed between different songs or between different programs to ensure that consumers hear audio signals at an appropriate loudness. Since the early days of recording and film, this was done during the production process or through the reproduction standards for theaters. It is common practice today in the music and radio broadcasting industry to adjust loudness to a value close to the maximum peak level of the medium, while it is common practice in the film and television industry to use one of several standard loudness levels 20 dB to 31 dB below the maximum peak level. In the era before media convergence, consumers did not notice this difference because each type of content was played using a separate device or volume setting.
With the advent of mobile devices, such as mobile phones or portable media players, for playing both music and movie content, this difference in production practice results in loudness differences that can be as high as 30dB if unmodified content is transmitted to the device. This situation may result in the volume of the movie being too small or the volume of the music being too large when switching from one type of content to another.
A related trend is the increase in loudness of many types of recorded music during mastering through the use of strong dynamic range compression, limiting and clipping. Such mastering is performed with only lossless recording media such as optical discs in mind, but most music sold today is in lossy data compression formats such as MPEG AAC and MP3. The data compression process may introduce variations in the time-domain waveform reconstructed in the decoder during playback, causing overshoots in the waveform that exceed the full-scale limit or maximum peak of the signal. In fixed-point decoders (or saturating floating-point decoders) commonly used in mobile devices, these overshoots are clipped to the full-scale limit, causing additional audible clipping in the reproduced signal.
In some cases, strong compression and clipping of music is done for artistic purposes, but more commonly for the following purposes: increasing the commercial appeal of a recording by making it "sound louder" than others, or in order to provide understandable content in all listening environments, such as in airports or noisy locations, as well as quiet environments.
Within the movie and video industries, a wide audio dynamic range is used in some genres to achieve dramatic effects and create a more engaging experience. When delivered to consumers via Dolby Digital or MPEG-4 AAC encoding, audio dynamic range control metadata is typically included in order to allow the dynamic range to be selectively reduced at the receiver or player in noisy environments or in the event that loud scenes are too disturbing.
The legacy metadata included in DVD or Blu-ray content or in TV signals encoded with Dolby Digital (standardized in Audio Compression Standard A/52 of the Advanced Television Systems Committee) or MPEG-4 AAC (standardized in ISO/IEC 14496-3 and ETSI TS 101 154) includes the following components:
1. A single static metadata value, which indicates the overall long-term integrated loudness of a program, referred to in the MPEG standard as the program reference level.
2. Static metadata values of downmix gain, which are used to control the downmix of multi-channel content for output via a stereo or mono device.
3. Two sets of dynamic range control gains or scaling factors, sent with the audio signal in each frame of the data-compressed bitstream for multiple frequency bands or regions. In industry terminology, one set is for "light" compression and the other set is for "heavy" compression. The use of the light and heavy DRC values is typically tied to operation at the decoder loudness target levels established for the operational modes "line mode" and "RF mode". The naming convention and operating points for these modes were established at the inception of digital media, when it could be necessary to convert digital audio to analog signals that were sent over baseband cables to line inputs on subsequent equipment or transmitted via RF carriers to analog television sets.
The use of this metadata allows the reproduction to be adapted to the listening environment in a non-destructive manner during playback. The same stream or file may be played with a different set of metadata or without metadata at all to produce a different dynamic range. Unlike using a compressor that exists only in the playback device, dynamic range control using metadata allows the creative artist to monitor and control the nature of the compression during the production process as necessary.
Unfortunately, the dynamic range control metadata commonly implemented in lossy codecs such as the MPEG AAC or Dolby Digital families cannot compress a signal strongly enough to match the loudness of contemporary music, because the metadata affects the average power of the signal (possibly in several frequency bands) on an audio compression frame basis, with a common frame period of 20 ms to 40 ms. This frame-by-frame gain control is not fast enough to reduce the peak-to-average ratio of the signal to that of highly processed contemporary music.
The approach used by Wolter et al. to solve this problem, as described in [5], is to increase the average loudness in the playback device using an audio limiter placed after the decoder. This solves the loudness matching problem so that music and movie content have equal loudness, but it has several drawbacks. When the consumer plays the content in a quiet environment (possibly using a mobile device connected to a speaker in a quiet room, or using headphones or in-ear headphones with strong sound insulation), the movie content will be compressed as strongly as the music, which is undesirable. The limiter also introduces additional workload on the device CPU or DSP, thereby shortening battery life.
A different approach is described by Camerer et al. in [6], which proposes encoding loudness measures, such as those described in ITU-R standard BS.1770-2, as metadata in music files and normalizing the playback of each file to a target level set by the volume control of the device. This method builds on previous music loudness normalization systems, such as Sound Check (www.apple.com) and ReplayGain (www.replaygain.org), which are optional features of some music players, such as iPods. The authors advocate that loudness normalization be enabled by default; however, there is no provision for what happens when the user turns off the loudness normalization or, more importantly, when content that is not encoded with loudness metadata is played. It is assumed that all content will be analyzed by the playback device or by a secure trusted distributor (such as iTunes) before playback. In addition, no provision is made for adjusting the overall dynamic range of the content to adapt it to the listening environment.
It is therefore an object of the present invention to provide a unified approach to the problem of normalizing the playback loudness of the following two categories: movie/video-like content, which may have a wide dynamic range and possibly embedded loudness metadata; and music or radio/podcast content, which may have a very narrow dynamic range and strong compression, limiting and clipping, and which may contain but is likely not to contain embedded loudness metadata, since consumers already own or exchange large amounts of legacy music content.
It is another object of the present invention to allow the dynamic range of content containing dynamic range control metadata to be adjusted to the listening environment or taste of the consumer.
It is another object of the invention to prevent possible clipping in lossy data compression audio decoders, such as AAC, MP3 or Dolby Digital decoders, caused by variations in the signal components introduced by the data compression process.
It is another object of the present invention to provide a slight incentive to the music recording industry to forego the pursuit of stronger dynamic range compression, limiting and clipping in its content.
It is yet another object of the present invention to limit the extra workload on the device CPU or DSP caused by loudness processing or clipping prevention.
Disclosure of Invention
One embodiment of the present invention includes a decoder apparatus for decoding a bitstream to produce an audio output signal from the bitstream, the bitstream containing audio data and optionally loudness metadata including a reference loudness value, the decoder apparatus comprising:
an audio decoder device configured to reconstruct an audio signal from the audio data; and
a signal processor configured to generate the audio output signal based on the audio signal;
wherein the signal processor comprises a gain control device configured to adjust a level of the audio output signal;
wherein the gain control device comprises a reference loudness decoder configured to generate a loudness value, wherein the loudness value is the reference loudness value if the reference loudness value is present in the bitstream;
wherein the gain control device comprises a gain calculator configured to calculate a gain value based on the loudness value and based on a volume control value provided by a user interface that allows a user to control the volume control value;
wherein the gain control device comprises a loudness processor configured to control the loudness of the audio output signal based on the gain value.
The audio decoder device may be any device capable of reconstructing an audio signal from audio data of a compressed bitstream. The signal processor may be any device that is capable of generating an audio output signal when an audio signal from an audio decoder device is supplied to it and that has a gain control device as set forth below. A gain control device is a device arranged to control the loudness of an audio output signal.
The reference loudness decoder is configured to decode loudness metadata contained in the bitstream. If the loudness metadata contains a reference loudness value, the reference loudness decoder outputs the reference loudness value as the loudness value.
The gain calculator is a device for calculating a gain value based on the loudness value output by the reference loudness decoder and a volume control value set by a user of the decoder device. To set the volume control value, any user interface may be used. The gain calculator may in particular be a subtractor.
The loudness processor is capable of controlling the loudness level of the audio output signal based on the gain value provided by the gain calculator. The loudness processor may in particular be a multiplier.
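The subtractor and multiplier described above can be illustrated with a minimal Python sketch. It is not taken from the patent: the function names are hypothetical, and it assumes that the volume control value is expressed as a target level in dB relative to full scale and that the subtraction is "volume control value minus loudness value".

```python
def gain_value_db(volume_control_db: float, loudness_db: float) -> float:
    """Gain calculator modelled as a subtractor.

    Assumption: the volume control value acts as a target level in dB
    relative to full scale, and the sign convention is target minus loudness.
    """
    return volume_control_db - loudness_db


def db_to_linear(gain_db: float) -> float:
    """Convert an amplitude gain in dB to a linear factor."""
    return 10.0 ** (gain_db / 20.0)


def apply_loudness(samples, gain_db: float):
    """Loudness processor modelled as a multiplier applied to each sample."""
    factor = db_to_linear(gain_db)
    return [s * factor for s in samples]
```

Under these assumptions, music with a loudness value of -7 dB and movie content with a loudness value of -27 dB, both played at a volume control value of -27 dB, would receive gains of -20 dB and 0 dB respectively, so that both are reproduced at the same loudness.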
Unlike conventional compression decoder devices (such as Dolby Digital or AAC decoder devices) used in portable devices or consumer electronic devices, the decoder device is operated with a variable gain value or decoder target level (corresponding to the decoded level of a full-scale bitstream), which is controlled by the user's volume control. This allows the decoder device to operate, in typical use, well below the maximum full-scale range of the device's digital audio system. Such operation avoids the possibility of clipping decoder overshoot. It also allows the loudness of cinema-like content, which lacks heavy dynamic range compression and limiting, to be normalized to that of music content with heavy compression and limiting, without the further compression or limiting of the cinema-like content that is typically required. The present invention thus performs this normalization, for loudness matching purposes alone, without reducing the dynamic range of the content.
In a preferred embodiment of the invention, the loudness value is a preset loudness value in case the reference loudness value is not present in the bitstream. These features allow high quality playback of bitstreams without loudness metadata.
In a preferred embodiment of the invention, the preset loudness value is set to a value between -4 dB and -10 dB, in particular between -6 dB and -8 dB, referenced to the full-scale amplitude. Experimental studies of contemporary music have shown that the observed upper limit of loudness for music content, which tends to be played at full scale, is about -7 dB. Thus, the claimed preset loudness value provides an optimized mode for playing a bitstream without loudness metadata.
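The fallback behaviour of the reference loudness decoder can be sketched as follows. This is an illustration only: the constant and function names are hypothetical, and -7 dB is chosen here as one value inside the narrower range named in the text.

```python
from typing import Optional

# Assumption: one value inside the -6 dB to -8 dB range named in the text.
PRESET_LOUDNESS_DB = -7.0


def loudness_value_db(reference_loudness_db: Optional[float]) -> float:
    """Return the reference loudness if the bitstream carries one,
    otherwise fall back to the preset loudness value."""
    if reference_loudness_db is not None:
        return reference_loudness_db
    return PRESET_LOUDNESS_DB
```

A bitstream with a reference loudness value of -23 dB would yield -23 dB, while a bitstream without loudness metadata (represented here by `None`) would yield the preset.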
In a preferred embodiment of the invention, the signal processor comprises a dynamic range control device, the dynamic range control device being configured to adjust the dynamic range of the audio output signal,
wherein the dynamic range control device comprises a dynamic range control switch configured to derive at least one dynamic range control value from the loudness metadata and to output one of the derived dynamic range control values or a preset dynamic range control value alternatively,
wherein the dynamic range control device comprises a dynamic range calculator configured to calculate a dynamic range value based on the dynamic range control value output by the dynamic range control switch and based on a compression control value provided by a user interface that allows a user to control the compression control value;
wherein the dynamic range control device comprises a dynamic range processor configured to control a dynamic range of the audio output signal based on the dynamic range value.
The dynamic range control device includes a dynamic range control switch configured to decode loudness metadata of the bitstream such that at least one dynamic range control value is derivable. The dynamic range control switch is typically configured such that a dynamic range control value for light dynamic range control and another dynamic range control value for heavy dynamic range control can be derived. The dynamic range control switch may alternatively output one of the derived dynamic range control values or a preset dynamic range control value. The dynamic range control switch may be automatically controlled, for example, based on subsequent equipment using the audio output signal, or manually controlled by user action. The preset dynamic range control value may be set to, for example, 0 dB.
The dynamic range control device may include a dynamic range calculator capable of calculating a dynamic range value based on the dynamic range control value output by the dynamic range control switch and based on a compression control value provided by a user interface that allows a user to control the compression control value. The dynamic range calculator may in particular be a multiplier.
Furthermore, a dynamic range processor is foreseen which is able to control the dynamic range of the audio output signal based on the dynamic range value. By these features, the playback of the bitstream can be adapted to the listening environment and/or the taste of the listener.
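The switch, calculator and processor of the dynamic range control device might be sketched as below. The sketch rests on several assumptions that the text does not state: dynamic range control values are per-frame gains in dB, the compression control value is a scale factor between 0 and 1, the switch is driven by a hypothetical `mode` string, and a single broadband gain is applied per frame.

```python
from typing import List, Optional, Sequence

PRESET_DRC_DB = 0.0  # preset dynamic range control value given as an example in the text


def select_drc_db(light_db: Optional[float], heavy_db: Optional[float], mode: str) -> float:
    """Dynamic range control switch: output the light or heavy value derived
    from the metadata, or the preset value if that value is absent or unselected."""
    if mode == "light" and light_db is not None:
        return light_db
    if mode == "heavy" and heavy_db is not None:
        return heavy_db
    return PRESET_DRC_DB


def dynamic_range_value_db(drc_db: float, compression_control: float) -> float:
    """Dynamic range calculator modelled as a multiplier.
    Assumption: compression_control lies between 0 (no compression) and 1 (full)."""
    return drc_db * compression_control


def apply_dynamic_range(frame: Sequence[float], dynamic_range_db: float) -> List[float]:
    """Dynamic range processor: scale one frame by the resulting gain."""
    factor = 10.0 ** (dynamic_range_db / 20.0)
    return [s * factor for s in frame]
```

With these assumptions, a heavy-compression value of -9 dB and a compression control value of 0.5 would give a dynamic range value of -4.5 dB for that frame, and a bitstream without dynamic range metadata would pass through unchanged.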
According to a preferred embodiment of the invention, the signal processor comprises a limiter device configured to limit the amplitude of the output audio signal, wherein the limiter device comprises a limiter component having a limiter to which the processed audio signal is input and a control component configured to control the limiter component, wherein the processed audio signal is derived from the audio signal by processing at least by the gain control device, and wherein the audio output signal is output from the limiter component.
The limiter device provides limiting to prevent clipping of decoder overshoot, provides volume limiting for hearing loss prevention or user preference, and provides artistic compression to allow reversible generation of content with peak limiting when needed due to the listening environment or user taste.
According to a preferred embodiment of the invention, the control component is configured to control the limiter component in dependence on the bit rate of the bitstream. The probability of decoder overshoot clipping increases as the bit rate decreases. Thus, decoder overshoot clipping prevention is enhanced when the limiter component is controlled according to the bit rate of the bitstream.
According to a preferred embodiment of the invention, the control component is configured to control the limiter component in dependence on a compression efficiency of the audio decoder device. The compression efficiency of the audio encoder that generates the bitstream, and correspondingly of the audio decoder device that decodes it, describes how much the data quality is reduced when the original audio data is encoded to generate the bitstream. The more the data quality is degraded, the more likely decoder overshoot clipping becomes. Thus, decoder overshoot clipping prevention is enhanced when the limiter component is controlled according to the compression efficiency of the audio decoder device.
According to a preferred embodiment of the invention, the control component is configured to control the limiter component in dependence on a true peak value, which is transmitted in the loudness metadata of the bitstream and indicates a maximum peak level of an audio source converted into the bitstream by an external encoder. The use of this true peak value allows a more accurate value to be calculated for the maximum possible peak level of the audio output signal.
According to a preferred embodiment of the invention, the control component is configured to control the limiter component in dependence of a gain value of the gain control device. The maximum possible peak level of the audio output signal is in this sub-case determined by the gain value of the gain control device. If the value is 0dB, the decoder device operates at its full-scale limit as required by the maximum setting of the volume control value. When the volume control value is decreased, the decoder device will operate such that the full-scale bitstream value only reaches the maximum level set by the gain value of the gain control device.
According to a preferred embodiment of the invention, the control component is configured to control the limiter component in dependence on a volume limit, which is set by the user or manufacturer in order to prevent hearing damage. By these features, hearing impairment can be effectively avoided.
According to a preferred embodiment of the invention, the control component is configured to control the limiter component in accordance with artistic limiter parameters transmitted in the loudness metadata of the bitstream and indicating artistic limiter thresholds, artistic limiter attack time values and/or artistic limiter release time values. These features allow the operation of the limiter device to be creatively controlled by the artist or content creator. The dynamic range control values contained in the loudness metadata discussed previously allow the overall dynamic range of the content to be adapted to the listening environment via the use of compression gains that act with typical time constants of 100 ms to 3 seconds. In challenging listening environments, compressing an audio signal with such time constants may not produce a signal with sufficient loudness for intelligibility or enjoyment without an undesirably high peak level. The following possibility also exists: a music creator who traditionally produces only highly compressed "squashed" mixes may use the flexibility of the present invention to produce both "squashed" mixes and "un-squashed" mixes with less limiting and compression, so that the consumer can hear the "un-squashed" version in quiet environments or when desired.
According to a preferred embodiment of the invention, the control component is configured to control the limiter component continuously or repeatedly. These features allow for variable control of the limiter component over time.
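One way the control component could combine the gain value, the true peak value, a bit-rate-dependent overshoot allowance and the volume limit is sketched below. The patent text does not give this formula: the additive combination, the `overshoot_margin_db` parameter (standing in for an empirically derived bit-rate dependence) and all function names are assumptions made for illustration.

```python
from typing import Optional


def max_peak_db(gain_db: float,
                true_peak_db: Optional[float] = None,
                overshoot_margin_db: float = 0.0) -> float:
    """Estimate the maximum possible output peak in dB relative to full scale.

    Assumptions: without a true peak value the source is taken to peak at
    0 dB full scale; overshoot_margin_db is a hypothetical allowance for
    decoder overshoot that would grow as the bit rate falls.
    """
    source_peak_db = 0.0 if true_peak_db is None else true_peak_db
    return source_peak_db + gain_db + overshoot_margin_db


def limiter_threshold_db(volume_limit_db: Optional[float] = None) -> float:
    """Limit at full scale, or lower if a volume limit has been set."""
    return 0.0 if volume_limit_db is None else min(0.0, volume_limit_db)


def limiter_needed(gain_db: float,
                   true_peak_db: Optional[float] = None,
                   overshoot_margin_db: float = 0.0,
                   volume_limit_db: Optional[float] = None) -> bool:
    """Control decision: engage the limiter only if the estimated peak
    could exceed the threshold."""
    peak = max_peak_db(gain_db, true_peak_db, overshoot_margin_db)
    return peak > limiter_threshold_db(volume_limit_db)
```

Under these assumptions, a gain value of -20 dB with a 2 dB overshoot allowance leaves the limiter disengaged, whereas a gain value of 0 dB with the same allowance engages it.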
According to a preferred embodiment of the invention, the limiter device is configured to bypass the limiter via a bypass device having a transfer function similar to that of the limiter in terms of gain and delay. By these features, the workload of the signal processor can be significantly reduced.
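A bypass with the same gain and delay as the limiter can be as simple as a delay line, which is far cheaper than running the limiter itself. The following sketch is an illustration under assumptions: the class name is hypothetical, the limiter is assumed to have unity gain below its threshold, and its latency is assumed to be a fixed number of samples.

```python
from collections import deque


class DelayBypass:
    """Pass samples through with a fixed delay and gain, so that switching
    between limiter and bypass does not shift the signal in time or level."""

    def __init__(self, delay_samples: int, gain: float = 1.0):
        # Pre-fill with silence so the first outputs are delayed zeros.
        self._buffer = deque([0.0] * delay_samples)
        self._gain = gain

    def process(self, samples):
        out = []
        for s in samples:
            self._buffer.append(s)
            out.append(self._gain * self._buffer.popleft())
        return out
```

For example, a bypass constructed with `delay_samples=2` returns two zeros before the input samples start to appear at its output, and it keeps its state between calls so that consecutive frames join without gaps.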
An embodiment of the invention includes a system comprising a decoder and an encoder, wherein the decoder is designed according to the claims.
One embodiment of the present invention includes a method of decoding a bitstream to produce an audio output signal from the bitstream, the bitstream containing audio data and optionally loudness metadata including a reference loudness value, the method comprising the steps of:
reconstructing an audio signal from the audio data using an audio decoder device; and
generating, using a signal processor, the audio output signal based on the audio signal;
wherein the loudness level of the audio output signal is adjusted using a gain control device comprised by the signal processor;
wherein a loudness value is generated by a reference loudness decoder comprised by the gain control device, wherein the loudness value is the reference loudness value in case the reference loudness value is present in the bitstream;
wherein a gain value is calculated by a gain calculator included in the gain control device based on the loudness value and based on a volume control value provided by a user interface that allows a user to control the volume control value;
wherein the loudness level of the audio output signal is controlled by a loudness processor comprised by the gain control device based on the gain value.
An embodiment of the invention comprises a computer program for performing the method as claimed herein when run on a computer or processor.
Drawings
Preferred embodiments of the present invention are discussed subsequently with reference to the accompanying drawings, in which:
FIG. 1 shows a block diagram of a prior art data compression audio decoder with loudness metadata support, such as specified by ISO/IEC 14496-3 and ETSI TS 101 154, integrated in a typical mobile phone, tablet computer, or portable media player;
FIG. 2 shows an embodiment of a decoder having a data compression audio decoder device and an optional audio limiter according to the present invention, the decoder being suitable for integration into a typical mobile phone, tablet computer or portable media player;
FIG. 3 shows an empirically derived function of the bitstream bit rate for possible extra clipping due to overshoot of the reconstructed signal waveform in an AAC-LC stereo decoder;
FIG. 4 shows a block diagram of a preferred embodiment of any of the limiter devices according to the present invention; and
FIG. 5 shows a block diagram of a preferred embodiment of any of the limiter devices according to the invention, which operates in an artistic limiting mode.
Detailed Description
As an aid to understanding the operation of the present invention, fig. 1 illustrates the operation of a prior art metadata enabled data compression
The dynamic
The
Importantly, if the reference loudness value 4 is not present in a given
The output of the
Fig. 2 depicts a
an audio decoder device 9 configured to reconstruct an audio signal 8 from the audio data 2; and a
wherein the
wherein the
wherein the
wherein the
The audio decoder device 9 may be any device 9 capable of reconstructing an audio signal 8 from audio data 2 of a
The
The gain calculator 28 is a device for calculating a
The
Unlike
In a preferred embodiment of the invention, the
In a preferred embodiment of the invention, the
In a preferred embodiment of the invention, the
wherein the dynamic
wherein the dynamic
wherein the dynamic
The dynamic
The dynamic
Furthermore, the
Figure 2 shows the operation of a preferred embodiment of the present invention contained in an
In contrast to the operation described previously in fig. 1, the
Importantly, the
If no value 4 is present in a given
The dynamic
Those skilled in the art will appreciate that the
In previous approaches to matching the loudness of various types of content (such as in [5]), limiters were used in the signal chain after the core audio decoder and after the dynamic range control metadata was applied, in order to limit the signal peaks and thus increase the average level of the signal without clipping. In contrast to a "hard" limiter or clipper, which simply implements mathematical saturation at a threshold level, this limiter should operate as follows: signal peaks are limited in a "soft" manner by changing the signal gain when the signal waveform approaches or exceeds a threshold, thereby avoiding the introduction of audible artifacts into the signal. Such soft limiters are computationally expensive and may account for 10% to 30% of the workload of the decoder device.
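The difference between hard saturation and gain-based soft limiting can be illustrated with a deliberately simplified Python sketch. It is not the limiter of the invention or of [5]: it assumes an instantaneous attack, a one-pole release with a hypothetical `release` coefficient, and no look-ahead, whereas practical soft limiters are more elaborate.

```python
def hard_clip(samples, threshold=1.0):
    """Hard limiting: saturate each sample at the threshold."""
    return [max(-threshold, min(threshold, s)) for s in samples]


def soft_limit(samples, threshold=1.0, release=0.999):
    """Gain-based limiting: reduce the gain when a peak exceeds the
    threshold, then let it recover gradually towards unity."""
    gain = 1.0
    out = []
    for s in samples:
        peak = abs(s)
        needed = threshold / peak if peak > threshold else 1.0
        if needed < gain:
            gain = needed  # instantaneous attack
        else:
            # One-pole release: move the gain part of the way back up.
            gain = release * gain + (1.0 - release) * needed
        out.append(s * gain)
    return out
```

Because the gain never exceeds the value needed for the current sample, the output of `soft_limit` stays within the threshold, while samples following a peak are attenuated smoothly rather than flattened. The per-sample gain computation hints at why such limiters add noticeable processing load.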
In contrast, the present invention does not require a limiter for controlling the peak-to-average ratio of the
In view of clipping protection, two sub-cases of the signal have to be considered. Some
According to a preferred embodiment of the present invention as shown in fig. 4 and 5, the
The
The
According to a preferred embodiment of the invention, shown in fig. 4, the
In a preferred embodiment of this optional feature, the bit-
According to a preferred embodiment of the invention, the control component is configured to control the
In a preferred embodiment of this optional feature, the compression efficiency of the audio decoder arrangement 9 is input into a
In the event that the maximum level of the processed core
According to a preferred embodiment of the invention, the
In this sub-case, where
According to a preferred embodiment of the present invention, the
In the case of a bitstream containing
According to a preferred embodiment of the invention, the
In the case where limiting is to avoid hearing impairment, the device user or manufacturer may use the volume limit signal to set a maximum peak level 57 to which the output must be limited. When the
According to a preferred embodiment of the present invention shown in fig. 5, the
In this mode, the
According to a preferred embodiment of the invention, it is not possible to apply a compensation gain (makeup-gain) after the
According to a preferred embodiment of the present invention, the
According to a preferred embodiment of the invention, the
Those skilled in the art will appreciate that the processes may be implemented in software as a series of computer instructions or in hardware components. The operations described herein are typically performed by a computer CPU or digital signal processor as software instructions, and the registers and operations shown in the figures may be implemented by corresponding computer instructions. This, however, does not preclude embodiments using hardware components in equivalent hardware designs. Those skilled in the art will appreciate that the
In the construction of the modified
It should be further appreciated that while the present invention provides particular advantages for controlling clipping produced by decoder overshoot in lossy audio data codecs such as AAC, MP3 or Dolby Digital, the present invention may also be used in audio systems having lossless audio codecs or having audio signals that are not compressed at all by audio codecs.
The present invention can provide:
1. A system for audio loudness normalization that provides an output whose full-scale value is intended to correspond to the maximum peak output voltage or sound pressure level of a combining device, where the loudness level or average power of the output is controlled, directly or indirectly, by a user volume control of the device, so that both content with audio loudness metadata and content without audio loudness metadata but normalized to its full-scale value are reproduced at nearly the same audio loudness level.
2. A system wherein the long term average power or perceived loudness of content without audio loudness metadata is estimated by a fixed value that is determined by empirical or statistical analysis of the content.
3. A system wherein the estimate is biased to reproduce typical content without metadata at a slightly lower loudness than the same content with metadata properly prepared, thereby providing a stimulus for using the metadata.
4. A system for data compression audio decoding comprising an output peak limiter, wherein the need for peak limiting is determined by a calculated function of the target level of the compressed audio decoder and the audio codec compression efficiency or bit rate, the peak limiting serving to prevent clipping caused by decoder overshoot.
5. A system for data compression audio decoding having an output peak limiter, wherein the need for peak limiting is determined by a calculated function of the target level of the compressed audio decoder, the audio codec compression efficiency or bit rate, and a metadata value transmitted in the compressed bitstream indicative of the maximum peak level of the audio program, the peak limiting serving to prevent clipping caused by decoder overshoot.
6. A system for data compression audio decoding having an output peak limiter wherein the need for peak limiting is determined by a target level of the compressed audio decoder, the peak limiting serving the purpose of limiting the maximum peak audio output of the device.
7. A system for data compression audio decoding or audio processing having an output peak limiter wherein the need for peak limiting is determined by the value of a scaling gain applied to the audio signal, the peak limiting being for the purpose of limiting the maximum peak audio output of the device.
8. A system for data compression audio decoding or audio processing having an output peak limiter wherein the need for peak limiting is determined by the value of the scaling gain applied to the audio signal and the value of metadata transmitted in a compressed bitstream indicative of the maximum peak level of the audio program for the purpose of limiting the maximum peak audio output of the device.
9. A system wherein the limiter is replaced with a function having similar gain and delay when no limiting is required.
10. A system for data compression audio decoding or audio processing, comprising an output peak limiter, wherein a peak limiter threshold is controlled by a metadata value transmitted in a compressed bitstream or on a periodic basis.
11. A corresponding method or non-transitory storage medium for audio loudness normalization that provides an output whose full-scale value is intended to correspond to the maximum peak output voltage or sound pressure level of a combining device, where the loudness level or average power of the output is controlled, directly or indirectly, by a user volume control of the device, so that both content with audio loudness metadata and content without audio loudness metadata but normalized to its full-scale value are reproduced at nearly the same audio loudness level.
12. A decoder device for decoding a bitstream (1) to generate an audio output signal (42) from the bitstream, the bitstream (1) comprising audio data (2) and optionally loudness metadata (3) comprising a reference loudness value (4), the decoder device (41) comprising:
an audio decoder device (9) configured to reconstruct an audio signal (8) from the audio data (2); and
a signal processor (27) configured to generate the audio output signal (42) based on the audio signal (8),
wherein the signal processor (27) comprises a gain control device (10, 15, 28) configured to adjust a loudness level of the audio output signal (42),
wherein the gain control device (10, 15, 28) comprises a reference loudness decoder (10) configured to generate a loudness value (37), wherein the loudness value (37) is the reference loudness value (4) in case the reference loudness value (4) is present in the bitstream (1),
wherein the gain control device (10, 15, 28) comprises a gain calculator (28) configured to calculate a gain value (33) based on the loudness value (37) and based on a volume control value (20) provided by a user interface allowing a user to control the volume control value (20),
wherein the gain control device (10, 15, 28) comprises a loudness processor (15) configured to control the loudness level of the audio output signal (42) based on the gain value (33).
13. The decoder device as described above, wherein the loudness value (37) is a preset loudness value in case the reference loudness value (4) is not present in the bitstream (1).
14. The decoder device as described above, wherein the preset loudness value is set to a value between -4 dB and -10 dB, in particular between -6 dB and -8 dB, the value being referenced to the full-scale amplitude.
15. The decoder apparatus as described above, wherein the signal processor (27) comprises a dynamic range control device (12, 13, 14) configured to adjust the dynamic range of the audio output signal (42),
wherein the dynamic range control device (12, 13, 14) comprises a dynamic range control switch (12) configured to derive at least one dynamic range control value (6, 7) from the loudness metadata (3) and to output alternatively one of the derived dynamic range control values (6, 7) or a preset dynamic range control value (43),
wherein the dynamic range control device (12, 13, 14) comprises a dynamic range calculator (14) configured to calculate a dynamic range value (44) based on the dynamic range control value (6, 7, 43) output by the dynamic range control switch (12) and based on a compression control value (25), the compression control value (25) being provided by a user interface allowing a user to control the compression control value,
wherein the dynamic range control device (12, 13, 14) comprises a dynamic range processor (13) configured to control the dynamic range of the audio output signal (42) based on the dynamic range value (44).
16. The decoder apparatus as described above, wherein the signal processor (27) comprises a limiter device (30) configured to limit the amplitude of the audio output signal (42), wherein the limiter device (30) comprises a limiter component (62) having a limiter (51) and a control component (63) configured to control the limiter component (62), wherein a processed audio signal (35) is input to the limiter component (62), the processed audio signal being derived from the audio signal (8) by processing by at least the gain control device (10, 15, 28), and wherein the audio output signal (42) is output from the limiter component (62).
17. The decoder apparatus as described above, wherein the control component (63) is configured to control the limiter component (62) in dependence of the bit rate of the bitstream (1).
18. The decoder apparatus according to item 16 or 17, wherein the control component (63) is configured to control the limiter component (62) in dependence of a compression efficiency of the audio decoder device (9).
19. The decoder apparatus according to one of the items 16 to 18, wherein the control component (63) is configured to control the limiter component (62) according to a true peak value (36) which is transmitted in the loudness metadata (3) of the bitstream (1) and which indicates a maximum peak level of an audio source converted by an external encoder into the bitstream (1).
20. Decoder device according to one of the items 16 to 19, wherein the control component (63) is configured to control the limiter component (62) in dependence of the gain value (33) of the gain control device (10, 15, 28).
21. The decoder device according to one of the items 16 to 20, wherein the control component (63) is configured to control the limiter component (62) in accordance with a volume limit (57) set by the user or manufacturer to prevent hearing impairment.
22. Decoder device according to one of the items 16 to 21, wherein the control component (63) is configured to control the limiter component (62) in accordance with artistic limiter parameters (32) transmitted in the loudness metadata (3) of the bitstream (1) and indicating artistic limiter threshold values (74a), artistic limiter attack time values (74b) and/or artistic limiter release time values (74c).
23. Decoder device according to one of the items 16 to 22, wherein the control component (63) is configured to continuously or repeatedly control the limiter component (62).
24. Decoder device according to one of the items 16 to 23, wherein the limiter device (30) is configured to bypass the limiter (51) via a bypass device (53) having a transfer function similar to the transfer function of the limiter (51) in terms of gain and delay.
25. A system comprising a decoder device (41) and an encoder, wherein the decoder device (41) is designed according to one of
26. A method of decoding a bitstream (1) to generate an audio output signal (42) from the bitstream, the bitstream (1) comprising audio data (2) and optionally loudness metadata (3) comprising a reference loudness value (4), the method comprising the steps of:
reconstructing an audio signal (8) from the audio data (2) using an audio decoder device (9); and
generating the audio output signal (42) based on the audio signal (8) using a signal processor (27),
wherein the loudness level of the audio output signal (42) is adjusted using a gain control device (10, 15, 28) comprised by the signal processor (27),
wherein a loudness value (37) is generated by a reference loudness decoder (10) comprised by the gain control device (10, 15, 28), wherein the loudness value (37) is the reference loudness value (4) in case the reference loudness value (4) is present in the bitstream,
wherein a gain value (33) is calculated by a gain calculator (28) comprised by the gain control device (10, 15, 28) based on the loudness value (37) and based on a volume control value (20), the volume control value (20) being provided by a user interface allowing a user to control the volume control value,
wherein the loudness level of the audio output signal (42) is controlled based on the gain value (33) by a loudness processor (15) comprised by the gain control device (10, 15, 28).
27. A computer program for performing the method of
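The gain control path of items 12 to 14 and the limiter decision of items 7, 8 and 20 can be sketched as follows. This is a minimal illustration under assumptions that the text does not state: the volume control value is taken to be a target loudness in dB relative to full scale, the gain is taken to be the difference between that target and the loudness value, and the preset loudness value is arbitrarily set to -7 dB (inside the range given in item 14). All function names and the `overshoot_margin_db` parameter are hypothetical.

```python
from typing import List, Optional

# Assumed preset loudness value in dB re full scale (item 14 gives a range only).
PRESET_LOUDNESS_DB = -7.0


def decode_reference_loudness(reference_loudness_db: Optional[float]) -> float:
    """Reference loudness decoder: use the metadata value if present,
    otherwise fall back to the preset loudness value."""
    if reference_loudness_db is None:
        return PRESET_LOUDNESS_DB
    return reference_loudness_db


def calculate_gain_db(loudness_db: float, volume_control_db: float) -> float:
    """Gain calculator (assumed rule): gain that moves the programme
    loudness to the target loudness requested by the volume control."""
    return volume_control_db - loudness_db


def apply_gain(samples: List[float], gain_db: float) -> List[float]:
    """Loudness processor: scale the signal by the gain in dB."""
    factor = 10.0 ** (gain_db / 20.0)
    return [x * factor for x in samples]


def limiter_needed(gain_db: float,
                   true_peak_db: Optional[float] = None,
                   overshoot_margin_db: float = 0.0,
                   ceiling_db: float = 0.0) -> bool:
    """Predict whether the scaled signal could exceed the ceiling.

    Without true-peak metadata the source peak is assumed to be at
    full scale (0 dB). overshoot_margin_db is a hypothetical allowance
    for decoder overshoot of a lossy codec.
    """
    peak_db = 0.0 if true_peak_db is None else true_peak_db
    return peak_db + gain_db + overshoot_margin_db > ceiling_db
```

For example, content carrying a reference loudness of -23 dB and played at a target of -20 dB would receive a gain of +3 dB under these assumptions, and the limiter would be engaged only if the signalled true peak plus that gain exceeded full scale; otherwise it could be bypassed as in item 24.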
Although some aspects have been described in the context of an apparatus, it is clear that these aspects also represent a description of the corresponding method, where a block or device corresponds to a method step or a feature of a method step. Similarly, aspects described in the context of a method step also represent a description of a corresponding block, item or feature of a corresponding apparatus. Some or all of the method steps may be performed by (or using) a hardware apparatus, such as a microprocessor, a programmable computer or an electronic circuit. In some embodiments, one or more of the most important method steps may be performed by such an apparatus.
Embodiments of the present invention may be implemented in hardware or software, depending on the particular implementation requirements. Embodiments may be implemented using a non-transitory storage medium, such as a digital storage medium, e.g., a floppy disk, a DVD, a Blu-ray disc, a CD, a ROM, a PROM, an EPROM, an EEPROM or a flash memory, having electronically readable control signals stored thereon that cooperate (or are capable of cooperating) with a programmable computer system such that the respective method is performed. Accordingly, the digital storage medium may be computer readable.
Some embodiments according to the invention comprise a data carrier having electronically readable control signals capable of cooperating with a programmable computer system to cause one of the methods described herein to be performed.
Generally, embodiments of the invention may be implemented as a computer program product having a program code operable to perform one of the methods when the computer program product runs on a computer. The program code may for example be stored on a machine readable carrier.
Other embodiments include a computer program for performing one of the methods described herein, stored on a machine-readable carrier.
In other words, an embodiment of the method of the invention is thus a computer program having a program code for performing one of the methods described herein, when the computer program runs on a computer.
Another embodiment of the method of the invention is thus a data carrier (or digital storage medium or computer readable medium) comprising a computer program recorded thereon for performing one of the methods described herein. Data carriers, digital storage media or recording media are usually tangible and/or non-transitory.
Another embodiment of the method of the invention is thus a data stream or a signal sequence representing a computer program for performing one of the methods described herein. The data stream or the signal sequence may for example be arranged to be communicated via a data communication connection, for example via the internet.
Another embodiment comprises a processing means, such as a computer or programmable logic device, configured to perform or adapted to perform one of the methods described herein.
Another embodiment comprises a computer having installed thereon a computer program for performing one of the methods described herein.
Another embodiment according to the invention comprises an apparatus or a system configured to transfer (e.g. electronically or optically) a computer program for performing one of the methods described herein to a receiver. The receiver may be, for example, a computer, a mobile device, a memory device, or the like. The apparatus or system may, for example, comprise a file server for delivering the computer program to the receiver.
In some embodiments, a programmable logic device (e.g., a field programmable gate array) may be used to perform some or all of the functionality of the methods described herein. In some embodiments, a field programmable gate array may cooperate with a microprocessor in order to perform one of the methods described herein. In general, the methods are preferably performed by any hardware apparatus.
The above-described embodiments are merely illustrative of the principles of the present invention. It is understood that modifications and variations of the arrangements and details described herein will be apparent to those skilled in the art. It is therefore intended that the invention be limited only by the scope of the claims and not by the specific details presented herein by way of description and explanation of the embodiments.
List of reference numerals
1 bitstream
2 audio data
3 loudness metadata
4 reference loudness value
5 downmix gain value
6 mild dynamic range control value
7 severe dynamic range control value
8 audio signal
9 audio decoder device
10 reference loudness decoder
11 downmix gain decoder
12 dynamic range control switch
13 dynamic range processor
14 dynamic range calculator
15 loudness processor
16 gain calculator
17 static target level provider
18 audio output signal
19 mixed audio signal
20 volume control value
21 decoder device
22 auxiliary audio signal
23 audio signal mixer
24 loudness-adjusted auxiliary audio signal
25 compression control value
26 signal processor
27 signal processor
28 gain calculator
29 mixed audio signal
30 limiter device
31 loudness value
32 artistic limiter parameters
33 gain value
34 bit rate value
35 processed audio signal
36 true peak value
37 loudness value
41 decoder device
42 audio output signal
43 preset dynamic range control value
44 dynamic range value
51 limiter
52 limiter switch
53 bypass device
54 clipping prediction device
55 comparator
56 clipping prediction function
57 volume limit
58 volume limit switch
59 minimum finder
60 true peak switch
61 combiner
62 limiter component
63 control component
71 combiner
72 minimum finder
73 dynamic range control switch
74 output data of dynamic range control switch
70a artistic limiter threshold value
70b artistic limiter attack time value
70c artistic limiter release time value
References
[1] International Organization for Standardization and International Electrotechnical Commission, ISO/IEC 14496-3, Information technology - Coding of audio-visual objects - Part 3: Audio, www.iso.org.
[2] European Telecommunications Standards Institute, ETSI TS 101 154: Digital Video Broadcasting (DVB); Specification for the use of Video and Audio Coding in Broadcasting Applications based on the MPEG-2 transport stream, www.etsi.org.
[3] Advanced Television Systems Committee, Inc., Audio Compression Standard A/52, www.atsc.org.
[4] International Telecommunications Union, Recommendation ITU-R BS.1770-3: Algorithms to measure audio programme loudness and true-peak audio level, www.itu.int.
[5] Martin Wolters, Harald Mundt, and Jeffrey Riedmiller, "Loudness Normalization in the Age of Portable Media Players," Paper 8044, Audio Engineering Society 128th Convention, www.aes.org.
[6] Florian Camerer, et al., "Loudness Normalization: The Future of File-Based Playback," Music Loudness Alliance, www.music-loudness.com.
[7] Dolby Laboratories, Inc., Dolby Digital Professional Encoding Guidelines, www.dolby.com.
[8] Perttu Hämäläinen, "Smoothing of the Control Signal Without Clipped Output in Digital Peak Limiters," Proc. of the 5th International Conference on Digital Audio Effects, 26-28 September 2002, Hamburg, Germany.