Method and apparatus for extracting tone-independent timbre attributes from a media signal

Document No.: 958600  Publication date: 2020-10-30  Views: 16  Original language: Chinese

Reading note: This technique, "Method and apparatus for extracting tone-independent timbre attributes from a media signal," was designed and created by Z. Rafii on 2019-03-12. Its main content includes: Methods and apparatus for extracting pitch-independent timbre attributes from a media signal are disclosed. An example apparatus includes an interface to receive a media signal; and an audio feature extractor to determine a frequency spectrum of audio corresponding to the media signal, and to determine a pitch-independent timbre attribute of the audio based on an inverse transform of a magnitude of a transform of the spectrum.

1. An apparatus for extracting pitch-independent timbre attributes from a media signal, the apparatus comprising:

an interface for receiving a media signal; and

an audio feature extractor to:

determine a frequency spectrum of audio corresponding to the media signal; and

determine a pitch-independent timbre attribute of the audio based on an inverse transform of a magnitude of a transform of the spectrum.

2. The apparatus of claim 1, wherein the media signal is audio.

3. The apparatus of claim 1, wherein the media signal is a video signal having an audio component, the apparatus further comprising an audio extractor for extracting the audio from the video signal.

4. The apparatus of claim 1, wherein the audio feature extractor determines the spectrum of the audio using a constant Q transform.

5. The apparatus of claim 1, wherein the audio feature extractor determines the transform of the frequency spectrum using a Fourier transform and determines the inverse transform using an inverse Fourier transform.

6. The apparatus of claim 1, wherein the audio feature extractor determines a timbre-independent pitch attribute of the audio based on an inverse transform of a complex argument of the transform of the spectrum.

7. The apparatus of claim 1, wherein the interface is a first interface, the apparatus further comprising a second interface to:

send the pitch-independent timbre attribute to a processing device; and

receive, in response to sending the pitch-independent timbre attribute to the processing device, at least one of a classification of the audio or an identifier corresponding to the media signal from the processing device.

8. The apparatus of claim 7, wherein the second interface transmits the at least one of the classification of the audio or the identifier corresponding to the media signal to a user interface.

9. The apparatus of claim 1, wherein the interface is a microphone for receiving the media signal via ambient audio.

10. The apparatus of claim 1, wherein the media signal corresponds to a media signal to be output by a media output device.

11. The apparatus of claim 1, wherein the interface receives the media signal from a microphone.

12. A non-transitory computer-readable storage medium comprising instructions that, when executed, cause a machine to at least:

access a media signal;

determine a frequency spectrum of audio corresponding to the media signal; and

determine a pitch-independent timbre attribute of the audio based on an inverse transform of a magnitude of a transform of the spectrum.

13. The computer-readable storage medium of claim 12, wherein the media signal is audio.

14. The computer readable storage medium of claim 12, wherein the media signal is a video signal having an audio component, wherein the instructions, when executed, cause the machine to extract the audio from the video signal.

15. The computer readable storage medium of claim 12, wherein the instructions, when executed, cause the machine to determine the spectrum of the audio using a constant Q transform.

16. The computer readable storage medium of claim 12, wherein the instructions, when executed, cause the machine to determine the transform of the spectrum using a Fourier transform and determine the inverse transform using an inverse Fourier transform.

17. The computer readable storage medium of claim 12, wherein the instructions, when executed, cause the machine to determine a timbre-independent pitch attribute of the audio based on an inverse transform of a complex argument of the transform of the spectrum.

18. The computer readable storage medium of claim 12, wherein the instructions, when executed, cause the machine to:

send the pitch-independent timbre attribute to a processing device; and

receive, in response to sending the pitch-independent timbre attribute to the processing device, at least one of a classification of the audio or an identifier corresponding to the media signal from the processing device.

19. The computer readable storage medium of claim 18, wherein the instructions, when executed, cause the machine to send the at least one of the classification of the audio or the identifier corresponding to the media signal to a user interface.

20. A method of extracting pitch-independent timbre attributes from a media signal, the method comprising:

determining, by executing instructions with a processor, a frequency spectrum of audio corresponding to the received media signal; and

determining, by executing instructions with the processor, a pitch-independent timbre attribute of the audio based on an inverse transform of a magnitude of a transform of the spectrum.

Technical Field

The present disclosure relates generally to audio processing and, more particularly, to a method and apparatus for extracting pitch-independent timbre attributes from a media signal.

Background

Timbre (e.g., a timbre attribute) is the quality/characteristic of audio that is independent of the audio's pitch and loudness. Timbre is what makes two sounds different from each other even when they have the same pitch and loudness. For example, when a guitar and a flute play the same note at the same amplitude, they sound different because the guitar and the flute have different timbres. Timbre corresponds to the frequency and temporal envelope of an audio event (e.g., the distribution of energy along time and frequency). Audio features corresponding to the perception of timbre include the spectrum and the envelope.

Drawings

FIG. 1 is an illustration of an exemplary meter for extracting pitch-independent timbre attributes from a media signal.

Fig. 2 is a block diagram of the exemplary audio analyzer and the exemplary audio determiner of fig. 1.

Fig. 3 is a flow diagram representing example machine readable instructions that may be executed to implement the example audio analyzer of fig. 1 and 2 to extract pitch-independent timbre attributes and/or timbre-independent pitch attributes from a media signal.

Fig. 4 is a flow diagram representing example machine readable instructions that may be executed to implement the example audio determiner of fig. 1 and 2 to characterize audio and/or identify media based on a pitch-independent timbre log-spectrum.

Fig. 5 illustrates an exemplary audio signal, an exemplary pitch of the audio signal, and an exemplary timbre of the audio signal that may be determined using the exemplary audio analyzer of fig. 1 and 2.

Fig. 6 is a block diagram of a processor platform configured to execute the example machine readable instructions of fig. 3 to control the example audio analyzer of fig. 1 and 2.

Fig. 7 is a block diagram of a processor platform configured to execute the example machine readable instructions of fig. 4 to control the example audio determiner of fig. 1 and 2.

The figures are not drawn to scale. Wherever possible, the same reference numbers will be used throughout the drawings and the accompanying written description to refer to the same or like parts.

Detailed Description

An audio meter is a device that captures audio signals (e.g., directly or indirectly) for processing of the audio signals. For example, when a panelist signs up to have their exposure to media monitored by an audience measurement entity, the audience measurement entity may send a technician to the panelist's home to install a meter (e.g., a media monitor) that can collect media exposure data from a media output device (e.g., a television, radio, computer, etc.). In another example, the meter may correspond to instructions executing on a processor in a smart phone to, for example, process received audio and/or video data to determine characteristics of the media.

Typically, the meter includes or is otherwise connected to an interface to receive media signals directly from a media source or indirectly (e.g., to collect ambient audio from a microphone and/or magnetic coupling device). For example, when the media output device is "on," the microphone may receive acoustic signals transmitted by the media output device. The meter may process the received acoustic signals to determine audio characteristics that may be used to characterize and/or identify the audio or audio source. When the meter corresponds to instructions within and/or operating with a media output device to receive audio and/or video signals to be output by the media output device, the meter may process/analyze the input audio and/or video signals to directly determine data related to the signals. For example, the meter may operate in a set-top box, receiver, mobile phone, etc. to receive and process incoming audio/video data before, during, or after output by the media output device.

In some examples, the audio metering device/instructions utilize various characteristics of the audio to classify and/or identify the audio and/or audio sources. Such features may include energy of the media signal, energy of a band of the media signal, Discrete Cosine Transform (DCT) coefficients of the media signal, and so forth. Examples disclosed herein classify and/or identify media based on timbres of audio corresponding to the media signals.
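As a rough illustration of the conventional features enumerated above, the following sketch computes a frame's total energy, per-band spectral energy, and DCT-II coefficients; the frame length, band split, and coefficient count are illustrative assumptions, not values from this disclosure:

```python
import cmath
import math

def audio_features(samples, n_bands=4, n_dct=8):
    """Illustrative sketch of classic metering features: total energy,
    per-band spectral energy, and DCT-II coefficients."""
    n = len(samples)
    # Total energy of the frame.
    energy = sum(s * s for s in samples)
    # Power spectrum via a naive DFT (fine for short illustrative frames).
    power = [abs(sum(samples[k] * cmath.exp(-2j * math.pi * f * k / n)
                     for k in range(n))) ** 2 for f in range(n // 2)]
    # Energy per frequency band.
    width = len(power) // n_bands
    band_energy = [sum(power[b * width:(b + 1) * width])
                   for b in range(n_bands)]
    # First few DCT-II coefficients of the time-domain frame.
    dct = [sum(samples[k] * math.cos(math.pi * c * (2 * k + 1) / (2 * n))
               for k in range(n)) for c in range(n_dct)]
    return energy, band_energy, dct
```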

Timbre (e.g., a timbre attribute) is the quality/characteristic of audio that is independent of the audio's pitch and loudness. For example, when a guitar and a flute play the same note at the same amplitude, they sound different because the guitar and the flute have different timbres. Timbre corresponds to the frequency and temporal envelope of an audio event (e.g., the distribution of energy along time and frequency). Traditionally, timbre has been characterized by various features. However, timbre has not been extracted from audio independently of other aspects of the audio (e.g., pitch). Thus, identifying media based on a pitch-dependent measure of timbre would require a large database of reference pitch-dependent timbres corresponding to each category and each pitch. Examples disclosed herein extract a pitch-independent timbre log-spectrum from measured audio, thereby reducing the resources required to classify and/or identify media based on timbre.

As described above, the extracted pitch-independent timbre attributes may be used to classify and/or identify the media and/or may be used as part of a signature algorithm. For example, the extracted pitch-independent timbre attributes (e.g., a log-spectrum) may be used to determine that measured audio (e.g., an audio sample) corresponds to a violin, regardless of the note that the violin is playing. In some examples, the audio characterization may be used to adjust audio settings of the media output device to provide a better audio experience for the user. For example, some audio equalizer settings may be more appropriate for audio from a particular instrument and/or genre. Accordingly, examples disclosed herein may adjust the audio equalizer settings of a media output device based on the instrument/genre identified from the extracted timbre. In another example, the extracted pitch-independent timbre attributes can be used to identify media being output by a media presentation device (e.g., a television, computer, radio, smartphone, tablet, etc.) by comparing the extracted pitch-independent timbre attributes to reference timbre attributes in a database. In this manner, the extracted timbre and/or pitch may be used to provide more detailed media exposure information to an audience measurement entity than conventional techniques that only consider the pitch of the received audio.

Fig. 1 shows an exemplary audio analyzer 100 for extracting pitch-independent timbre attributes from a media signal. Fig. 1 includes the exemplary audio analyzer 100, an exemplary media output device 102, exemplary speakers 104a, 104b, an exemplary media signal 106, and an exemplary audio determiner 108.

The example audio analyzer 100 of fig. 1 receives a media signal from a device (e.g., the example media output device 102 and/or the example speakers 104a, 104b) and processes the media signal to determine pitch-independent timbre attributes (e.g., a log-spectrum) and timbre-independent pitch attributes. In some examples, the audio analyzer 100 may include (or otherwise be connected to) a microphone to receive the example media signal 106 by sensing ambient audio. In such examples, the audio analyzer 100 may be implemented in a meter or other computing device (e.g., a computer, tablet, smartphone, smartwatch, etc.) that utilizes a microphone. In some examples, the audio analyzer 100 includes an interface for receiving the example media signal 106 directly from the example media output device 102 (e.g., via a wired or wireless connection) and/or from a media presentation device that presents media to the media output device 102. For example, the audio analyzer 100 may receive the media signal 106 directly from a set-top box, a mobile phone, a gaming device, an audio receiver, a DVD player, a Blu-ray player, a tablet, and/or any other device that provides media to be output by the media output device 102 and/or the example speakers 104a, 104b. As described further below in conjunction with fig. 2, the example audio analyzer 100 extracts pitch-independent timbre attributes and/or timbre-independent pitch attributes from the media signal 106. If the media signal 106 is a video signal having an audio component, the example audio analyzer 100 extracts the audio component from the media signal 106 prior to extracting the pitch and/or timbre.

The exemplary media output device 102 of fig. 1 is a device that outputs media. Although the example media output device 102 of fig. 1 is shown as a television, the example media output device 102 may be a radio, an MP3 player, a video game console, a stereo system, a mobile device, a tablet, a computing device, a laptop, a projector, a DVD player, a set-top box, an over-the-top device, and/or any device capable of outputting media (e.g., video and/or audio). The exemplary media output device may include speakers 104a and/or may be coupled or connected to portable speakers 104b via a wired or wireless connection. The example speakers 104a, 104b output the audio portion of the media output by the example media output device. In the example shown in fig. 1, the media signal 106 represents audio output by the example speakers 104a, 104b. Additionally or alternatively, the example media signal 106 may be an audio signal and/or a video signal that is sent to the example media output device 102 and/or the example speakers 104a, 104b for output. For example, the example media signal 106 may be a signal from a gaming console that is sent to the example media output device 102 and/or the example speakers 104a, 104b to output the audio and video of a video game. The example audio analyzer 100 may receive the media signal 106 directly from a media presentation device (e.g., a gaming console) and/or from ambient audio. In this manner, the audio analyzer 100 may classify and/or identify audio from the media signal even when the speakers 104a, 104b are inoperative or turned off.

The example audio determiner 108 of fig. 1 characterizes the audio and/or identifies the media based on the pitch-independent timbre attributes received from the example audio analyzer 100. For example, the audio determiner 108 may include a database of reference pitch-independent timbre attributes corresponding to classifications and/or identifications. In this manner, the example audio determiner 108 may compare the received pitch-independent timbre attributes to the reference pitch-independent attributes to identify a match. If the example audio determiner 108 identifies a match, the example audio determiner 108 classifies the audio and/or identifies the media based on the information that corresponds to the matching reference timbre attribute. For example, if a received timbre attribute matches the reference attribute corresponding to a trumpet, the example audio determiner 108 classifies the audio corresponding to the received timbre attribute as audio from a trumpet. In such an example, if the audio analyzer 100 is part of a mobile phone, the exemplary audio analyzer 100 may receive the audio of a trumpet playing a song (e.g., via an interface receiving audio/video signals or via a microphone of the mobile phone receiving the audio). In this manner, the audio determiner 108 may identify that the instrument corresponding to the received audio is a trumpet and identify the trumpet to the user (e.g., using a user interface of the mobile device). In another example, if the received timbre attribute matches the reference attribute corresponding to a particular video game, the example audio determiner 108 may identify the audio corresponding to the received timbre attribute as being from the particular video game. The example audio determiner 108 may generate a report to identify the audio. In this manner, an audience measurement entity may credit exposure to the video game based on the report.
In some examples, the audio determiner 108 receives the timbre directly from the audio analyzer 100 (e.g., when both the audio analyzer 100 and the audio determiner 108 are located in the same device). In some examples, the audio determiner 108 is located at a different location and receives the timbre from the example audio analyzer 100 via wireless communication. In some examples, the audio determiner 108 sends instructions to the example media output device 102 and/or the example audio analyzer 100 (e.g., when the example audio analyzer 100 is implemented in the example media output device 102) to adjust audio equalizer settings based on the audio classification. For example, if the audio determiner 108 classifies audio output by the media output device 102 as coming from a trumpet, the example audio determiner 108 may send instructions to adjust the audio equalizer settings to settings corresponding to trumpet audio. The exemplary audio determiner 108 is further described below in conjunction with fig. 2.

Fig. 2 includes a block diagram of an exemplary implementation of the exemplary audio analyzer 100 and the exemplary audio determiner 108 of fig. 1. The exemplary audio analyzer 100 of fig. 2 includes an exemplary media interface 200, an exemplary audio extractor 202, an exemplary audio feature extractor 204, and an exemplary device interface 206. The example audio determiner 108 of fig. 2 includes an example device interface 210, an example tone color processor 212, an example tone color database 214, and an example audio setting adjuster 216. In some examples, elements of the example audio analyzer 100 may be implemented in the example audio determiner 108 and/or elements of the example audio determiner 108 may be implemented in the example audio analyzer 100.

The example media interface 200 of fig. 2 receives (e.g., samples) the example media signal 106 of fig. 1. In some examples, the media interface 200 may be a microphone for collecting the media signal 106 by sensing ambient audio to obtain the media signal 106 as audio. In some examples, the media interface 200 may be an interface that directly receives audio and/or video signals (e.g., digital representations of media signals) to be output by the example media output device 102. In some examples, media interface 200 may include two interfaces, a microphone to detect and sample ambient audio and an interface to directly receive and/or sample audio and/or video signals.

The example audio extractor 202 of fig. 2 extracts audio from the received/sampled media signal 106. For example, the audio extractor 202 determines whether the received media signal 106 corresponds to an audio signal or a video signal having an audio component. If the media signal corresponds to a video signal having an audio component, the example audio extractor 202 extracts the audio component to generate an audio signal/sample for further processing.

The example audio feature extractor 204 of fig. 2 processes the audio signal/samples to extract a pitch-independent timbre log-spectrum and/or a timbre-independent pitch log-spectrum. The log-spectrum of the audio is a convolution between a pitch-independent (e.g., pitchless) timbre log-spectrum and a timbre-independent (e.g., timbreless) pitch log-spectrum (e.g., X = T ∗ P, where X is the log-spectrum of the audio signal, T is the pitch-independent timbre log-spectrum, P is the timbre-independent pitch log-spectrum, and ∗ denotes convolution). Thus, in the Fourier domain, the magnitude of the Fourier transform (FT) of the log-spectrum of the audio signal may correspond to an approximation of the FT of the timbre (e.g., F(X) = F(T)·F(P), where F(·) is the Fourier transform, F(T) ≈ |F(X)|, and F(P) ≈ e^(j·arg(F(X)))). A complex argument is a combination of magnitude and phase (e.g., corresponding to energy and offset). Thus, the FT of the timbre may be approximated by the magnitude of the FT of the log-spectrum. Accordingly, to determine the pitch-independent and/or timbre-independent log-spectra of an audio signal, the example audio feature extractor 204 determines the log-spectrum of the audio signal (e.g., using a constant Q transform (CQT)) and transforms the log-spectrum to the Fourier domain (e.g., using the FT). In this manner, the example audio feature extractor 204 (A) determines the pitch-independent timbre log-spectrum based on an inverse transform (e.g., an inverse Fourier transform, F⁻¹) of the magnitude of the transform output (e.g., T = F⁻¹(|F(X)|)), and (B) determines the timbre-independent pitch log-spectrum based on an inverse transform of the complex argument of the transform output (e.g., P = F⁻¹(e^(j·arg(F(X))))). The logarithmic frequency scale of the audio spectrum causes a pitch shift to correspond to a vertical translation, which is why the example audio feature extractor 204 determines the log-spectrum of the audio signal using the CQT.
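The decomposition described above can be sketched as follows; the naive DFT helper and the short frame length are illustrative assumptions (a real implementation would operate on a CQT log-spectrum and use an FFT):

```python
import cmath
import math

def _dft(x, inverse=False):
    # Naive O(n^2) DFT/IDFT, sufficient for a short illustrative frame.
    n = len(x)
    sign = 1j if inverse else -1j
    out = [sum(x[k] * cmath.exp(sign * 2 * math.pi * f * k / n)
               for k in range(n)) for f in range(n)]
    return [v / n for v in out] if inverse else out

def split_timbre_pitch(log_spectrum):
    """Sketch of the decomposition above: with X = T * P (convolution),
    estimate T = F^-1(|F(X)|) and P = F^-1(e^(j arg(F(X))))."""
    fx = _dft(log_spectrum)
    # Timbre: inverse transform of the magnitude of the transform.
    timbre = [v.real for v in _dft([abs(c) for c in fx], inverse=True)]
    # Pitch: inverse transform of the complex argument (unit-phase factor).
    pitch = [v.real for v in
             _dft([cmath.exp(1j * cmath.phase(c)) for c in fx], inverse=True)]
    return timbre, pitch
```

Because a pitch shift on the log-frequency axis is a circular shift of X, and |F(X)| is unchanged by such a shift, the timbre estimate is pitch-invariant; moreover, the circular convolution of the two estimates reproduces X.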

In some examples, if the example audio feature extractor 204 of fig. 2 determines that the timbre and/or pitch results are not satisfactory, the audio feature extractor 204 filters the results to improve the decomposition. For example, the audio feature extractor 204 may filter the results by emphasizing particular harmonics in the timbre or by emphasizing a single peak/line in the pitch, and then updating the other component of the results. The exemplary audio feature extractor 204 may filter once or may perform an iterative algorithm, updating the timbre/pitch at each iteration, thereby ensuring that the overall convolution of the pitch and timbre reproduces the original log-spectrum of the audio. The audio feature extractor 204 may determine whether the results are satisfactory based on user and/or manufacturer preferences.
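One way such a filtering pass could look, sketched under the assumption that the pitch estimate should collapse to its single dominant peak: zero the other pitch values, then recompute the timbre by spectral division so the convolution of the two components still reproduces the original log-spectrum. The DFT helper is repeated so the snippet stands alone.

```python
import cmath
import math

def _dft(x, inverse=False):
    # Naive O(n^2) DFT/IDFT helper (repeated so this sketch is self-contained).
    n = len(x)
    sign = 1j if inverse else -1j
    out = [sum(x[k] * cmath.exp(sign * 2 * math.pi * f * k / n)
               for k in range(n)) for f in range(n)]
    return [v / n for v in out] if inverse else out

def refine_decomposition(log_spectrum, pitch):
    """Hypothetical single filtering pass: keep only the dominant peak of
    the pitch estimate, then recompute the timbre by spectral division so
    that the convolution of timbre and pitch still yields the log-spectrum.
    Assumes the dominant pitch value is nonzero."""
    n = len(pitch)
    peak = max(range(n), key=lambda k: abs(pitch[k]))
    new_pitch = [pitch[k] if k == peak else 0.0 for k in range(n)]
    fx, fp = _dft(log_spectrum), _dft(new_pitch)
    # fp is nonzero everywhere because new_pitch is a single impulse.
    new_timbre = [v.real for v in
                  _dft([a / b for a, b in zip(fx, fp)], inverse=True)]
    return new_timbre, new_pitch
```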

The example device interface 206 of the example audio analyzer 100 of fig. 2 interfaces with the example audio determiner 108 and/or other devices (e.g., user interface, processing device, etc.). For example, when the audio feature extractor 204 determines a tone-independent timbre attribute, the example device interface 206 may send the attribute to the example audio determiner 108 to classify the audio and/or identify the media. In response, the device interface 206 may receive a classification and/or identification (e.g., an identifier corresponding to a source of the media signal 106) from the example audio determiner 108 (e.g., in a signal or report). In such an example, the example device interface 206 may send the classification and/or identification to other devices (e.g., a user interface) to display the classification and/or identification to a user. For example, if the audio analyzer 100 is being used in conjunction with a smartphone, the device interface 206 may output the results of the classification and/or identification to a user of the smartphone via an interface (e.g., a screen) of the smartphone.

The example device interface 210 of the example audio determiner 108 of fig. 2 receives tone-independent timbre attributes from the example audio analyzer 100. Additionally, the example device interface 210 outputs a signal/report representative of the classification and/or identification determined by the example audio determiner 108. The report may be a signal corresponding to a classification and/or identification based on the received timbre. In some examples, the device interface 210 sends the report (e.g., including an identification of the media corresponding to the timbre) to a processor (e.g., a processor of the audience measurement entity) for further processing. For example, the processor of the receiving device may process the report to generate a media exposure metric, an audience measurement metric, and so on. In some examples, the device interface 210 sends the report to the example audio analyzer 100.

The example tone color processor 212 of fig. 2 processes the timbre attributes received from the example audio analyzer 100 to characterize the audio and/or to identify the source of the audio. For example, the tone color processor 212 may compare the received timbre attributes to reference attributes in the example tone color database 214. In this manner, if the example tone color processor 212 determines that a received timbre attribute matches a reference attribute, the example tone color processor 212 classifies and/or identifies the source of the audio based on the data corresponding to the matching reference timbre attribute. For example, if the tone color processor 212 determines that the received timbre attributes match the reference timbre attributes corresponding to a particular commercial, the tone color processor 212 identifies the source of the audio as the particular commercial. In some examples, the classification may include a genre classification. For example, if the example tone color processor 212 identifies a plurality of instruments based on timbres, the example tone color processor 212 may identify a genre of the audio (e.g., classical, rock, hip-hop, etc.) based on the identified instruments and/or based on the timbres themselves. In some examples, when the tone color processor 212 does not find a match, the example tone color processor 212 stores the received timbre attribute in the tone color database 214 to become a new reference timbre attribute. If the example tone color processor 212 stores a new reference timbre in the example tone color database 214, the example device interface 210 sends instructions to the example audio analyzer 100 to prompt the user for identifying information (e.g., what the audio classification is, what the media source is, etc.).
In this manner, if the audio analyzer 100 responds with the additional information, the tone color database 214 may store the additional information in conjunction with the new reference timbre attribute. In some examples, a technician analyzes the new reference timbre to determine the additional information. The example tone color processor 212 generates a report based on the classification and/or identification.
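A minimal sketch of the matching step, assuming a cosine-similarity comparison and a fixed match threshold (neither is specified by this disclosure):

```python
import math

def classify_timbre(timbre, references, threshold=0.95):
    """Hypothetical matcher: compare an extracted timbre vector against
    labeled reference vectors by normalized correlation (cosine similarity).
    Returns the best label at or above the threshold, else None (i.e., the
    attribute would be stored as a new reference)."""
    def unit(v):
        norm = math.sqrt(sum(x * x for x in v)) or 1.0
        return [x / norm for x in v]
    query = unit(timbre)
    best_label, best_score = None, threshold
    for label, ref in references.items():
        score = sum(a * b for a, b in zip(query, unit(ref)))
        if score >= best_score:
            best_label, best_score = label, score
    return best_label
```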

The example audio setting adjuster 216 of fig. 2 determines audio equalizer settings based on the classified audio. For example, if the classified audio corresponds to one or more instruments and/or genres, the example audio setting adjuster 216 may determine audio equalizer settings corresponding to the one or more instruments and/or genres. In some examples, if the audio is classified as classical music, the example audio setting adjuster 216 may select an audio equalizer setting corresponding to classical music (e.g., based on a bass level, a treble level, etc.). In this manner, the example device interface 210 may send the audio equalizer settings to the example media output device 102 and/or the example audio analyzer 100 to adjust the audio equalizer settings of the example media output device 102.
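The setting selection can be sketched as a simple lookup; the preset names and band gains below are hypothetical, not values from this disclosure:

```python
# Hypothetical equalizer presets keyed by the classified genre/instrument;
# the keys and gain values are illustrative assumptions.
EQ_PRESETS = {
    "classical": {"bass": 2, "mid": 0, "treble": 3},
    "rock":      {"bass": 5, "mid": 1, "treble": 4},
    "trumpet":   {"bass": 1, "mid": 4, "treble": 2},
}

FLAT = {"bass": 0, "mid": 0, "treble": 0}

def equalizer_for(classification):
    """Map a classification from the tone color processor to equalizer
    settings, falling back to a flat response for unknown classes."""
    return EQ_PRESETS.get(classification, FLAT)
```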

Although an example manner of implementing the example audio analyzer 100 and the example audio determiner 108 of fig. 1 is shown in fig. 2, one or more of the elements, processes and/or devices shown in fig. 2 may be combined, divided, rearranged, omitted, removed and/or implemented in any other way. Further, the example media interface 200, the example audio extractor 202, the example audio feature extractor 204, the example device interface 206, the example audio setting adjuster 216, and/or (more generally) the example audio analyzer 100 and/or the example device interface 210 of fig. 2, the example tone color processor 212, the example tone color database 214, the example audio setting adjuster 216, and/or (more generally) the example audio determiner 108 of fig. 2 may be implemented by hardware, software, firmware, and/or any combination of hardware, software, and/or firmware. Thus, for example, the example media interface 200, the example audio extractor 202, the example audio feature extractor 204, the example device interface 206, and/or (more generally) the example audio analyzer 100 and/or the example device interface 210 of fig. 2, the example tone color processor 212, the example tone color database 214, the example audio setting adjuster 216, and/or (more generally) the example audio determiner 108 of fig. 2 may be implemented by one or more analog or digital circuits, logic circuits, programmable processors, programmable controllers, Graphics Processing Units (GPUs), Digital Signal Processors (DSPs), Application Specific Integrated Circuits (ASICs), Programmable Logic Devices (PLDs), and/or Field Programmable Logic Devices (FPLDs). 
When reading any apparatus or system claims of this patent to cover a purely software and/or firmware implementation, at least one of the example media interface 200, the example audio extractor 202, the example audio feature extractor 204, the example device interface 206, and/or (more generally) the example audio analyzer 100 and/or the example device interface 210 of fig. 2, the example tone color processor 212, the example tone color database 214, the example audio setting adjuster 216, and/or (more generally) the example audio determiner 108 of fig. 2 is thus expressly defined to include a non-transitory computer-readable storage device or storage disk, such as a memory including software and/or firmware, a Digital Versatile Disk (DVD), a Compact Disk (CD), a blu-ray disk, and/or the like. Further, the example audio analyzer 100 and/or the example audio determiner 108 of fig. 1 may also include one or more elements, processes, and/or devices in addition to, or instead of, those shown in fig. 2, and/or may include more than one of any or all of the illustrated elements, processes, and devices. As used herein, the phrase "in communication" including variations thereof encompasses direct communication and/or indirect communication through one or more intermediate components, and does not require direct physical (e.g., wired) communication and/or constant communication, but additionally includes selective communication at periodic intervals, predetermined intervals, aperiodic intervals, and/or one-time events.

A flowchart representative of example hardware logic or machine readable instructions for implementing the audio analyzer 100 of fig. 2 is shown in fig. 3, and a flowchart representative of example hardware logic or machine readable instructions for implementing the audio determiner 108 of fig. 2 is shown in fig. 4. The machine readable instructions may be a program or a portion of a program that is executed by a processor, such as the processors 612, 712 shown in the example processor platforms 600, 700 discussed below in connection with fig. 6 and/or fig. 7. The program may be embodied in software stored on a non-transitory computer readable storage medium such as a CD-ROM, a floppy disk, a hard drive, a DVD, a Blu-ray disk, or a memory associated with the processors 612, 712, but the entire program and/or parts thereof could alternatively be executed by a device other than the processors 612, 712 and/or embodied in firmware or dedicated hardware. Further, although the example programs are described with reference to the flowcharts shown in fig. 3 and 4, many other methods of implementing the example audio analyzer 100 and/or the example audio determiner 108 may alternatively be used. For example, the order of execution of the blocks may be changed, and/or some of the blocks described may be changed, eliminated, or combined. Additionally or alternatively, any or all of the blocks may be implemented by one or more hardware circuits (e.g., discrete and/or integrated analog and/or digital circuits, FPGAs, ASICs, comparators, operational amplifiers (op-amps), logic circuitry, etc.) structured to perform the corresponding operations without executing software or firmware.

As mentioned above, the example processes of fig. 3 and 4 may be implemented using executable instructions (e.g., computer and/or machine readable instructions) stored on a non-transitory computer and/or machine readable medium, such as a hard disk drive, a flash memory, a read-only memory, an optical disk, a digital versatile disk, a cache, a random-access memory, and/or any other storage device or storage disk in which information is stored for any duration (e.g., for extended time periods, permanently, for brief instances, for temporary buffering, and/or for caching of the information). As used herein, the term non-transitory computer readable medium is expressly defined to include any type of computer readable storage device and/or storage disk and to exclude propagating signals and to exclude transmission media.

The terms "including" and "comprising" (and all forms and tenses thereof) are used herein as open-ended terms. Thus, whenever a claim employs any form of "include" or "comprise" (e.g., includes, comprises, including, comprising, etc.) as a preamble or within a claim recitation of any kind, it is to be understood that additional elements, terms, etc. may be present without falling outside the scope of the corresponding claim or recitation. As used herein, the phrase "at least," when used as a transition term in, for example, the preamble of a claim, is open-ended in the same manner as the terms "comprising" and "including" are open-ended. The term "and/or," when used, for example, in a form such as A, B, and/or C, refers to any combination or subset of A, B, C, such as (1) A alone, (2) B alone, (3) C alone, (4) A with B, (5) A with C, (6) B with C, and (7) A with B and with C.

Fig. 3 is an exemplary flow diagram 300 representing exemplary machine-readable instructions that may be executed by the exemplary audio analyzer 100 of fig. 1 and 2 to extract a pitch independent timbre property from a media signal (e.g., an audio signal of the media signal). Although the instructions of fig. 3 are described in connection with the exemplary audio analyzer 100 of fig. 1, the exemplary instructions may be used by an audio analyzer in any environment.

At block 302, the example media interface 200 receives one or more media signals or samples of a media signal (e.g., the example media signal 106). As described above, the example media interface 200 may receive the media signal 106 directly (e.g., as a signal to/from the media output device 102) or indirectly (e.g., via a microphone that detects the media signal by sensing ambient audio). At block 304, the example audio extractor 202 determines whether the media signal corresponds to video or audio. For example, if the media signal is received using a microphone, the audio extractor 202 determines that the media corresponds to audio. However, if the media signal is a received signal, the audio extractor 202 processes the received media signal to determine whether the media signal corresponds to audio or to video having an audio component. If the example audio extractor 202 determines that the media signal corresponds to audio (block 304: AUDIO), the process continues to block 308. If the example audio extractor 202 determines that the media signal corresponds to video (block 304: VIDEO), the example audio extractor 202 extracts the audio component from the media signal (block 306).

At block 308, the example audio feature extractor 204 determines a log spectrum of the audio signal (e.g., X). For example, the audio feature extractor 204 may determine the log spectrum of the audio signal by performing a CQT. At block 310, the example audio feature extractor 204 transforms the log spectrum to the frequency domain. For example, the audio feature extractor 204 performs an FT on the log spectrum (e.g., F(X)). At block 312, the example audio feature extractor 204 determines the magnitude of the transform output (e.g., |F(X)|). At block 314, the example audio feature extractor 204 determines a pitch-independent timbre log spectrum of the audio (e.g., T = F⁻¹(|F(X)|)) based on an inverse transform (e.g., an inverse FT) of the magnitude of the transform output. At block 316, the example audio feature extractor 204 determines the complex argument of the transform output (e.g., e^(j·arg(F(X)))). At block 318, the example audio feature extractor 204 determines a timbre-independent pitch log spectrum of the audio (e.g., P = F⁻¹(e^(j·arg(F(X))))) based on an inverse transform (e.g., an inverse FT) of the complex argument of the transform output.
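The magnitude/phase split of blocks 310-318 can be sketched with NumPy's FFT routines. This is a minimal illustration of the math above, not the patented implementation; the function name and the use of `numpy.fft` as a stand-in for the FT described in the text are assumptions, and the CQT of block 308 is taken as given.

```python
import numpy as np

def extract_timbre_and_pitch(log_spectrum):
    """Split a log-frequency spectrum X (e.g., from a CQT, block 308) into a
    pitch-independent timbre component T and a timbre-independent pitch
    component P, following blocks 310-318."""
    F_X = np.fft.fft(log_spectrum)                    # block 310: FT of the log spectrum
    T = np.fft.ifft(np.abs(F_X)).real                 # blocks 312-314: T = F^-1(|F(X)|)
    P = np.fft.ifft(np.exp(1j * np.angle(F_X))).real  # blocks 316-318: P = F^-1(e^(j*arg(F(X))))
    return T, P
```

Because the magnitude of the FT is invariant under a circular shift of its input, shifting the log spectrum along the log-frequency axis (i.e., changing the pitch) leaves T unchanged, which is what makes the extracted timbre attribute pitch-independent.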

At block 320, the example audio feature extractor 204 determines whether the results (e.g., the determined pitch and/or the determined timbre) are satisfactory. As described above in connection with fig. 2, the example audio feature extractor 204 determines whether the results are satisfactory based on user and/or manufacturer result preferences. If the example audio feature extractor 204 determines that the results are satisfactory (block 320: YES), the process continues to block 324. If the example audio feature extractor 204 determines that the results are not satisfactory (block 320: NO), the example audio feature extractor 204 filters the results (block 322). As described above in connection with fig. 2, the example audio feature extractor 204 may filter the results (e.g., once or iteratively) by emphasizing harmonics in the timbre or by forcing a single peak/line in the pitch.
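As one possible reading of the "single peak/line" filtering of block 322, the pitch log spectrum can be reduced to its dominant bin. The source does not specify the filter, so the sketch below is an assumption for illustration only.

```python
import numpy as np

def force_single_peak(pitch_log_spectrum):
    """Keep only the dominant bin of the pitch log spectrum (one hypothetical
    interpretation of the single-peak filtering of block 322)."""
    filtered = np.zeros_like(pitch_log_spectrum)
    k = int(np.argmax(pitch_log_spectrum))  # index of the dominant peak
    filtered[k] = pitch_log_spectrum[k]
    return filtered
```

An iterative variant could alternate this step with the harmonic-emphasis filtering of the timbre until the results meet the user/manufacturer preferences.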

At block 324, the example device interface 206 sends the results to the example audio determiner 108. At block 326, the example audio feature extractor 204 receives classification and/or identification data corresponding to the audio signal. Alternatively, if the audio determiner 108 is unable to match the timbre of the audio signal to a reference, the device interface 206 may receive a request for additional data corresponding to the audio signal. In such an example, the device interface 206 may send a prompt to a user interface so that the user can provide the additional data. Accordingly, the example device interface 206 may provide the additional data to the example audio determiner 108 to generate new reference timbre attributes. At block 328, the example audio feature extractor 204 sends the classification and/or identification to other connected devices. For example, the audio feature extractor 204 may send the classification to a user interface to provide the classification to the user.

Fig. 4 is an exemplary flow diagram 400 representing example machine readable instructions that may be executed by the exemplary audio determiner 108 of fig. 1 and 2 to classify audio and/or identify media based on tone-independent timbre attributes of the audio. Although the instructions of fig. 4 are described in connection with the example audio determiner 108 of fig. 1, the example instructions may be used by an audio determiner in any environment.

At block 402, the example device interface 210 receives a measured (e.g., determined or extracted) pitch-independent timbre log spectrum from the example audio analyzer 100. At block 404, the example timbre processor 212 compares the measured pitch-independent timbre log spectrum to reference pitch-independent timbre log spectra in the example timbre database 214. At block 406, the example timbre processor 212 determines whether a match is found between the received timbre attributes and a reference timbre attribute. If the example timbre processor 212 determines that a match is found (block 406: YES), the example timbre processor 212 classifies the audio based on the match (e.g., identifies the instrument and/or style) and/or identifies the media corresponding to the audio using additional data stored in the example timbre database 214 in connection with the matched reference timbre attributes (block 408).
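The comparison of blocks 404-406 could be implemented as a nearest-reference search over the database. The similarity measure, the threshold, and the dictionary layout below are illustrative assumptions; the source does not prescribe a particular matching method.

```python
import numpy as np

def match_timbre(measured, reference_db, threshold=0.95):
    """Return the identifier of the best-matching reference timbre log
    spectrum (blocks 404-406), or None if nothing exceeds the (assumed)
    cosine-similarity threshold -- which would trigger block 416."""
    best_id, best_score = None, threshold
    m = measured / np.linalg.norm(measured)
    for ref_id, ref in reference_db.items():
        score = float(np.dot(m, ref / np.linalg.norm(ref)))
        if score > best_score:
            best_id, best_score = ref_id, score
    return best_id
```

Because the stored attributes are pitch independent, a single reference per instrument/style suffices regardless of the pitch at which the audio was played.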

At block 410, the example audio setting adjuster 216 determines whether the audio settings of the media output device 102 can be adjusted. For example, a setting may be enabled that allows the audio settings of the media output device 102 to be adjusted based on the classification of the audio output by the example media output device 102. If the example audio setting adjuster 216 determines that the audio settings of the media output device 102 are not to be adjusted (block 410: NO), the process continues to block 414. If the example audio setting adjuster 216 determines that the audio settings of the media output device 102 are to be adjusted (block 410: YES), the example audio setting adjuster 216 determines a media output device setting adjustment based on the classified audio (block 412). For example, the example audio setting adjuster 216 may select audio equalizer settings based on one or more identified instruments and/or an identified style (e.g., determined from the timbre or based on the identified instruments). At block 414, the example device interface 210 outputs a report corresponding to the classification, identification, and/or media output device setting adjustment. In some examples, the device interface 210 outputs the report to another device for further processing/analysis. In some examples, the device interface 210 outputs the report to the example audio analyzer 100 to display the results to a user via a user interface. In some examples, to adjust the audio settings of the media output device 102, the device interface 210 outputs the report to the example media output device 102.
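The mapping from a classification to equalizer settings (block 412) can be as simple as a preset lookup. The preset names and values below are hypothetical; the source does not specify the actual settings or classes.

```python
# Hypothetical preset table keyed by classified style; the classes and
# decibel values are assumptions for illustration.
EQ_PRESETS = {
    "rock":      {"bass_db": 4, "mid_db": -2, "treble_db": 3},
    "classical": {"bass_db": 0, "mid_db": 0, "treble_db": 1},
}
DEFAULT_PRESET = {"bass_db": 0, "mid_db": 0, "treble_db": 0}

def select_eq_settings(classification):
    """Map a classification from block 408 to equalizer settings (block 412),
    falling back to a neutral preset for unknown classes."""
    return EQ_PRESETS.get(classification, DEFAULT_PRESET)
```

The selected settings would then be included in the report that the device interface 210 outputs to the media output device 102 (block 414).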

If the example timbre processor 212 determines that no match is found (block 406: NO), the example device interface 210 prompts for additional information corresponding to the audio signal (block 416). For example, the device interface 210 may send instructions to the exemplary audio analyzer 100 to (a) prompt the user to provide information corresponding to the audio, or (b) prompt the audio analyzer 100 to reply with the complete audio signal. At block 418, the example timbre database 214 stores the measured pitch-independent timbre log spectrum in conjunction with any corresponding data that may have been received.

Fig. 5 shows an exemplary FT of a log spectrum 500 of an audio signal, an exemplary timbre-independent pitch log spectrum 502 of the audio signal, and an exemplary pitch-independent timbre log spectrum 504 of the audio signal.

As described in connection with fig. 2, when the example audio analyzer 100 receives the example media signal 106 (or a sample of the media signal), the example audio analyzer 100 determines an example log spectrum of the audio signal/sample (e.g., if the media sample corresponds to a video signal, the audio analyzer 100 first extracts the audio component). Additionally, the example audio analyzer 100 determines the FT of the log spectrum. The exemplary FT of the log spectrum 500 of fig. 5 corresponds to the exemplary transform output of the log spectrum of the audio signal/sample. The exemplary timbre-independent pitch log spectrum 502 corresponds to an inverse FT of the complex argument of the exemplary FT of the log spectrum 500 (e.g., P = F⁻¹(e^(j·arg(F(X))))), and the exemplary pitch-independent timbre log spectrum 504 corresponds to an inverse FT of the magnitude of the exemplary FT of the log spectrum 500 (e.g., T = F⁻¹(|F(X)|)). As shown in fig. 5, the log spectrum of the audio signal corresponds to a convolution of the exemplary pitch log spectrum 502 and the exemplary timbre log spectrum 504: convolving the timbre log spectrum 504 with the peak of the pitch log spectrum 502 effectively adds an offset (i.e., a shift along the log-frequency axis) to the timbre.
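The convolution relationship described above can be checked numerically: since F(X) = |F(X)| · e^(j·arg(F(X))), the log spectrum X equals the circular convolution of P and T. The sketch below, again using `numpy.fft` as an assumed stand-in for the FT in the text, verifies this identity.

```python
import numpy as np

def split(log_spectrum):
    """Split a log spectrum X into timbre (T) and pitch (P) log spectra
    as in blocks 310-318 of fig. 3."""
    F = np.fft.fft(log_spectrum)
    T = np.fft.ifft(np.abs(F)).real                 # pitch-independent timbre
    P = np.fft.ifft(np.exp(1j * np.angle(F))).real  # timbre-independent pitch
    return T, P

def circular_convolve(a, b):
    """Circular convolution via the FFT convolution theorem."""
    return np.fft.ifft(np.fft.fft(a) * np.fft.fft(b)).real
```

For a real-valued log spectrum, `circular_convolve(T, P)` recovers X up to floating-point error, mirroring the relationship shown in fig. 5.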

Fig. 6 is a block diagram of an exemplary processor platform 600 configured to execute the instructions of fig. 3 to implement the audio analyzer 100 of fig. 2. The processor platform 600 may be, for example, a server, a personal computer, a workstation, a self-learning machine (e.g., a neural network), a mobile device (e.g., a cell phone, a smart phone, a tablet computer such as an iPad™), a Personal Digital Assistant (PDA), an internet appliance, a DVD player, a CD player, a digital video recorder, a Blu-ray player, a game console, a personal video recorder, a set-top box, a headset or other wearable device, or any other type of computing device.

The processor platform 600 of the illustrated example includes a processor 612. The processor 612 of the illustrated example is hardware. For example, the processor 612 may be implemented by one or more integrated circuits, logic circuits, microprocessors, GPUs, DSPs, or controllers from any desired family or manufacturer. The hardware processor may be a semiconductor-based (e.g., silicon-based) device. In this example, the processor implements the example media interface 200, the example audio extractor 202, the example audio feature extractor 204, and/or the example device interface 206 of fig. 2.

The processor 612 of the illustrated example includes a local memory 613 (e.g., a cache). The processor 612 of the illustrated example is in communication with a main memory including a volatile memory 614 and a non-volatile memory 616 via a bus 618. The volatile memory 614 may be implemented by Synchronous Dynamic Random Access Memory (SDRAM), Dynamic Random Access Memory (DRAM), RAMBUS® Dynamic Random Access Memory (RDRAM®), and/or any other type of random access memory device. The non-volatile memory 616 may be implemented by flash memory and/or any other desired type of memory device. Access to the main memory 614, 616 is controlled by a memory controller.

The processor platform 600 of the illustrated example also includes an interface circuit 620. The interface circuit 620 may be implemented by any type of interface standard, such as an Ethernet interface, a Universal Serial Bus (USB) interface, a Bluetooth® interface, a Near Field Communication (NFC) interface, and/or a PCI Express interface.

In the illustrated example, one or more input devices 622 are connected to the interface circuit 620. The input devices 622 permit a user to enter data and/or commands into the processor 612. The input devices may be implemented by, for example, an audio sensor, a microphone, a camera (still or video), a keyboard, a button, a mouse, a touchscreen, a track-pad, a trackball, an isopoint device, and/or a voice recognition system.

One or more output devices 624 are also connected to the interface circuit 620 of the illustrated example. The output devices 624 may be implemented, for example, by display devices (e.g., Light Emitting Diodes (LEDs), Organic Light Emitting Diodes (OLEDs), Liquid Crystal Displays (LCDs), cathode ray tube displays (CRTs), in-plane switching (IPS) displays, touch screens, etc.), tactile output devices, printers, and/or speakers. Thus, the interface circuit 620 of the illustrated example generally includes a graphics driver card, a graphics driver chip, and/or a graphics driver processor.

To facilitate exchanging data with external machines (e.g., any type of computing device) via a network 626, the interface circuit 620 of the illustrated example also includes a communication device such as a transmitter, a receiver, a transceiver, a modem, a residential gateway, a wireless access point, and/or a network interface. Communication may be accomplished through, for example, an Ethernet connection, a Digital Subscriber Line (DSL) connection, a telephone line connection, a coaxial cable system, a satellite system, a line-of-sight wireless system, a cellular telephone system, or the like.

The processor platform 600 of the illustrated example also includes one or more mass storage devices 628 for storing software and/or data. Examples of such mass storage devices 628 include floppy disk drives, hard disk drives, optical disk drives, blu-ray disk drives, Redundant Array of Independent Disks (RAID) systems, and Digital Versatile Disk (DVD) drives.

The machine-executable instructions 632 of fig. 3 may be stored in the mass storage device 628, in the volatile memory 614, in the non-volatile memory 616, and/or on a removable non-transitory computer-readable storage medium such as a CD or DVD.

Fig. 7 is a block diagram of an exemplary processor platform 700 configured to execute the instructions of fig. 4 to implement the audio determiner 108 of fig. 2. The processor platform 700 may be, for example, a server, a personal computer, a workstation, a self-learning machine (e.g., a neural network), a mobile device (e.g., a cell phone, a smart phone, a tablet computer such as an iPad™), a Personal Digital Assistant (PDA), an internet appliance, a DVD player, a CD player, a digital video recorder, a Blu-ray player, a game console, a personal video recorder, a set-top box, a headset or other wearable device, or any other type of computing device.

The processor platform 700 of the illustrated example includes a processor 712. The processor 712 of the illustrated example is hardware. For example, the processor 712 may be implemented by one or more integrated circuits, logic circuits, microprocessors, GPUs, DSPs, or controllers from any desired family or manufacturer. The hardware processor may be a semiconductor-based (e.g., silicon-based) device. In this example, the processor implements the example device interface 210, the example timbre processor 212, the example timbre database 214, and/or the example audio setting adjuster 216.

The processor 712 of the illustrated example includes a local memory 713 (e.g., a cache). The processor 712 of the illustrated example is in communication with a main memory including a volatile memory 714 and a non-volatile memory 716 via a bus 718. The volatile memory 714 may be implemented by Synchronous Dynamic Random Access Memory (SDRAM), Dynamic Random Access Memory (DRAM), RAMBUS® Dynamic Random Access Memory (RDRAM®), and/or any other type of random access memory device. The non-volatile memory 716 may be implemented by flash memory and/or any other desired type of memory device. Access to the main memory 714, 716 is controlled by a memory controller.

The processor platform 700 of the illustrated example also includes an interface circuit 720. The interface circuit 720 may be implemented by any type of interface standard, such as an Ethernet interface, a Universal Serial Bus (USB) interface, a Bluetooth® interface, a Near Field Communication (NFC) interface, and/or a PCI Express interface.

In the illustrated example, one or more input devices 722 are connected to the interface circuit 720. An input device 722 allows a user to enter data and/or commands into the processor 712. The input devices may be implemented by, for example, audio sensors, microphones, cameras (still or video), keyboards, buttons, mice, touch screens, track pads, track balls, isopoint, and/or voice recognition systems.

One or more output devices 724 are also connected to the interface circuit 720 of the illustrated example. The output devices 724 may be implemented, for example, by display devices (e.g., Light Emitting Diodes (LEDs), Organic Light Emitting Diodes (OLEDs), Liquid Crystal Displays (LCDs), cathode ray tube displays (CRTs), in-plane switching (IPS) displays, touch screens, etc.), tactile output devices, printers, and/or speakers. Thus, the interface circuit 720 of the illustrated example generally includes a graphics driver card, a graphics driver chip, and/or a graphics driver processor.

To facilitate exchanging data with external machines (e.g., any type of computing device) via a network 726, the interface circuit 720 of the illustrated example also includes a communication device such as a transmitter, a receiver, a transceiver, a modem, a residential gateway, a wireless access point, and/or a network interface. Communication may be accomplished through, for example, an Ethernet connection, a Digital Subscriber Line (DSL) connection, a telephone line connection, a coaxial cable system, a satellite system, a line-of-sight wireless system, a cellular telephone system, or the like.

The processor platform 700 of the illustrated example also includes one or more mass storage devices 728 for storing software and/or data. Examples of such mass storage devices 728 include floppy disk drives, hard disk drives, optical disk drives, blu-ray disk drives, Redundant Array of Independent Disks (RAID) systems, and Digital Versatile Disk (DVD) drives.

The machine-executable instructions 732 of fig. 4 may be stored in the mass storage device 728, in the volatile memory 714, in the non-volatile memory 716, and/or on a removable non-transitory computer-readable storage medium, such as a CD or DVD.

As can be appreciated from the foregoing, the methods, apparatus, and articles of manufacture disclosed above extract timbre attributes from a media signal that are independent of pitch. Examples disclosed herein determine a pitch-independent timbre log spectrum based on audio received directly or indirectly from a media output device. Examples disclosed herein also include classifying the audio based on the timbre (e.g., identifying an instrument) and/or identifying the media source of the audio (e.g., a song, video game, advertisement, etc.) based on the timbre. Using the examples disclosed herein, timbre may be used to classify and/or identify audio with far fewer resources than conventional techniques, because the extracted timbre is pitch independent. Thus, audio can be classified and/or identified without requiring multiple reference timbre attributes for multiple pitches. Instead, a pitch-independent timbre may be used to classify the audio regardless of its pitch.

Although certain example methods, apparatus, and articles of manufacture have been described herein, other implementations are possible. The scope of coverage of this patent is not limited thereto. On the contrary, this patent covers all methods, apparatus, and articles of manufacture fairly falling within the scope of the appended claims either literally or under the doctrine of equivalents.
