Audio fingerprint generation method, device, equipment and storage medium

Document No.: 1310322    Publication date: 2020-07-10

Reading note: This technology, "Audio fingerprint generation method, device, equipment and storage medium", was designed by 缪畅宇 on 2020-03-20. Abstract: The embodiments of the application provide a method and an apparatus for generating an audio fingerprint, a computer device, and a storage medium. The method comprises: acquiring a spectrogram of a first audio clip; determining m characteristic points in the spectrogram; acquiring n characteristic point pairs corresponding to the m characteristic points; and calculating the characteristic value intervals respectively corresponding to the n characteristic point pairs to obtain the audio fingerprint of the first audio clip, wherein the characteristic value interval corresponding to a characteristic point pair represents the time characteristic range and the frequency characteristic range of the melody segment corresponding to that pair. Because the audio fingerprint of an audio clip is represented by fuzzy characteristic value intervals, the interference of external factors such as noise and excessive compression on fingerprint generation can be reduced as far as possible.

1. A method of generating an audio fingerprint, the method comprising:

acquiring a spectrogram of a first audio clip;

determining m characteristic points in the spectrogram, wherein m is a positive integer;

acquiring n characteristic point pairs corresponding to the m characteristic points, wherein n is a positive integer;

and calculating the characteristic value intervals respectively corresponding to the n characteristic point pairs to obtain the audio fingerprint of the first audio clip, wherein the characteristic value interval corresponding to a characteristic point pair represents the time characteristic range and the frequency characteristic range of the melody segment corresponding to that characteristic point pair.
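As a rough illustration of the pair-forming step in claim 1, the sketch below pairs each characteristic point with a few later points and records the time change and frequency change of each pair. The `(time, frequency)` tuple layout, the `fan_out` rule, and the name `make_pairs` are assumptions made for illustration; the claim does not say how pairs are chosen.

```python
from typing import List, Tuple

Point = Tuple[int, int]      # (time index, frequency bin): assumed layout
Pair = Tuple[int, int, int]  # (anchor frequency, time change, frequency change)


def make_pairs(points: List[Point], fan_out: int = 2) -> List[Pair]:
    """Pair each point with up to `fan_out` later points (hypothetical rule)."""
    ordered = sorted(points)  # sort by time, then by frequency
    pairs: List[Pair] = []
    for idx, (t1, f1) in enumerate(ordered):
        # Look only at the next `fan_out` points in time order.
        for t2, f2 in ordered[idx + 1: idx + 1 + fan_out]:
            pairs.append((f1, t2 - t1, f2 - f1))
    return pairs
```

With m points and a fixed fan-out, this rule yields at most m × fan_out pairs, so n stays proportional to m.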

2. The method according to claim 1, wherein calculating the characteristic value intervals respectively corresponding to the n characteristic point pairs to obtain the audio fingerprint of the first audio clip comprises:

for an ith characteristic point pair in the n characteristic point pairs, performing first fuzzy processing on a time change value included in the ith characteristic point pair to obtain a time change interval of the ith characteristic point pair;

performing second fuzzy processing on the frequency change value included in the ith characteristic point pair to obtain a frequency change interval of the ith characteristic point pair;

calculating a characteristic value interval corresponding to the ith characteristic point pair based on the time change interval of the ith characteristic point pair and the frequency change interval of the ith characteristic point pair;

wherein the audio fingerprint of the first audio clip comprises the characteristic value intervals respectively corresponding to the n characteristic point pairs, and i is a positive integer less than or equal to n.
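The two fuzzy-processing steps of claim 2 can be pictured as widening an exact value into a range. The sketch below assumes a symmetric tolerance around each value; the tolerance parameters and the function names are hypothetical, and the claim itself does not fix the form of the fuzzy processing.

```python
from typing import Tuple

Interval = Tuple[int, int]  # closed interval (lower limit, upper limit)


def fuzz(value: int, tolerance: int) -> Interval:
    """Widen an exact value into [value - tolerance, value + tolerance]."""
    if tolerance < 0:
        raise ValueError("tolerance must be non-negative")
    return (value - tolerance, value + tolerance)


def fuzzy_pair(dt: int, df: int, dt_tol: int = 1, df_tol: int = 2) -> Tuple[Interval, Interval]:
    """Return (time change interval, frequency change interval) for one pair."""
    # Assumed: separate tolerances for the first and second fuzzy processing.
    return fuzz(dt, dt_tol), fuzz(df, df_tol)
```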

3. The method according to claim 2, wherein the lower limit of the time change interval of the ith characteristic point pair is a first lower limit value, and the upper limit of the time change interval of the ith characteristic point pair is a first upper limit value; the lower limit of the frequency change interval of the ith characteristic point pair is a second lower limit value, and the upper limit of the frequency change interval of the ith characteristic point pair is a second upper limit value;

and calculating the characteristic value interval corresponding to the ith characteristic point pair based on the time change interval of the ith characteristic point pair and the frequency change interval of the ith characteristic point pair comprises:

performing first hash coding processing on the frequency value of the ith characteristic point, the first lower limit value and the second lower limit value to obtain a first coded value;

performing second hash coding processing on the frequency value of the ith characteristic point, the first upper limit value and the second upper limit value to obtain a second coded value;

wherein the characteristic value interval corresponding to the ith characteristic point pair is the value interval bounded by the first coded value and the second coded value.
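One way to read claim 3 is that the lower limits and the upper limits are each encoded into a single number, and the two numbers bound the interval. The sketch below assumes a simple bit-packing scheme in place of the unspecified hash coding; the field width `BITS`, the offset for signed frequency changes, and all function names are illustrative assumptions.

```python
from typing import Tuple

BITS = 10                    # assumed field width for each of dt and df
DF_OFFSET = 1 << (BITS - 1)  # shifts a signed df into a non-negative range
FIELD_MAX = (1 << BITS) - 1


def _clamp(v: int) -> int:
    """Keep a field value inside [0, FIELD_MAX]."""
    return max(0, min(FIELD_MAX, v))


def encode(freq: int, dt: int, df: int) -> int:
    """Pack (freq, dt, df) into one integer; an assumed stand-in for hash coding."""
    return (freq << (2 * BITS)) | (_clamp(dt) << BITS) | _clamp(df + DF_OFFSET)


def feature_interval(freq: int, dt_range: Tuple[int, int], df_range: Tuple[int, int]) -> Tuple[int, int]:
    """Encode the lower limits and the upper limits, then order the two codes."""
    first = encode(freq, dt_range[0], df_range[0])   # first coded value
    second = encode(freq, dt_range[1], df_range[1])  # second coded value
    return (min(first, second), max(first, second))
```

Because the packing is monotonic in dt and then df, the code of the exact pair always falls inside the interval. Under this assumed encoding the interval also covers any df for a dt strictly between the two limits, so it is somewhat looser than the rectangle of the two ranges.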

4. The method according to claim 1, wherein acquiring the n characteristic point pairs corresponding to the m characteristic points comprises:

acquiring p characteristic point pairs corresponding to the m characteristic points respectively, wherein p is a positive integer;

removing target characteristic point pairs from the p characteristic point pairs respectively corresponding to the m characteristic points to obtain the n characteristic point pairs;

wherein the target characteristic point pairs include characteristic point pairs whose time change values do not conform to the time distribution of the first audio clip, and/or characteristic point pairs whose frequency change values do not conform to the frequency distribution of the first audio clip.

5. The method according to claim 4, wherein removing the target characteristic point pairs from the p characteristic point pairs respectively corresponding to the m characteristic points to obtain the n characteristic point pairs comprises:

acquiring relative characteristic points corresponding to the characteristic point pairs;

performing statistical analysis on each relative characteristic point to obtain a normal distribution curve corresponding to the relative characteristic point;

and removing the target characteristic point pairs from the p characteristic point pairs respectively corresponding to the m characteristic points according to the normal distribution curve to obtain the n characteristic point pairs.
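The filtering in claims 4 and 5 resembles ordinary outlier removal under a normal-distribution assumption. The sketch below fits a mean and standard deviation to the time changes and the frequency changes of all pairs and drops pairs lying more than `k` standard deviations away. The cutoff `k`, the tuple layout, and the name `drop_outliers` are assumptions; the claims do not state the rejection rule.

```python
from statistics import mean, pstdev
from typing import List, Tuple

Pair = Tuple[int, int, int]  # (anchor frequency, time change, frequency change)


def drop_outliers(pairs: List[Pair], k: float = 2.0) -> List[Pair]:
    """Keep pairs whose dt and df both lie within k sigma of the mean (assumed rule)."""
    if len(pairs) < 2:
        return list(pairs)
    kept = list(pairs)
    for col in (1, 2):  # column 1 = time change, column 2 = frequency change
        values = [p[col] for p in pairs]
        mu, sigma = mean(values), pstdev(values)
        if sigma == 0:
            continue  # all values identical, nothing to reject in this column
        kept = [p for p in kept if abs(p[col] - mu) <= k * sigma]
    return kept
```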

6. The method according to claim 1, wherein determining the m characteristic points in the spectrogram comprises:

for a kth time point in the spectrogram, acquiring a target time frequency point, of which the frequency meets a preset condition, at the kth time point, wherein k is a positive integer;

and determining, among the target time-frequency points, those whose frequency falls within a preset range as the characteristic points.
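A minimal sketch of the selection in claim 6 follows. It assumes the spectrogram is a list of frames (one per time point) holding magnitudes per frequency bin, and it assumes the "preset condition" means the strongest bin in the frame. Both are illustrative choices that the claim does not confirm, and `pick_points`, `f_min`, and `f_max` are hypothetical names.

```python
from typing import List, Tuple


def pick_points(spectrogram: List[List[float]], f_min: int, f_max: int) -> List[Tuple[int, int]]:
    """Return (time index, frequency bin) for each frame's peak inside [f_min, f_max]."""
    points: List[Tuple[int, int]] = []
    for t, frame in enumerate(spectrogram):
        if not frame:
            continue  # skip empty frames
        # Assumed preset condition: the bin with the largest magnitude.
        peak_bin = max(range(len(frame)), key=frame.__getitem__)
        # Keep the peak only if its frequency bin lies in the preset range.
        if f_min <= peak_bin <= f_max:
            points.append((t, peak_bin))
    return points
```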

7. The method according to any one of claims 1 to 6, wherein, after calculating the characteristic value intervals respectively corresponding to the n characteristic point pairs to obtain the audio fingerprint of the first audio clip, the method further comprises:

determining, among the characteristic value intervals included in the audio fingerprint of a second audio clip, the number of characteristic value intervals that intersect the characteristic value intervals included in the audio fingerprint of the first audio clip;

if the number is greater than a threshold, determining that the second audio clip matches the first audio clip.
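The matching rule of claim 7 can be sketched as counting overlapping intervals. The example below treats the intervals as closed integer ranges and uses a brute-force comparison; the function names and the closed-interval convention are assumptions for illustration.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # closed interval (lower limit, upper limit)


def intersects(a: Interval, b: Interval) -> bool:
    """True if two closed intervals share at least one value."""
    return a[0] <= b[1] and b[0] <= a[1]


def count_matches(first: List[Interval], second: List[Interval]) -> int:
    """Count intervals in `second` that intersect at least one interval in `first`."""
    return sum(1 for s in second if any(intersects(s, f) for f in first))


def is_match(first: List[Interval], second: List[Interval], threshold: int) -> bool:
    """The second clip matches the first if the count exceeds the threshold."""
    return count_matches(first, second) > threshold
```

The brute-force loop costs one comparison per pair of intervals; for large fingerprints, sorting the intervals or using an interval index would likely be preferable.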

8. An apparatus for generating an audio fingerprint, the apparatus comprising:

a spectrogram acquiring module, configured to acquire a spectrogram of a first audio clip;

a characteristic point determining module, configured to determine m characteristic points in the spectrogram, wherein m is a positive integer;

a characteristic point pair obtaining module, configured to obtain n characteristic point pairs corresponding to the m characteristic points, where n is a positive integer;

and an audio fingerprint generating module, configured to calculate the characteristic value intervals respectively corresponding to the n characteristic point pairs to obtain the audio fingerprint of the first audio clip, wherein the characteristic value interval corresponding to a characteristic point pair represents the time characteristic range and the frequency characteristic range of the melody segment corresponding to that characteristic point pair.

9. The apparatus according to claim 8, wherein the audio fingerprint generating module is configured to:

for an ith characteristic point pair in the n characteristic point pairs, performing first fuzzy processing on a time change value included in the ith characteristic point pair to obtain a time change interval of the ith characteristic point pair;

performing second fuzzy processing on the frequency change value included in the ith characteristic point pair to obtain a frequency change interval of the ith characteristic point pair;

10. The apparatus according to claim 9, wherein the lower limit of the time change interval of the ith characteristic point pair is a first lower limit value, and the upper limit of the time change interval of the ith characteristic point pair is a first upper limit value; the lower limit of the frequency change interval of the ith characteristic point pair is a second lower limit value, and the upper limit of the frequency change interval of the ith characteristic point pair is a second upper limit value;

the audio fingerprint generating module is configured to:

performing first hash coding processing on the frequency value of the ith characteristic point, the first lower limit value and the second lower limit value to obtain a first coded value;

performing second hash coding processing on the frequency value of the ith characteristic point, the first upper limit value and the second upper limit value to obtain a second coded value;

wherein the characteristic value interval corresponding to the ith characteristic point pair is the value interval bounded by the first coded value and the second coded value.

11. A computer device comprising a processor and a memory, the memory storing at least one instruction, at least one program, set of codes, or set of instructions, the at least one instruction, at least one program, set of codes, or set of instructions being loaded and executed by the processor to implement the method of generating an audio fingerprint according to any one of claims 1 to 7.

12. A computer-readable storage medium, having stored therein at least one instruction, at least one program, a set of codes, or a set of instructions, which is loaded and executed by a processor to implement the method of generating an audio fingerprint according to any one of claims 1 to 7.
