Audio file cutting position processing method and device

Document No.: 972948. Publication date: 2020-11-03.

Reading note: This technology, "Audio file cutting position processing method and device", was designed by 黄磊, 杨春勇, 靳丁南 and 权圣 on 2019-12-12. Main content: The invention provides a method and a device for processing a cutting position of an audio file. The method comprises: pre-labeling the cutting position of each audio file in an audio file set, wherein the audio file set comprises at least two audio files; and adjusting the cutting position of the audio files in the audio file set according to the change characteristics of the short-time zero-crossing rate of the audio files in the audio file set. The method can improve the accuracy of labeling the cutting positions of audio files.

1. An audio file cutting position processing method is characterized by comprising the following steps:

pre-labeling the cutting position of each audio file in an audio file set, wherein the audio file set comprises at least two audio files;

and adjusting the cutting position of the audio files in the audio file set according to the change characteristics of the short-time zero crossing rate of the audio files in the audio file set.

2. The method of claim 1, wherein the adjusting the cutting positions of the audio files in the set of audio files according to the change characteristics of the short-time zero-crossing rate of the audio files in the set of audio files comprises:

if the change characteristic of the short-time zero-crossing rate of the first sub-audio file of the first audio file meets the change from large to small or from small to large, adjusting the first cutting position to be the transition position of the short-time zero-crossing rate change of the first sub-audio file;

the first audio file is any audio file in the audio file set, the first cutting position is any pre-marked cutting position in the first audio file, and the first sub-audio file is an audio file including audio on two sides of the first cutting position in the first audio file.

3. The method according to claim 2, wherein if the change characteristic of the short-time zero-crossing rate of the first sub-audio file of the first audio file satisfies a change from large to small or a change from small to large, the method further comprises, before adjusting the first cutting position to be the transition position of the short-time zero-crossing rate change of the first sub-audio file:

judging whether an initial and final combination corresponding to the first cutting position belongs to a preset initial and final combination set, wherein the initial and final combination corresponding to the first cutting position is a combination of an initial and a final of the characters corresponding to the audio on the two sides of the first cutting position, and the audio on each side of the first cutting position corresponds to one character;

if the initial and final combination corresponding to the first cutting position belongs to a preset initial and final combination set, determining that the change characteristic of the short-time zero crossing rate of the first sub-audio file of the first audio file meets the change from large to small or the change from small to large;

and if the initial and final combination corresponding to the first cutting position does not belong to a preset initial and final combination set, determining that the change characteristics of the short-time zero crossing rate of the first sub-audio file of the first audio file do not meet the change from large to small and do not meet the change from small to large.

4. The method of claim 2, wherein the adjusting the first cutting location to be a transition location of the short-time zero-crossing rate change of the first sub-audio file comprises:

and adjusting the first cutting position to be a first transition position of the short-time zero-crossing rate change of the first sub-audio file, wherein the energy value of the first transition position is zero.

5. The method of claim 2, further comprising:

if the change characteristics of the short-time zero crossing rate of the first sub-audio file of the first audio file do not meet the change from large to small and do not meet the change from small to large, receiving the adjustment operation of a user on the first cutting position;

and adjusting the first cutting position according to the adjusting operation.

6. The method of any of claims 1-5, wherein the pre-labeling the cutting locations of each audio file in the set of audio files comprises:

respectively carrying out voice recognition on each audio file in the audio file set so as to convert each audio file in the audio file set into a text, and pre-marking the cutting position of each audio file according to the text converted by each audio file;

or

receiving the pre-labeling, by a user, of the cutting positions of the audio files in the audio file set.

7. The method of any of claims 1-5, wherein after pre-labeling the cutting locations of the audio files in the set of audio files, before adjusting the cutting locations of the audio files in the set of audio files according to the change characteristic of the short-time zero-crossing rate of the audio files in the set of audio files, the method further comprises:

and adjusting the cutting position of the audio files in the audio file set according to the energy value of the audio files in the audio file set, so that the energy value of the adjusted cutting position is zero.

8. An audio file cutting position processing apparatus characterized by comprising:

a preprocessing module, configured to pre-label the cutting position of each audio file in an audio file set, wherein the audio file set comprises at least two audio files;

and a first adjusting module, configured to adjust the cutting position of the audio files in the audio file set according to the change characteristics of the short-time zero-crossing rate of the audio files in the audio file set.

9. An audio file cutting position processing apparatus comprising a processor, a memory, and a computer program stored on the memory and executable on the processor, the computer program, when executed by the processor, implementing the steps of the audio file cutting position processing method according to any one of claims 1 to 7.

10. A computer-readable storage medium, characterized in that a computer program is stored thereon, which computer program, when executed by a processor, implements the steps of the audio file cutting position processing method according to any one of claims 1 to 7.

Technical Field

The invention relates to the technical field of information processing, in particular to a method and a device for processing a cutting position of an audio file.

Background

At present, voice is increasingly applied in artificial intelligence. As an important part of human-computer interaction, it has given rise to many technologies, including the two key technologies of speech recognition and speech synthesis. In the deep learning field, a sufficient amount of audio data generally needs to be labeled in order to train models for speech recognition, speech synthesis and the like. However, the main method for labeling audio files at present is to listen to each audio file manually and label the audio time point corresponding to each character according to the text of the audio file. This labeling method has a high labor cost. In addition, because a person labels by listening, it is sometimes difficult to clearly distinguish the cutting positions between characters: the previous character easily carries part of the audio of the next character, or the next character carries part of the audio of the previous character. The cutting positions of the audio file are therefore not labeled accurately enough, which results in poor accuracy of models trained on the labeled audio files.

Therefore, the problem that the accuracy of marking the cutting position of the audio file is low exists in the prior art.

Disclosure of Invention

The embodiment of the invention provides a method and a device for processing a cutting position of an audio file, which are used for solving the problem of low accuracy of marking the cutting position of the audio file in the prior art.

In order to solve the technical problem, the invention is realized as follows:

In a first aspect, an embodiment of the present invention provides an audio file cutting position processing method. The method comprises the following steps:

pre-labeling the cutting position of each audio file in an audio file set, wherein the audio file set comprises at least two audio files;

and adjusting the cutting position of the audio files in the audio file set according to the change characteristics of the short-time zero crossing rate of the audio files in the audio file set.

In a second aspect, an embodiment of the present invention further provides an apparatus for processing a cutting position of an audio file. The audio file cutting position processing apparatus includes:

a preprocessing module, configured to pre-label the cutting position of each audio file in an audio file set, wherein the audio file set comprises at least two audio files;

and a first adjusting module, configured to adjust the cutting position of the audio files in the audio file set according to the change characteristics of the short-time zero-crossing rate of the audio files in the audio file set.

In a third aspect, an embodiment of the present invention further provides an apparatus for processing an audio file cutting position, including a processor, a memory, and a computer program stored on the memory and operable on the processor, where the computer program, when executed by the processor, implements the steps of the method for processing an audio file cutting position.

In a fourth aspect, the embodiment of the present invention further provides a computer-readable storage medium, where a computer program is stored on the computer-readable storage medium, and when being executed by a processor, the computer program implements the steps of the audio file cutting position processing method described above.

In the embodiment of the invention, the cutting position of each audio file in an audio file set is pre-labeled, wherein the audio file set comprises at least two audio files, and the cutting position of the audio files in the audio file set is then adjusted according to the change characteristics of the short-time zero-crossing rate of those audio files. Because the pre-labeled cutting position is adjusted according to the change characteristic of the short-time zero-crossing rate of the audio file, the accuracy of labeling the cutting position of the audio file can be improved.

Drawings

In order to illustrate the technical solutions of the embodiments of the present invention more clearly, the drawings needed in the description of the embodiments are briefly introduced below. The drawings in the following description show only some embodiments of the present invention, and those skilled in the art can obtain other drawings from them without inventive effort.

FIG. 1 is a flow chart of an audio file cutting position processing method according to an embodiment of the present invention;

FIG. 2 is a first schematic diagram of an audio waveform provided by an embodiment of the present invention;

FIG. 3 is a second schematic diagram of an audio waveform provided by an embodiment of the present invention;

FIG. 4 is a flow chart of an audio file cutting position processing method according to another embodiment of the present invention;

FIG. 5 is a block diagram of an audio file cutting position processing apparatus according to an embodiment of the present invention;

FIG. 6 is a block diagram of an audio file cutting position processing apparatus according to still another embodiment of the present invention.

Detailed Description

The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

The embodiment of the invention provides a method for processing a cutting position of an audio file. Referring to FIG. 1, which is a flowchart of an audio file cutting position processing method according to an embodiment of the present invention, the method includes the following steps:

Step 101, pre-labeling the cutting position of each audio file in an audio file set, wherein the audio file set comprises at least two audio files.

In this embodiment, the audio file set may include N collected audio files, where the value of N may be reasonably set according to actual requirements, for example, the value of N may be 2000, 8000, 10000, 20000, 100000, or the like.

In this step, the cutting positions of the audio files in the audio file set may be pre-labeled automatically, for example according to the text corresponding to each audio file. They may also be pre-labeled manually. Automatic and manual pre-labeling may also be combined: for example, the cutting positions are first pre-labeled according to the text corresponding to the audio files, and the cutting positions that were not pre-labeled automatically are then pre-labeled manually. This embodiment does not limit the manner of pre-labeling.

Step 102, adjusting the cutting position of the audio files in the audio file set according to the change characteristics of the short-time zero-crossing rate of the audio files in the audio file set.

In this embodiment, the short-time zero-crossing rate, which may also be referred to as the short-time average zero-crossing rate, is a feature parameter in the time-domain analysis of a speech signal. It represents the number of times that the signal in each audio frame passes through zero. The change characteristic of the short-time zero-crossing rate of an audio file reflects how the short-time zero-crossing rate of that audio file changes over time.
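The definition above can be illustrated with a minimal Python sketch that counts zero crossings per frame. The function names, the frame length, the hop size, and the choice to treat a sample equal to zero as non-negative are all assumptions made for illustration; the source does not specify them.

```python
from typing import List, Sequence


def zero_crossings(frame: Sequence[float]) -> int:
    """Count how often consecutive samples lie on opposite sides of zero."""
    count = 0
    for prev, cur in zip(frame, frame[1:]):
        # Assumption: a sample equal to zero is treated as non-negative.
        if (prev >= 0) != (cur >= 0):
            count += 1
    return count


def short_time_zcr(samples: Sequence[float], frame_len: int, hop: int) -> List[int]:
    """Zero-crossing count of each frame of `frame_len` samples, stepping by `hop`."""
    if frame_len <= 0 or hop <= 0:
        raise ValueError("frame_len and hop must be positive")
    return [
        zero_crossings(samples[start:start + frame_len])
        for start in range(0, len(samples) - frame_len + 1, hop)
    ]


# A rapidly alternating frame followed by a flat one: dense, then sparse.
print(short_time_zcr([1, -1, 1, -1, 1, 1, 1, 1], frame_len=4, hop=4))  # [3, 0]
```

Dividing each count by the frame length would give a rate rather than a raw count; which normalization the source intends is not stated.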

In practical situations, for the audio corresponding to some adjacent characters, the short-time zero-crossing rate exhibits a change from sparse to dense or from dense to sparse. For example, in the audio waveform diagram corresponding to "fourteen hundred thousand" shown in FIG. 2, the short-time zero-crossing rate of the waveform across "thousand" and "four" exhibits a change from dense to sparse. The pre-labeled cutting position of an audio file can therefore be adjusted accurately based on the change characteristics of its short-time zero-crossing rate. For the audio waveform diagram shown in FIG. 2, for example, the cutting position between "thousand" and "four" can be adjusted to the position at which the short-time zero-crossing rate of the waveform across "thousand" and "four" changes.

In the audio file cutting position processing method provided by this embodiment, the cutting position of each audio file in an audio file set is pre-labeled, where the audio file set includes at least two audio files, and the cutting position of the audio files in the audio file set is then adjusted according to the change characteristics of the short-time zero-crossing rate of those audio files. Because the pre-labeled cutting position is adjusted according to the change characteristic of the short-time zero-crossing rate of the audio file, the accuracy of labeling the cutting position of the audio file can be improved.

Optionally, in the step 102, that is, adjusting the cutting position of the audio file in the audio file set according to the change characteristic of the short-time zero-crossing rate of the audio file in the audio file set may include:

if the change characteristic of the short-time zero-crossing rate of the first sub-audio file of the first audio file meets the change from large to small or from small to large, adjusting the first cutting position to be the transition position of the short-time zero-crossing rate change of the first sub-audio file;

the first audio file is any audio file in the audio file set, the first cutting position is any pre-marked cutting position in the first audio file, and the first sub-audio file is an audio file including audio on two sides of the first cutting position in the first audio file.

Taking the audio waveform shown in FIG. 2 as an example, the first sub-audio file is the sub-audio file corresponding to "thousand" and "four", and its short-time zero-crossing rate is: 0.79 at 1.423 seconds; 0.81 at 1.424 seconds; 0.82 at 1.425 seconds; 2.3 at 1.426 seconds; 2.4 at 1.427 seconds; and 2.6 at 1.428 seconds. The short-time zero-crossing rate of this sub-audio file therefore changes from small to large, and the transition position of the change lies between 1.425 seconds and 1.426 seconds, so the cutting position between "thousand" and "four" can be adjusted to between 1.425 seconds and 1.426 seconds.

Optionally, in order to keep normal fluctuation of the short-time zero-crossing rate from affecting the labeling result, the change characteristic of the short-time zero-crossing rate may be understood as a change other than normal fluctuation. For example, the short-time zero-crossing rate is considered to have changed only when its change exceeds a preset value. The preset value may be set reasonably based on the normal fluctuation of the short-time zero-crossing rate; for example, it may be set to 0.1, 0.2, 1, or the like.

In practical situations, because the audio signals of adjacent characters are often close together, manual or text-based labeling often fails to find the cutting position accurately. Adjusting the pre-labeled cutting position to the transition position of the short-time zero-crossing rate change improves the accuracy of labeling the cutting position of the audio file.
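The worked example and the preset-value rule can be sketched in Python as follows. The function name `find_transition` and the choice to report the single largest jump that exceeds the preset value are assumptions for illustration; the source only states that a change larger than the preset value counts as a change.

```python
from typing import List, Optional, Tuple


def find_transition(
    times: List[float], zcr: List[float], preset: float
) -> Optional[Tuple[float, float, str]]:
    """Return (time_before, time_after, direction) of the largest jump above `preset`.

    Returns None when no step between neighbouring values exceeds `preset`,
    i.e. when the series only shows normal fluctuation.
    """
    best = None
    best_jump = preset
    for i in range(1, len(zcr)):
        jump = zcr[i] - zcr[i - 1]
        if abs(jump) > best_jump:
            best_jump = abs(jump)
            direction = "small_to_large" if jump > 0 else "large_to_small"
            best = (times[i - 1], times[i], direction)
    return best


# Values taken from the example for the audio of "thousand" and "four".
times = [1.423, 1.424, 1.425, 1.426, 1.427, 1.428]
zcr = [0.79, 0.81, 0.82, 2.3, 2.4, 2.6]
print(find_transition(times, zcr, preset=1.0))  # (1.425, 1.426, 'small_to_large')
```

With a preset value of 1, only the step from 0.82 to 2.3 qualifies, which matches the transition between 1.425 seconds and 1.426 seconds described in the text.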

Optionally, if the change characteristic of the short-time zero-crossing rate of the first sub-audio file of the first audio file satisfies a change from large to small or a change from small to large, the method further includes, before adjusting the first cutting position to be the transition position of the short-time zero-crossing rate change of the first sub-audio file:

judging whether an initial and final combination corresponding to the first cutting position belongs to a preset initial and final combination set, wherein the initial and final combination corresponding to the first cutting position is a combination of an initial and a final of the characters corresponding to the audio on the two sides of the first cutting position, and the audio on each side of the first cutting position corresponds to one character;

if the initial and final combination corresponding to the first cutting position belongs to a preset initial and final combination set, determining that the change characteristic of the short-time zero crossing rate of the first sub-audio file of the first audio file meets the change from large to small or the change from small to large;

and if the initial and final combination corresponding to the first cutting position does not belong to a preset initial and final combination set, determining that the change characteristics of the short-time zero crossing rate of the first sub-audio file of the first audio file do not meet the change from large to small and do not meet the change from small to large.

In this embodiment, the initial and final combination corresponding to the first cutting position can be understood as the combination of the final of the character corresponding to the audio before the first cutting position and the initial of the character corresponding to the audio after it. For example, the combination corresponding to the cutting position between "thousand" and "four" is the final "an" and the initial "s", and the combination corresponding to the cutting position between "nine" and "thousand" is the final "iu" and the initial "q".

In practical application, the initial and final combinations of characters whose audio has a short-time zero-crossing rate that changes from large to small or from small to large can be counted in advance to obtain the initial and final combination set. During adjustment of a cutting position, the change characteristics of the short-time zero-crossing rate then do not need to be analyzed in order to decide whether the cutting position is suitable for adjustment based on them: this is judged directly from the initial and final combination corresponding to the cutting position, which improves the efficiency of adjusting the cutting positions of the audio file. Specifically, if the initial and final combination corresponding to the first cutting position belongs to the preset initial and final combination set, the first cutting position may be adjusted to the transition position of the short-time zero-crossing rate change of the first sub-audio file. Otherwise, the process may end, or the first cutting position may be adjusted in another manner, for example manually.
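The membership check described above amounts to a set lookup, sketched below in Python. The set contents are hypothetical: the two pairs reuse the combinations named in the text, but whether they actually belong to the preset set is an assumption, as are the names `PRESET_COMBINATIONS` and `can_auto_adjust`.

```python
from typing import FrozenSet, Tuple

# Hypothetical preset set of (final of the preceding character,
# initial of the following character). The real set would be obtained
# by counting combinations in advance, as the text describes.
PRESET_COMBINATIONS: FrozenSet[Tuple[str, str]] = frozenset({("an", "s"), ("iu", "q")})


def can_auto_adjust(
    prev_final: str,
    next_initial: str,
    preset: FrozenSet[Tuple[str, str]] = PRESET_COMBINATIONS,
) -> bool:
    """True if the cutting position may be adjusted from the zero-crossing rate."""
    return (prev_final, next_initial) in preset


print(can_auto_adjust("an", "s"))  # True
print(can_auto_adjust("a", "m"))   # False
```

Because the lookup replaces a per-file analysis of the zero-crossing rate, its cost does not depend on the length of the audio.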

Optionally, the adjusting the first cutting position to be a transition position of the short-time zero-crossing rate change of the first sub-audio file may include:

and adjusting the first cutting position to be a first transition position of the short-time zero-crossing rate change of the first sub-audio file, wherein the energy value of the first transition position is zero.

In this embodiment, the first cutting position may be adjusted to be a position where the energy value is zero in the transition position of the short-time zero-crossing rate change of the first sub-audio file, so that the noise of the cut audio file may be reduced.

For example, for the audio waveform diagram shown in FIG. 2, the cutting position between "thousand" and "four" may be adjusted to a position between 1.425 seconds and 1.426 seconds at which the energy value is zero. Suppose the energy value is 5 at 1.4251 seconds, 4 at 1.4252 seconds, 3 at 1.4253 seconds, 1 at 1.4254 seconds, and 0 at 1.4255 seconds. The cutting position between "thousand" and "four" may then be adjusted to 1.4255 seconds.
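The example can be sketched as a search for the first zero-energy point inside the transition interval. The function name `zero_energy_position`, the `(time, energy)` pair representation, and the `tol` parameter are assumptions for illustration.

```python
from typing import List, Optional, Tuple


def zero_energy_position(
    points: List[Tuple[float, float]], start: float, end: float, tol: float = 0.0
) -> Optional[float]:
    """Return the earliest time in [start, end] whose energy is zero (within `tol`)."""
    for t, energy in points:
        if start <= t <= end and abs(energy) <= tol:
            return t
    return None


# Energy values taken from the example, given as (time in seconds, energy).
energy = [(1.4251, 5), (1.4252, 4), (1.4253, 3), (1.4254, 1), (1.4255, 0)]
print(zero_energy_position(energy, start=1.425, end=1.426))  # 1.4255
```

For real audio an exact zero may never occur, so a small positive `tol` may be needed; that relaxation is not part of the source.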

Optionally, the method may further include:

if the change characteristics of the short-time zero crossing rate of the first sub-audio file of the first audio file do not meet the change from large to small and do not meet the change from small to large, receiving the adjustment operation of a user on the first cutting position;

and adjusting the first cutting position according to the adjusting operation.

In this embodiment, when the change characteristic of the short-time zero crossing rate of the first sub-audio file does not satisfy the change from large to small and does not satisfy the change from small to large, the first cutting position may be adjusted based on a user operation. For example, the user may determine a more suitable cutting position by analyzing and observing the audio waveform of the enlarged first sub-audio file, and adjust the first cutting position to the determined more suitable cutting position.

Optionally, after the first cutting position is adjusted according to the user's adjustment operation, the adjusted first cutting position may be further fine-tuned. For example, if the energy value at the adjusted first cutting position is not zero, the position is moved to the nearest position at which the energy value is zero.

In the embodiment, under the condition that the change characteristics of the short-time zero crossing rate of the sub-audio files do not meet the change from large to small and do not meet the change from small to large, the pre-marked cutting position can be adjusted based on user operation, so that the pre-marked cutting position of each audio file in the audio file set can be adjusted, and the accuracy of marking the cutting position of the audio file can be improved.

Optionally, step 101, that is, pre-labeling the cutting position of each audio file in the audio file set, may include:

respectively carrying out voice recognition on each audio file in the audio file set so as to convert each audio file in the audio file set into a text, and pre-marking the cutting position of each audio file according to the text converted by each audio file;

or

receiving the pre-labeling, by a user, of the cutting positions of the audio files in the audio file set.

In an embodiment, a text corresponding to each audio file in the audio file set can be obtained by performing speech recognition on each audio file in the audio file set, and the cutting position of each audio file is pre-labeled based on the text corresponding to each audio file. For example, the text corresponding to each audio file may be segmented, and the cutting position of the audio file corresponding to the text may be pre-labeled at the segmentation position based on the text.

In this embodiment, pre-labeling the cutting position of each audio file according to the text converted from that audio file is fast and saves labor cost.

In another embodiment, the cutting positions of the audio files in the audio file set can be pre-labeled by a user, so that the segmentation positions of the audio files in the audio file set can be completely and comprehensively labeled.

Optionally, after the pre-labeling of the cutting positions of the audio files in the audio file set, before the adjusting of the cutting positions of the audio files in the audio file set according to the change characteristic of the short-time zero-crossing rate of the audio files in the audio file set, the method may further include:

and adjusting the cutting position of the audio files in the audio file set according to the energy value of the audio files in the audio file set, so that the energy value of the adjusted cutting position is zero.

In practical situations, since the speech signals of adjacent characters are often close together, it is difficult to ensure that the energy value at a pre-labeled cutting position is zero. For example, referring to FIG. 3, the cutting position of the audio file (the broken line in the figure) lies at a peak of the audio waveform rather than at a zero crossing, which may cause noise after the audio is cut.

This embodiment may analyze the energy value at each pre-labeled cutting position and, when the energy value at a cutting position is not zero, adjust that cutting position so that the energy value at the adjusted position is zero. For example, the cutting position shown in FIG. 3 may be adjusted to the nearest position at which the energy value is zero, such as the first zero crossing on its right side. This reduces noise after the audio file is cut.
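A minimal sketch of moving a cutting position to the nearest zero crossing is given below. It works on sample indices rather than seconds, and the function name, the tie-breaking rule (prefer the right side, following the example in the text), and the definition of a crossing as either a zero sample or a sign change to the next sample are assumptions for illustration.

```python
from typing import Optional, Sequence


def nearest_zero_crossing(samples: Sequence[float], cut: int) -> Optional[int]:
    """Index nearest to `cut` where the waveform is zero or changes sign.

    A returned index i means either samples[i] == 0 or the sign changes
    between samples[i] and samples[i + 1]. Ties are resolved to the right.
    """
    n = len(samples)

    def is_crossing(i: int) -> bool:
        if samples[i] == 0:
            return True
        return i + 1 < n and samples[i] * samples[i + 1] < 0

    for offset in range(n):
        # Check the right-hand candidate first, then the left-hand one.
        for i in (cut + offset, cut - offset):
            if 0 <= i < n and is_crossing(i):
                return i
    return None


# The pre-labeled cut sits on a peak (index 1); the nearest crossing is
# between index 2 and index 3, where the waveform goes from 0.5 to -0.3.
wave = [0.2, 0.9, 0.5, -0.3, -0.8, -0.1, 0.4]
print(nearest_zero_crossing(wave, cut=1))  # 2
```

Cutting at such an index avoids an abrupt jump in amplitude at the boundary, which is the source of the noise described above.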

The present embodiment is explained below with reference to examples:

Referring to FIG. 4, the audio file cutting position processing method provided by this embodiment may include the following steps:

Step 401, pre-labeling the cutting position of each audio file in the audio file set.

In this step, the cutting position of each audio file in the audio file set may be manually pre-labeled, or the cutting position of each audio file may be pre-labeled based on a text corresponding to the audio file.

Step 402, obtaining an initial and final combination corresponding to the cutting position of each audio file in the audio file set.

In this step, the initial and final combination corresponding to a cutting position may be the combination of the final of the character corresponding to the audio before the cutting position and the initial of the character corresponding to the audio after it.

Step 403, automatically adjusting the cutting positions whose corresponding initial and final combination has a short-time zero-crossing rate that changes from large to small or from small to large.

In this step, if the short-time zero-crossing rate corresponding to the initial and final combination of a given cutting position changes from large to small or from small to large, the cutting position can be adjusted automatically according to that short-time zero-crossing rate. Optionally, this can be decided by judging whether the initial and final combination of the cutting position belongs to a preset initial and final combination set, where the short-time zero-crossing rate corresponding to every combination in the set changes from large to small or from small to large.

In practical application, there are 2300 initial and final combinations in total, of which 1400 have a short-time zero-crossing rate that changes from large to small or from small to large. That is, the cutting positions of the audio corresponding to about 60 percent of the text can be adjusted automatically, which effectively saves the time spent adjusting cutting positions.

Step 404, counting the cutting positions that have been adjusted automatically, and manually adjusting the remaining cutting positions.

In this step, the cutting positions adjusted based on the change characteristics of the short-time zero-crossing rate may be counted, and the cutting positions that have not been adjusted may be adjusted manually.
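Steps 402 to 404 can be sketched as splitting the pre-labeled cutting positions into those that qualify for automatic adjustment and those left for manual adjustment. The `Cut` tuple layout, the function name `partition_cuts`, and the sample data are hypothetical.

```python
from typing import FrozenSet, Iterable, List, Tuple

# (pre-labeled time in seconds, final of preceding character,
#  initial of following character); a hypothetical record layout.
Cut = Tuple[float, str, str]


def partition_cuts(
    cuts: Iterable[Cut], preset: FrozenSet[Tuple[str, str]]
) -> Tuple[List[Cut], List[Cut]]:
    """Split cuts into (automatic, manual) by preset-set membership."""
    automatic: List[Cut] = []
    manual: List[Cut] = []
    for cut in cuts:
        _, prev_final, next_initial = cut
        if (prev_final, next_initial) in preset:
            automatic.append(cut)
        else:
            manual.append(cut)
    return automatic, manual


preset = frozenset({("an", "s")})  # assumed set contents
cuts = [(1.425, "an", "s"), (2.100, "a", "m")]
automatic, manual = partition_cuts(cuts, preset)
print(len(automatic), len(manual))  # 1 1
```

The length of the first list is the count of automatically adjusted positions; the second list is what remains for a person to adjust.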

Optionally, after step 404, a position where the energy value is zero can be found automatically from the energy values near each cutting position, so as to further refine the precision and quality of the cut.

In summary, the audio file cutting position processing method provided by the embodiment can not only improve the accuracy of audio file cutting position marking, but also save labor cost.

Referring to fig. 5, fig. 5 is a structural diagram of an audio file cutting position processing apparatus according to an embodiment of the present invention. As shown in fig. 5, the audio file cutting position processing apparatus 500 includes:

a preprocessing module 501, configured to pre-label the cutting position of each audio file in an audio file set, where the audio file set includes at least two audio files;

a first adjusting module 502, configured to adjust a cutting position of an audio file in the audio file set according to a change characteristic of a short-time zero-crossing rate of the audio file in the audio file set.

Optionally, the first adjusting module is specifically configured to:

if the change characteristic of the short-time zero-crossing rate of the first sub-audio file of the first audio file meets the change from large to small or from small to large, adjusting the first cutting position to be the transition position of the short-time zero-crossing rate change of the first sub-audio file;

the first audio file is any audio file in the audio file set, the first cutting position is any pre-marked cutting position in the first audio file, and the first sub-audio file is an audio file including audio on two sides of the first cutting position in the first audio file.
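One way to picture this adjustment is the following Python sketch, which computes a short-time zero-crossing rate over the audio on both sides of a pre-labeled cutting position and moves the cut to the point where the rate changes most sharply. The embodiment does not state how the transition position is computed, so the frame length, the use of non-overlapping frames, the "largest jump between adjacent frames" criterion, and the function names are assumptions of this sketch.

```python
def short_time_zcr(samples, frame_len=8):
    """Zero-crossing rate of each non-overlapping frame: the fraction of
    adjacent sample pairs in the frame whose signs differ.
    `frame_len` must be at least 2."""
    rates = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        crossings = sum(
            1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0)
        )
        rates.append(crossings / (frame_len - 1))
    return rates


def zcr_transition_sample(samples, cut, half_window, frame_len=8):
    """Within `half_window` samples on either side of `cut` (the sub-audio),
    return the frame boundary where the zero-crossing rate jumps the most."""
    lo = max(0, cut - half_window)
    hi = min(len(samples), cut + half_window)
    rates = short_time_zcr(samples[lo:hi], frame_len)
    if len(rates) < 2:
        return cut  # too little audio to analyse; keep the pre-labeled cut
    # Index of the frame whose rate differs most from the previous frame.
    k = max(range(1, len(rates)), key=lambda i: abs(rates[i] - rates[i - 1]))
    return lo + k * frame_len


if __name__ == "__main__":
    # Synthetic signal: 32 samples with no sign changes (low rate), followed
    # by 32 samples that alternate in sign (high rate).
    samples = [1] * 32 + [1, -1] * 16
    print(zcr_transition_sample(samples, cut=24, half_window=24))
```

The result is only accurate to one frame length, so a shorter frame gives a finer position at the cost of a noisier rate estimate.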

Optionally, the apparatus further comprises:

a judging module, configured to judge, before the first cutting position is adjusted to the transition position of the short-time zero-crossing rate change of the first sub-audio file, whether the initial and final combination corresponding to the first cutting position belongs to a preset initial and final combination set, where the initial and final combination corresponding to the first cutting position is the combination of the initial and the final of the characters corresponding to the audio on the two sides of the first cutting position, and the audio on each side of the first cutting position corresponds to one character;

the first determining module is used for determining that the change characteristic of the short-time zero crossing rate of the first sub-audio file of the first audio file meets the change from large to small or the change from small to large if the initial and final combination corresponding to the first cutting position belongs to a preset initial and final combination set;

and the second determining module is used for determining that the change characteristics of the short-time zero crossing rate of the first sub audio file of the first audio file do not meet the change from large to small and do not meet the change from small to large if the initial and final combination corresponding to the first cutting position does not belong to a preset initial and final combination set.

Optionally, the first adjusting module is specifically configured to:

and adjusting the first cutting position to be a first transition position of the short-time zero-crossing rate change of the first sub-audio file, wherein the energy value of the first transition position is zero.

Optionally, the apparatus further comprises:

the receiving module is used for receiving the adjustment operation of a user on the first cutting position if the change characteristics of the short-time zero crossing rate of the first sub-audio file of the first audio file do not meet the change from large to small and do not meet the change from small to large;

and the second adjusting module is used for adjusting the first cutting position according to the adjusting operation.

Optionally, the preprocessing module is specifically configured to:

respectively carrying out voice recognition on each audio file in the audio file set so as to convert each audio file in the audio file set into a text, and pre-marking the cutting position of each audio file according to the text converted by each audio file;

or

receiving a user's pre-marking of the cutting positions of the audio files in the audio file set.

Optionally, the apparatus further comprises:

a third adjusting module, configured to, after the cutting positions of the audio files in the audio file set are pre-marked and before those cutting positions are adjusted according to the change characteristics of the short-time zero-crossing rate of the audio files in the audio file set, adjust the cutting positions of the audio files in the audio file set according to the energy values of the audio files in the audio file set, so that the energy value at each adjusted cutting position is zero.
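The ordering described here (pre-marking first, then the energy-based adjustment, then the adjustment based on the short-time zero-crossing rate) can be sketched in Python as below. The function name `process_cuts` and the idea of passing each stage in as a callable are assumptions of this sketch; the two stage functions stand in for whatever the modules actually implement.

```python
def process_cuts(samples, prelabeled_cuts, energy_adjust, zcr_adjust):
    """Apply the two adjustment stages, in order, to every pre-marked cut.

    `energy_adjust` and `zcr_adjust` are hypothetical callables that take
    (samples, cut) and return an adjusted cut position.
    """
    adjusted = []
    for cut in prelabeled_cuts:
        cut = energy_adjust(samples, cut)  # energy-based adjustment first
        cut = zcr_adjust(samples, cut)     # then zero-crossing-rate adjustment
        adjusted.append(cut)
    return adjusted


if __name__ == "__main__":
    # Stand-in stages that only demonstrate the order of application.
    shift_by_one = lambda samples, cut: cut + 1
    double = lambda samples, cut: cut * 2
    print(process_cuts([], [10, 20], shift_by_one, double))
```

Keeping the stages as separate callables makes it easy to run the pipeline with or without the optional energy-based stage.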

The audio file cutting position processing apparatus 500 provided in the embodiment of the present invention can implement each process in the foregoing method embodiments, and for avoiding repetition, details are not described here again.

The audio file cutting position processing device 500 of the embodiment of the present invention includes a preprocessing module 501, configured to pre-mark a cutting position of each audio file in an audio file set, where the audio file set includes at least two audio files; a first adjusting module 502, configured to adjust a cutting position of an audio file in the audio file set according to a change characteristic of a short-time zero-crossing rate of the audio file in the audio file set. The cutting position of the pre-marked audio file is adjusted according to the change characteristic of the short-time zero-crossing rate of the audio file, so that the accuracy of marking the cutting position of the audio file can be improved.

Referring to fig. 6, fig. 6 is a block diagram of an audio file cutting position processing apparatus according to another embodiment of the present invention, and as shown in fig. 6, the audio file cutting position processing apparatus 600 includes: a processor 601, a memory 602 and a computer program stored on said memory 602 and executable on said processor, the various components in the audio file cutting position processing device 600 being coupled together by a bus interface 603, said computer program realizing the following steps when executed by said processor 601:

pre-labeling the cutting position of each audio file in an audio file set, wherein the audio file set comprises at least two audio files;

and adjusting the cutting position of the audio files in the audio file set according to the change characteristics of the short-time zero crossing rate of the audio files in the audio file set.

Optionally, the computer program, when executed by the processor 601, is further configured to:

if the change characteristic of the short-time zero-crossing rate of the first sub-audio file of the first audio file meets the change from large to small or from small to large, adjusting the first cutting position to be the transition position of the short-time zero-crossing rate change of the first sub-audio file;

the first audio file is any audio file in the audio file set, the first cutting position is any pre-marked cutting position in the first audio file, and the first sub-audio file is an audio file including audio on two sides of the first cutting position in the first audio file.

Optionally, the computer program, when executed by the processor 601, is further configured to:

before adjusting the first cutting position to the transition position of the short-time zero-crossing rate change of the first sub-audio file, judging whether the initial and final combination corresponding to the first cutting position belongs to a preset initial and final combination set, wherein the initial and final combination corresponding to the first cutting position is the combination of the initial and the final of the characters corresponding to the audio on the two sides of the first cutting position, and the audio on each side of the first cutting position corresponds to one character;

if the initial and final combination corresponding to the first cutting position belongs to a preset initial and final combination set, determining that the change characteristic of the short-time zero crossing rate of the first sub-audio file of the first audio file meets the change from large to small or the change from small to large;

and if the initial and final combination corresponding to the first cutting position does not belong to a preset initial and final combination set, determining that the change characteristics of the short-time zero crossing rate of the first sub-audio file of the first audio file do not meet the change from large to small and do not meet the change from small to large.

Optionally, the computer program, when executed by the processor 601, is further configured to:

and adjusting the first cutting position to be a first transition position of the short-time zero-crossing rate change of the first sub-audio file, wherein the energy value of the first transition position is zero.

Optionally, the computer program, when executed by the processor 601, is further configured to:

if the change characteristics of the short-time zero crossing rate of the first sub-audio file of the first audio file do not meet the change from large to small and do not meet the change from small to large, receiving the adjustment operation of a user on the first cutting position;

and adjusting the first cutting position according to the adjusting operation.

Optionally, the computer program, when executed by the processor 601, is further configured to:

respectively carrying out voice recognition on each audio file in the audio file set so as to convert each audio file in the audio file set into a text, and pre-marking the cutting position of each audio file according to the text converted by each audio file;

or

receiving a user's pre-marking of the cutting positions of the audio files in the audio file set.

Optionally, the computer program, when executed by the processor 601, is further configured to:

after the cutting positions of the audio files in the audio file set are pre-labeled, before the cutting positions of the audio files in the audio file set are adjusted according to the change characteristics of the short-time zero crossing rate of the audio files in the audio file set, the cutting positions of the audio files in the audio file set are adjusted according to the energy values of the audio files in the audio file set, so that the energy values of the adjusted cutting positions are zero.

The embodiment of the present invention further provides an audio file cutting position processing apparatus, which includes a processor, a memory, and a computer program stored in the memory and capable of running on the processor, where the computer program, when executed by the processor, implements each process of the above audio file cutting position processing method embodiment, and can achieve the same technical effect, and is not described herein again to avoid repetition.

The embodiment of the present invention further provides a computer-readable storage medium, where a computer program is stored on the computer-readable storage medium, and when the computer program is executed by a processor, the computer program implements each process of the above-mentioned method for processing a cutting position of an audio file, and can achieve the same technical effect, and in order to avoid repetition, the computer program is not described herein again. The computer-readable storage medium may be a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk.

It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element.

Through the above description of the embodiments, those skilled in the art will clearly understand that the method of the above embodiments can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but in many cases, the former is a better implementation manner. Based on such understanding, the technical solutions of the present invention may be embodied in the form of a software product, which is stored in a storage medium (such as ROM/RAM, magnetic disk, optical disk) and includes instructions for enabling a terminal (such as a mobile phone, a computer, a server, an air conditioner, or a network device) to execute the method according to the embodiments of the present invention.

While the present invention has been described with reference to the embodiments shown in the drawings, the present invention is not limited to the embodiments, which are illustrative and not restrictive, and it will be apparent to those skilled in the art that various changes and modifications can be made therein without departing from the spirit and scope of the invention as defined in the appended claims.
