Audio processing method and device, mobile terminal and storage medium

文档序号：1506823 发布日期：2020-02-07 浏览：4次中文

阅读说明：本技术 音频处理方法、装置及移动终端及存储介质 (Audio processing method and device, mobile terminal and storage medium ) 是由李�浩陈翔宇于 2018-07-27 设计创作，主要内容包括：本公开提供一种音频处理方法,所述方法包括以下步骤：获取采集模块采集的第一音频数据；对所述第一音频数据通过并发处理方式进行第一音效处理生成第二音频数据；将所述第二音频数据与用于伴奏的第三音频数据进行混音处理生成第一混音数据,所述第三音频数据为预先存储的音频数据；输出所述第一混音数据。本公开能够将音频的延时控制在一个很低的范围内,以便于在最终直播时可以将音频数据与高质量的伴奏进行混音,提高直播k歌的质量。(The present disclosure provides an audio processing method, the method comprising the steps of: acquiring first audio data acquired by an acquisition module; carrying out first sound effect processing on the first audio data in a concurrent processing mode to generate second audio data; mixing the second audio data with third audio data used for accompanying to generate first mixed audio data, wherein the third audio data is pre-stored audio data; and outputting the first mixed voice data. This openly can be with the time delay control of audio frequency in a very low within range to can carry out the audio mixing with high-quality accompaniment with audio data when finally live, improve the quality of live k song.)

1. An audio processing method, comprising the steps of:

acquiring first audio data acquired by an acquisition module;

carrying out first sound effect processing on the first audio data in a concurrent processing mode to generate second audio data;

performing sound mixing processing on the second audio data and third audio data to generate first sound mixing data, wherein the third audio data is pre-stored audio data;

and outputting the first mixed voice data.

2. The method of claim 1, wherein after acquiring the first audio data acquired by the acquisition module, further comprising:

and performing second sound effect processing on the first audio data to generate fourth audio data.

3. The method of claim 2, wherein the second sound effect processing the first audio data to generate fourth audio data comprises:

and processing the first audio data according to a preset sound effect algorithm.

4. The method of claim 2, wherein after the performing the second sound effect processing on the first audio data to generate fourth audio data, further comprises:

and performing sound mixing processing on the fourth audio data and the third audio data to generate second sound mixing data.

5. The method of claim 4, wherein after the mixing the fourth audio data with the third audio data to generate second mixed data, the method further comprises:

and pushing the second audio mixing data to a distribution server through a streaming media module so that the distribution server distributes the second audio mixing data to each audio receiving output side terminal.

6. The method of claim 4, wherein mixing the fourth audio data with the third audio data to generate second mixed data comprises:

time-proofreading the fourth audio data and the third audio data to time-synchronize the fourth audio data and the third audio data.

7. The method of claim 1, wherein the outputting the first mixing data comprises:

and sending the first mixed sound data to a peripheral sound device of the terminal for receiving the audio backtransmission party.

8. An audio processing apparatus, comprising:

an acquisition module configured to acquire first audio data;

the processing module is configured to perform first sound effect processing on the first audio data in a concurrent processing mode to generate second audio data;

the audio mixing module is configured to perform audio mixing processing on the second audio data and third audio data to generate first audio mixing data, wherein the third audio data is pre-stored audio data;

an output module configured to output the first remix data.

9. A mobile terminal, comprising:

a processor;

a memory for storing processor-executable instructions;

wherein the processor is configured to: the steps of performing the audio processing method of any of claims 1 to 7.

10. A non-transitory computer readable storage medium, wherein instructions in the storage medium, when executed by a processor of a mobile terminal, enable the mobile terminal to perform an audio processing method, the method comprising the steps of the audio processing method of any one of claims 1-7.

Technical Field

The present disclosure relates to the field of information processing, and in particular, to an audio processing method and apparatus, a mobile terminal, and a storage medium.

Background

The rapid development of the internet gradually changes the living style of the contemporary people, the requirements of the contemporary people on the spiritual culture are higher and higher, and singing gradually becomes one of the favorite entertainment activities of people. Especially, various Karaoke software products are popularized, so that more and more people can sing or record the singing voice of the people at any time and any place. The Karaoke software product is characterized in that singing voice of a user is synthesized in accompaniment provided by software, and then the singing voice recording result with better quality is obtained through karaoke audio effect processing and editing.

At present, although karaoke scenes are well-developed in the technical implementation of record-following, real-time rendering sound effects in live broadcasting is lacked. In the prior art, an AVCaptureSession with an iPhone advanced package is generally used for acquiring video and audio data in parallel, the buffer for audio acquisition is large, the output frequency is low, and the time axis is not easy to align when background music is processed.

Therefore, in the prior art, the time delay is large and uncontrollable, the function of the karaoke is limited, the problems that vocal accompaniment cannot be aligned, the quality of accompaniment is poor and the like easily occur, and the quality of live karaoke is influenced.

Disclosure of Invention

In order to solve the problems in the related art, the present disclosure provides an audio processing method and apparatus, and a corresponding mobile terminal, which can control the delay of audio within a very low range, so as to mix audio data with high-quality accompaniment in the final live broadcast, thereby improving the quality of live broadcast of a karaoke song.

In order to achieve the purpose, the technical scheme adopted by the disclosure is as follows:

according to a first aspect of embodiments of the present disclosure, the present disclosure provides an audio processing method, including the steps of:

acquiring first audio data acquired by an acquisition module;

carrying out first sound effect processing on the first audio data in a concurrent processing mode to generate second audio data;

mixing the second audio data with third audio data used for accompanying to generate first mixed audio data, wherein the third audio data is pre-stored audio data;

and outputting the first mixed voice data.

Specifically, the processing of the first audio data by a concurrent processing mode to generate second audio data includes:

and performing noise reduction, reverberation, equalization and sound change processing on the first audio data.

Specifically, after the first audio data acquired by the acquisition module is acquired, the method further includes:

and performing second sound effect processing on the first audio data to generate fourth audio data.

Specifically, the performing of the second sound effect processing on the first audio data to generate fourth audio data includes:

and processing the first audio data according to a preset sound effect algorithm.

Specifically, after performing the second sound effect processing on the first audio data to generate fourth audio data, the method further includes:

and performing sound mixing processing on the fourth audio data and the third audio data to generate second sound mixing data.

Specifically, after the mixing the fourth audio data and the third audio data to generate second mixed data, the method further includes:

Optionally, mixing the fourth audio data with the third audio data to generate second mixed data, including:

time-proofreading the fourth audio data and the third audio data to time-synchronize the fourth audio data and the third audio data.

Specifically, the outputting the first mixed sound data includes:

and sending the first mixed sound data to a peripheral sound device of the terminal for receiving the audio backtransmission party.

Specifically, the third audio data is third audio data.

Specifically, the acquisition module is configured to acquire and receive the first audio data input by the peripheral input device of the audio return terminal.

Specifically, the peripheral input device of the audio receiving and returning party terminal includes a microphone, an earphone microphone, and a main sound card.

According to a second aspect of embodiments of the present disclosure, there is provided an audio processing apparatus comprising:

an acquisition module configured to acquire first audio data;

the processing module is configured to perform first sound effect processing on the first audio data in a concurrent processing mode to generate second audio data;

an output module configured to output the first remix data.

Specifically, the processing module includes:

a first processing unit configured to perform noise reduction, reverberation, equalization, and voicing processing on the first audio data.

Specifically, still include:

and the second sound effect processing module is configured to perform second sound effect processing on the first audio data to generate fourth audio data.

Specifically, the second sound effect processing module includes:

and the second processing unit is configured to process the first audio data according to a preset sound effect algorithm.

Specifically, still include:

and the mixing unit is configured to perform mixing processing on the fourth audio data and the third audio data to generate second mixed data.

Specifically, still include:

and the sending unit is configured to push the second mixed voice data to a distribution server through a streaming media module, so that the distribution server distributes the second mixed voice data to each receiving audio output terminal.

Specifically, the mixing unit includes:

a collation unit configured to time-collate the fourth audio data with the third audio data to time-synchronize the fourth audio data with the third audio data.

Optionally, the output module includes:

an output unit configured to transmit the first mixed sound data to a peripheral sound device of a receiving audio backhauling side terminal.

Specifically, the acquisition module is configured to acquire and receive the first audio data input by the peripheral input device of the audio return terminal.

Specifically, the peripheral input device of the audio receiving and returning party terminal includes a microphone, an earphone microphone, and a main sound card.

According to a third aspect of the embodiments of the present disclosure, there is provided a mobile terminal including:

a processor;

a memory for storing processor-executable instructions;

wherein the processor is configured to: the steps of the audio processing method according to any of the first aspect are performed.

According to a fourth aspect of embodiments of the present disclosure, the present disclosure provides a non-transitory computer-readable storage medium, characterized in that instructions in the storage medium, when executed by a processor of a mobile terminal, enable the mobile terminal to perform an audio processing method, the method comprising the steps of the audio processing method of any one of the first aspects.

According to a fifth aspect of embodiments of the present disclosure, there is provided an application program, which when executed by a processor of a mobile terminal, enables the mobile terminal to execute an audio processing method, the method comprising the steps of the audio processing method according to any one of the claims.

The present disclosure has the following advantages:

the audio processing method comprises the steps of acquiring first audio data acquired by an acquisition module; carrying out first sound effect processing on the first audio data in a concurrent processing mode to generate second audio data; performing sound mixing processing on the second audio data and the third audio data to generate first sound mixing data; and outputting the first mixed voice data. According to the method and the device, the delay control of the audio is controlled in a very low range through a processing mode of concurrently processing the audio data, so that the second audio data and high-quality accompaniment are mixed conveniently during final live broadcasting, the quality of live broadcasting Karaoke is improved, and the Karaoke effect is improved.

2, the first aspect of the disclosure improves the acquisition mechanism of the acquisition module, and can control the buffer acquired by the audio of the acquisition module in a very small range to control the time delay; in the second aspect, the first audio data and the second audio data are processed in real time, so that the data processing time is shortened, the data processing efficiency is improved, and the audio time delay is controlled finally; in a third aspect, the present disclosure controls latency by concurrently processing audio data. This is disclosed can be with audio frequency time delay control in a very low within range through above various measures, realize the alignment of people's voice and accompaniment and will make up the accompaniment sound mixing of quality when being convenient for follow-up mixing sound to promote K song effect.

3, this disclosure can also be right first audio data carry on after the audio effect is handled again with third audio data carry out the audio mixing in order to obtain second audio mixing data to when carrying out the audio mixing and handling, accomplish the alignment of adult's sound and accompaniment, in order to reach the audio frequency treatment effect of optimality.

To sum up, this disclosure obtains lower time delay and better K sings effect under the prerequisite of guaranteeing real-time, multi-functional and high quality audio experience, promotes user experience.

It is to be understood that the foregoing description is of the advantages of the present disclosure and that further description of the advantages will be apparent to those skilled in the art from the following description of the embodiments, and that many more advantages of the present disclosure will be apparent to those skilled in the art from the disclosure herein.

Additional aspects and advantages of the disclosure will be set forth in part in the description which follows and, in part, will be obvious from the description, or may be learned by practice of the disclosure.

Drawings

The foregoing and/or additional aspects and advantages of the present disclosure will become apparent and readily appreciated from the following description of the embodiments, taken in conjunction with the accompanying drawings of which:

FIG. 1 is a flow diagram illustrating a method of audio processing according to an exemplary embodiment;

FIG. 2 is an architecture diagram illustrating an iOS audio system stack, according to an exemplary embodiment;

FIG. 3 is a flow diagram illustrating an audio processing device according to an exemplary embodiment;

FIG. 4 is a schematic diagram illustrating the structure of an audio processing device according to an exemplary embodiment;

fig. 5 is a block diagram illustrating a mobile terminal according to an example embodiment.

Detailed Description

Reference will now be made in detail to the exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, like numbers in different drawings represent the same or similar elements unless otherwise indicated. The implementations described in the exemplary embodiments below are not intended to represent all implementations consistent with the present disclosure. Rather, they are merely examples of apparatus and methods consistent with certain aspects of the present disclosure, as detailed in the appended claims.

Fig. 1 is a flowchart illustrating an audio processing method, which is used in a mobile terminal, as shown in fig. 1, according to an exemplary embodiment, and includes the following steps.

In step S11: and acquiring first audio data acquired by the acquisition module.

In the embodiment of the disclosure, the acquisition module is configured to acquire and receive the first audio data input by the peripheral input device of the audio return terminal. The peripheral input equipment for receiving the audio and returning the audio to the terminal comprises a microphone, an earphone and a main broadcasting sound card. The terminal for receiving the audio back-transmitting party can be a main broadcasting terminal.

The method utilizes the Audio Unit technology, integrates the advantages of the Audio Unit during recording on the premise of guaranteeing timeliness, performance and experience, and is applied to a live broadcast scene to achieve the optimal audio processing effect.

Specifically, the AudioUnit scheme used in the present disclosure is closer to the bottom layer of the iOS Audio system stack, and is used in cooperation with multiple components such as an AudioUnit Processing Graph Services management IO device (IO Unit), a mixer (Mix Unit), an effector (Effect Unit), and the like, to obtain a lower delay and a better Effect.

Referring to fig. 2, fig. 2 is an architecture diagram of an iOS audio system stack, shown in accordance with an exemplary embodiment. As shown in FIG. 2, the iOS audio system stack includes Low-Level, Mid-Level, and High-Level.

Wherein, (1) Low-Level is mainly used in the audio APP implementation on MAC and under the condition that maximum real-time performance is required, most of the audio APPs do not need to use the services of this layer. Moreover, a high-level API with high real-time performance is provided on iOS to meet your demand. E.g., OpenAL, has real-time audio processing capability in games that is invoked directly with I/O.

And the I/O Kit is responsible for interacting with the hardware driver.

The Audio HAL, which is an Audio hardware abstraction layer, separates API calls from the actual hardware and remains independent.

Core MIDI, which provides a software abstraction layer of work for MIDI streams and devices.

And the Host Time Services is responsible for accessing the hardware clock of the computer.

(2) The Mid-Level layer has complete functions, including audio data format conversion, audio file reading and writing, audio stream analysis, plug-in work support and the like.

The Audio Convert Services are responsible for the conversion of Audio data formats.

The Audio File Services are responsible for reading and writing Audio data.

Audio Unit Services and Audio Processing Graph Services are plug-ins that support digital signal Processing such as equalizers and mixers.

Audio File Screen Services are responsible for stream parsing.

Core Audio Clock Services are responsible for Audio Clock synchronization.

(3) High-Level is a group of High-Level applications combined from low-Level interfaces, and basically much work on audio development can be done at this Level.

Audio Queue Services provide recording, playing, pausing, looping, and synchronizing Audio that automatically processes compressed Audio formats with the necessary codecs.

The AVAudio Player is an Objective-C interface-based audio playing class specially provided for the IOS platform, and can support the playing of all audio supported by the iOS.

Extended Audio File Services are composed of Audio files and Audio converters, and provide read-write capability for compressed and uncompressed Audio files.

OpenAL is the implementation of the CoreAudio to the OpenAL standard, and can play a 3D sound mixing effect.

The AudioUnit is a bottom layer technology of an iOS audio system stack, and the iOS provides audio processing plug-ins such as audio mixing, equalization, format conversion, real-time IO recording, playback, offline rendering, voice over internet protocol (VoIP), and the like, which all belong to different audiounits, and support dynamic loading and use. Audiounits can be created and used individually, but are more often used in combination in AudioProcessing Graph containers to meet diverse processing needs.

Based on the principle, the advantages of the Audio Unit during recording are integrated, and the Audio Unit is applied to a live scene to achieve the optimal audio processing effect.

In step S12: and carrying out first sound effect processing on the first audio data in a concurrent processing mode to generate second audio data.

In the embodiment of the present disclosure, it is right that the first audio data is subjected to a concurrent processing mode through a first audio processing module to perform a first audio processing, the audio processing specifically includes performing noise reduction, reverberation, equalization, and sound variation processing on the first audio data, and the first audio processing module has a higher requirement on time delay.

It should be noted that, in the present disclosure, the first audio data is preferably processed in a concurrent processing manner, so as to shorten the data processing time, improve the data processing efficiency, and finally realize controlling the ear return delay.

In addition, the audio time delay is controlled by the following measures:

in a first aspect, the present disclosure improves the acquisition mechanism of the acquisition module, so that the buffer of audio acquisition of the acquisition module can be controlled in a very small range to control the time delay. This is disclosed solves prior art through this mechanism, and the buffer memory of audio acquisition is great, and the output frequency is lower, is difficult to the problem of alignment time axis when handling background music.

In a second aspect, the present disclosure processes the first audio data and the second audio data in real time to shorten data processing time, improve data processing efficiency, and finally control audio time delay.

Further, according to the present disclosure, after the first audio data is acquired, second sound effect processing is further performed on the first audio data to generate fourth audio data. Specifically, the first audio data is processed according to a preset sound effect algorithm. And the fourth audio data is used for outputting the audio data to a terminal of an audio receiving output party after being mixed with the third audio data subsequently, such as a terminal of each listener, so that each listener can hear the audio of the main broadcasting.

In step S13: and mixing the second audio data and the audio data used for accompaniment to generate first mixed sound data.

In the embodiment of the present disclosure, the second audio data and the third audio data are input to a first audio mixing module to mix the second audio data and the third audio data through the first audio mixing module, and the first audio mixing data is output to a peripheral sound device that receives the audio returning party terminal. The peripheral sound device of the receiving audio back-transmitting terminal includes but is not limited to a speaker, an earphone, and the like.

Furthermore, this disclosure is right after first audio data carries out second audio treatment and generates fourth audio data, it is right fourth audio data with third audio data carries out the audio mixing and generates second audio mixing data, it is right simultaneously fourth audio data with the completion will when third audio data carries out the audio mixing fourth audio data with third audio data carries out the time proofreading, so that fourth audio data with the time synchronization of third audio data to accomplish the alignment of people's voice and accompaniment when realizing the audio mixing, promote the audio treatment effect. The second audio mixing data is used for being pushed to a distribution server through a streaming media module, so that the distribution server distributes the second audio mixing data to each audio receiving side terminal.

This it is right that this a set of audio processing algorithm of preferred its self handles first audio data to reach individualized processing audio data's purpose, and through the better realization of control time delay fourth audio data with the time proofreading of third audio data accomplishes the alignment of voice and accompaniment, reach better K sings the effect, promote user experience.

In step S14: and outputting the first mixed voice data.

In the embodiment of the present disclosure, two sets of processing methods are performed on the input first mixed sound data to finally obtain two output results, that is, the first mixed sound data and the second mixed sound data. And the first audio mixing data is correspondingly output to the peripheral sound equipment of the audio return receiving terminal, so that the anchor can hear the sound of the anchor. And the second mixed sound data is output to a distribution server, so that the distribution server distributes the second mixed sound data to each audio receiving side terminal, and each listener can hear the sound of the main broadcasting.

The present disclosure has performed first sound effect processing and sound mixing on first sound mixing data output to the anchor, has performed second sound effect processing and sound mixing on second sound mixing data output to the listener terminal. Wherein the second sound effect processing is more focused on sound effect processing. In terms of controlling time delay, the present disclosure prefers a concurrent processing method to process the first audio data; in the aspect of sound effect processing, it is right that this disclosure prefers its one set of sound effect processing algorithm of self setting do individualized processing to first audio data to promote the effect of sound effect processing, promoted the effect that the ear of anchor returned on the one hand, on the other hand has promoted listener's experience.

Fig. 3 is a block flow diagram illustrating an audio processing apparatus according to an example embodiment. Referring to fig. 3, the apparatus includes an acquisition module 11, a processing module 12, a mixing module 13, and an output module 14.

The acquisition module 11 is configured to acquire first audio data.

With continuing reference to fig. 2, fig. 2 is an architectural diagram of an iOS audio system stack, shown in accordance with an exemplary embodiment. As shown in FIG. 2, the iOS audio system stack includes Low-Level, Mid-Level, and High-Level.