Audio content generation method, server-side device and client-side device

Document No.: 1739458 · Publication date: 2019-12-20

Note: This technology, "Audio content generation method, server-side device and client-side device", was created by Sun Haohua on 2018-06-12. Its main content is as follows: The application provides an audio content generation method, a server-side device, and a client-side device, wherein the method comprises: acquiring selection items for a plurality of music elements sent by a client, the music elements comprising at least one of: style, mood, and rhythm; determining timbre element content and accompaniment element content according to the selection items for the plurality of music elements; synthesizing the timbre element content and the accompaniment element content to obtain audio content; and transmitting the audio content to the client. This scheme solves the problem that existing composition operations demand too much professional expertise, achieving the technical effect of simple and efficient composition.

1. A method for audio content generation, the method comprising:

acquiring selection items for a plurality of music elements sent by a client, wherein the music elements comprise at least one of the following: style, mood, and rhythm;

determining timbre element content and accompaniment element content according to the selection items for the plurality of music elements;

synthesizing the timbre element content and the accompaniment element content to obtain audio content; and

transmitting the audio content to the client.

2. The method of claim 1, wherein determining the timbre element content and the accompaniment element content from the selection items for the plurality of music elements comprises:

formatting the selection items for the plurality of music elements to obtain a mapping relationship between the music elements and their corresponding selection items;

querying a timbre library according to the formatting result to obtain timbre element content matching the formatting result; and

querying an accompaniment library according to the formatting result to obtain accompaniment element content matching the formatting result.

3. The method of claim 2, further comprising:

receiving timbre element content and/or accompaniment element content; and

storing the received timbre element content in the timbre library, and storing the received accompaniment element content in the accompaniment library.

4. The method of claim 1, wherein determining the timbre element content and the accompaniment element content from the selection items for the plurality of music elements comprises:

inputting the selection items for the plurality of music elements into a preset machine learning model; and

generating the timbre element content and the accompaniment element content through the machine learning model.

5. The method of claim 1, wherein the music elements further comprise: a human voice;

correspondingly, determining the timbre element content and the accompaniment element content according to the selection items for the plurality of music elements comprises:

matching to obtain the timbre element content, the accompaniment element content, and a simulated human voice file according to the selection items for the music elements.

6. The method of any of claims 1 to 5, wherein the selection items for style comprise: ballad, pop, and rock; the selection items for rhythm comprise: fast tempo, medium tempo, and slow tempo; and the selection items for mood comprise: relaxed, sad, warm, and excited.

7. The method according to any one of claims 1 to 5, wherein synthesizing the timbre element content and the accompaniment element content to obtain the audio content comprises:

performing composition synthesis on the timbre element content and the accompaniment element content; and

performing audio format conversion on the composition synthesis result to obtain the audio content.

8. A method for audio content generation, the method comprising:

providing a display interface, and displaying selection items for a plurality of music elements, wherein the music elements comprise at least one of the following: style, mood, and rhythm;

receiving a user's selection result for the selection items of the plurality of music elements;

transmitting the selection result to a server, wherein the server determines timbre element content and accompaniment element content according to the selection result, and synthesizes the timbre element content and the accompaniment element content to obtain audio content; and

playing the audio content.

9. A method for audio content generation, the method comprising:

acquiring selection items for a plurality of music elements selected by a user, wherein the music elements comprise at least one of the following: style, mood, and rhythm;

determining timbre element content and accompaniment element content according to the selection items for the plurality of music elements;

synthesizing the timbre element content and the accompaniment element content to obtain audio content; and

playing the audio content.

10. A server-side device comprising a processor and a memory for storing processor-executable instructions, wherein the instructions, when executed by the processor, implement:

acquiring selection items for a plurality of music elements sent by a client, wherein the music elements comprise at least one of the following: style, mood, and rhythm;

determining timbre element content and accompaniment element content according to the selection items for the plurality of music elements;

synthesizing the timbre element content and the accompaniment element content to obtain audio content; and

transmitting the audio content to the client.

11. The server-side device of claim 10, wherein determining the timbre element content and the accompaniment element content based on the selection items for the plurality of music elements comprises:

formatting the selection items for the plurality of music elements to obtain a mapping relationship between the music elements and their corresponding selection items;

querying a timbre library according to the formatting result to obtain timbre element content matching the formatting result; and

querying an accompaniment library according to the formatting result to obtain accompaniment element content matching the formatting result.

12. The server-side device according to claim 11, wherein the instructions further implement:

receiving timbre element content and/or accompaniment element content; and

storing the received timbre element content in the timbre library, and storing the received accompaniment element content in the accompaniment library.

13. The server-side device of claim 10, wherein determining the timbre element content and the accompaniment element content based on the selection items for the plurality of music elements comprises:

inputting the selection items for the plurality of music elements into a preset machine learning model; and

generating the timbre element content and the accompaniment element content through the machine learning model.

14. The server-side device of claim 10, wherein the music elements further comprise: a human voice;

correspondingly, determining the timbre element content and the accompaniment element content according to the selection items for the plurality of music elements comprises:

determining the timbre element content, the accompaniment element content, and a simulated human voice file according to the selection items for the plurality of music elements.

15. The server-side device according to any one of claims 10 to 14, wherein the selection items for style comprise: ballad, pop, and rock; the selection items for rhythm comprise: fast tempo, medium tempo, and slow tempo; and the selection items for mood comprise: relaxed, sad, warm, and excited.

16. The server-side device according to any one of claims 10 to 14, wherein synthesizing the timbre element content and the accompaniment element content to obtain the audio content comprises:

performing composition synthesis on the timbre element content and the accompaniment element content; and

performing audio format conversion on the composition synthesis result to obtain the audio content.

17. A client-side device comprising a processor and a memory for storing processor-executable instructions, wherein the instructions, when executed by the processor, implement:

providing a display interface, and displaying selection items for a plurality of music elements, wherein the music elements comprise at least one of the following: style, mood, and rhythm;

receiving a user's selection result for the selection items of the plurality of music elements;

transmitting the selection result to a server, wherein the server determines timbre element content and accompaniment element content according to the selection result, and synthesizes the timbre element content and the accompaniment element content to obtain audio content; and

playing the audio content.

18. A client-side device comprising a processor and a memory for storing processor-executable instructions, wherein the instructions, when executed by the processor, implement:

acquiring selection items for a plurality of music elements selected by a user, wherein the music elements comprise at least one of the following: style, mood, and rhythm;

determining timbre element content and accompaniment element content according to the selection items for the plurality of music elements;

synthesizing the timbre element content and the accompaniment element content to obtain audio content; and

playing the audio content.

19. A computer-readable storage medium having computer instructions stored thereon which, when executed, implement the steps of the method of any one of claims 1 to 7.

Technical Field

The present application relates to the technical field of data processing, and in particular to an audio content generation method, a server-side device, and a client-side device.

Background

At present, composition generally has to be done by professionals, mainly because it involves many technical operations, for example melodic repetition, transposition, variation, and interval or rhythm expansion and compression, which only trained musicians understand. As a result, composition is difficult for people without much knowledge of music.

Existing music software generally supports only downloading and playing music, and software that does provide composition functions must be operated by trained professionals, so the barrier to composition is high; in other words, composing remains difficult.

No effective solution has yet been proposed for this difficulty of composition.

Disclosure of Invention

The application aims to provide an audio content generation method, a server-side device, and a client-side device capable of generating audio content simply and efficiently.

The application provides an audio content generation method, a server-side device, and a client-side device, implemented as follows:

A method of audio content generation, the method comprising:

acquiring selection items for a plurality of music elements sent by a client, wherein the music elements comprise at least one of the following: style, mood, and rhythm;

determining timbre element content and accompaniment element content according to the selection items for the plurality of music elements;

synthesizing the timbre element content and the accompaniment element content to obtain audio content; and

transmitting the audio content to the client.

A method of audio content generation, the method comprising:

providing a display interface, and displaying selection items for a plurality of music elements, wherein the music elements comprise at least one of the following: style, mood, and rhythm;

receiving a user's selection result for the selection items of the plurality of music elements;

transmitting the selection result to a server, wherein the server determines timbre element content and accompaniment element content according to the selection result, and synthesizes the timbre element content and the accompaniment element content to obtain audio content; and

playing the audio content.

A method of audio content generation, the method comprising:

acquiring selection items for a plurality of music elements selected by a user, wherein the music elements comprise at least one of the following: style, mood, and rhythm;

determining timbre element content and accompaniment element content according to the selection items for the plurality of music elements;

synthesizing the timbre element content and the accompaniment element content to obtain audio content; and

playing the audio content.

A server-side device comprising a processor and a memory for storing processor-executable instructions, wherein the instructions, when executed by the processor, implement:

acquiring selection items for a plurality of music elements sent by a client, wherein the music elements comprise at least one of the following: style, mood, and rhythm;

determining timbre element content and accompaniment element content according to the selection items for the plurality of music elements;

synthesizing the timbre element content and the accompaniment element content to obtain audio content; and

transmitting the audio content to the client.

A client-side device comprising a processor and a memory for storing processor-executable instructions, wherein the instructions, when executed by the processor, implement:

providing a display interface, and displaying selection items for a plurality of music elements, wherein the music elements comprise at least one of the following: style, mood, and rhythm;

receiving a user's selection result for the selection items of the plurality of music elements;

transmitting the selection result to a server, wherein the server determines timbre element content and accompaniment element content according to the selection result, and synthesizes the timbre element content and the accompaniment element content to obtain audio content; and

playing the audio content.

A client-side device comprising a processor and a memory for storing processor-executable instructions, wherein the instructions, when executed by the processor, implement:

acquiring selection items for a plurality of music elements selected by a user, wherein the music elements comprise at least one of the following: style, mood, and rhythm;

determining timbre element content and accompaniment element content according to the selection items for the plurality of music elements;

synthesizing the timbre element content and the accompaniment element content to obtain audio content; and

playing the audio content.

A computer-readable storage medium having computer instructions stored thereon which, when executed, implement the steps of the method described above.

With the audio content generation method, server-side device, and client-side device provided herein, the user only needs to select music element options on the client device, and the back end completes the composition, i.e., generates the audio content. The user needs no particular composition knowledge, which solves the problem that existing composition operations demand too much professional expertise and achieves the technical effect of simple, efficient composition.

Drawings

To illustrate the embodiments of the present application or the prior-art solutions more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. The drawings described below are obviously only some embodiments of the present application; those skilled in the art can derive other drawings from them without creative effort.

FIG. 1 is a schematic diagram of the architecture of a composition system provided herein;

FIG. 2 is an interaction flow diagram of the composition process provided herein;

FIG. 3 is a schematic illustration of musical composition generation provided by the present application;

FIG. 4 is a schematic interface diagram showing the presentation of a composition file generation result provided by the present application;

FIG. 5 is a schematic interface diagram of a composition element selection provided herein;

FIG. 6 is a method flow diagram of an audio content generation method provided herein;

FIG. 7 is an architecture diagram of a server device provided herein;

FIG. 8 is a block diagram of an audio content generating apparatus according to the present application.

Detailed Description

To enable those skilled in the art to better understand the technical solutions in the present application, the technical solutions in the embodiments of the present application are described below clearly and completely with reference to the drawings. The described embodiments are obviously only some, not all, of the embodiments of the present application; all other embodiments obtained by a person skilled in the art from these embodiments without creative effort shall fall within the protection scope of the present application.

Considering that the existing composition process is too demanding of professional expertise, if composition is performed automatically by AI (artificial intelligence), people without composition training can also compose, so musical works can be created and generated simply.

To this end, the present example provides a composition system which, as shown in FIG. 1, comprises a client device 101 and a server-side device 102. The client device 101 provides a composition interface through which the user makes inputs and selections, and displays the composition result to the user. The server-side device 102 obtains the information input or selected by the user, performs automatic composition based on that information, and sends the composition result to the client device 101.

In practical implementations, part of the composition may be performed on the server side, i.e., the architecture shown in FIG. 1, or all operations may be performed on the client device. The choice depends on the processing capability of the device, the constraints of the user's usage scenario, and so on; for example, the user may compose online or offline. The present application does not limit this.

The client device 101 may be a mobile phone, a tablet computer, a wearable device, a desktop computer, or an all-in-one machine running the app the user composes with, for example composition software, music software, or audio/video software; the specific form of the client device 101 is not limited in this application.

To make the composition process more flexible, the client device 101 may provide different composition input interfaces for the user to choose from, for example an interface for visual selection or an input interface for composition elements. After opening software with the composition function, the user can pick the type of input interface according to preference or habit.

As shown in FIG. 2, several composition elements may be displayed on the interface, such as style, mood, and rhythm; the user then taps a composition element to bring up its detailed options. For example, tapping style displays several style options: ballad, pop, rock. Tapping rhythm displays several rhythm options: fast tempo, medium tempo, and slow tempo. Tapping mood displays several mood options, such as anxious, uneasy, and mild. By tapping the style, rhythm, and mood options, the user produces a setting of the composition elements.

The client device sends the composition elements set by the user to the server, and the server can format them. For example, a formatted input list may be defined on the server side: the list contains the composition elements (style, mood, rhythm), and each composition element has a field to be filled in according to the user's setting. For instance, if the style selection is pop, the corresponding style field is filled in as pop. In a specific implementation, the fields may be filled in as text, or a mapping between numbers or strings and options may be used; for example, if pop corresponds to the number 2, then 2 is placed in the style field, and the server knows from the 2 in that field that the style selected by the user is pop.
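To make the formatting step concrete, the following is a minimal Python sketch of how such a formatted input list might be built. The field names and numeric codes (e.g., pop = 2, echoing the example above) are illustrative assumptions, not part of the patent.

```python
# Hypothetical sketch of the server-side formatting step described above.
# Field names and numeric codes are illustrative assumptions only.

# Numeric encodings for each composition element, as in the "pop means 2" example.
STYLE_CODES = {"ballad": 1, "pop": 2, "rock": 3}
RHYTHM_CODES = {"fast": 1, "medium": 2, "slow": 3}
MOOD_CODES = {"happy": 1, "sad": 2, "relaxed": 3, "warm": 4, "excited": 5}

def format_selections(selections: dict) -> dict:
    """Map the client's raw selections onto a fixed field list.

    `selections` is e.g. {"style": "pop", "rhythm": "fast", "mood": "happy"};
    the result is the formatted input list consumed by the composition model.
    """
    return {
        "style": STYLE_CODES[selections["style"]],
        "rhythm": RHYTHM_CODES[selections["rhythm"]],
        "mood": MOOD_CODES[selections["mood"]],
    }

# Example: {"style": 2, "rhythm": 1, "mood": 1}
print(format_selections({"style": "pop", "rhythm": "fast", "mood": "happy"}))
```

Either the textual or the numeric form could be passed on; the numeric form matches the "pop corresponds to 2" example above.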

After the formatted data is obtained, it can serve as the input of the server-side AI composition model; that is, the input data of the AI composition model is obtained by formatting the composition elements set by the user. Based on this input data, the AI composition model, i.e., the server, matches and retrieves content from a pre-established timbre library and accompaniment library, obtains the corresponding timbre element content and accompaniment element content, and then synthesizes the retrieved timbre element content and accompaniment element content to obtain a composition file.

The composition elements listed above and the options for each of them are, however, merely exemplary; in actual implementations, other composition elements and options may be provided and set according to actual requirements, which the present application does not limit.

Timbre, also called tone quality, refers to the characteristic of a sound produced by the vibration of different objects, which always shows distinctive features in its waveform. The timbre library consists of audio samples recorded from a number of different instruments, where the sample bit depth can be 24 bits or higher. For example, an acoustic guitar timbre library includes the timbres of nylon-string, steel-string, 12-string, and bass guitars, among others. The timbre library stores audio samples for many instruments, and matching can be performed against the composition elements set by the user to obtain the timbre element content used to synthesize the composition.
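As an illustration only, a timbre library entry might be modeled as a small record carrying the instrument, the sample location, the bit depth mentioned above, and the tags used later for matching; all field names here are assumptions.

```python
# A sketch of one possible record structure for entries in the timbre library;
# field names are assumptions for illustration.
from dataclasses import dataclass, field

@dataclass
class TimbreSample:
    instrument: str            # e.g. "nylon-string guitar", "12-string guitar"
    path: str                  # location of the audio sample on disk
    bit_depth: int = 24        # sample bit depth; 24 bits or higher per the text
    tags: dict = field(default_factory=dict)  # e.g. {"style": "pop", "mood": "happy"}

sample = TimbreSample("nylon-string guitar", "/library/timbre/nylon_01.wav",
                      tags={"style": "ballad", "mood": "relaxed", "rhythm": "slow"})
```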

The accompaniment is a melody, so the accompaniment library is a melody library. A melody is an organized, rhythmic sequence formed by combining a number of tones: a single voice part with logical structure, made up of certain pitches, durations, and volumes. Melodies are built from a number of basic musical elements (e.g., key, rhythm, tempo, dynamics, and timbre).

In the implementation process, after retrieving the timbre element content and the accompaniment element content from the timbre library and the accompaniment library according to the composition elements set by the user, the server side can synthesize them into a composition file. The resulting composition file may then undergo format conversion into audio content. The audio content may be an audio file or streamed media content; which form to use can be chosen according to actual needs, and the present application does not limit it.
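The following self-contained Python sketch illustrates the mix-then-convert shape of this step: two placeholder tracks stand in for retrieved timbre and accompaniment content, they are overlaid sample by sample, and the result is written out as a playable WAV file. A real implementation would be far more elaborate; this only shows the structure.

```python
# A minimal sketch of the synthesis step: mix a timbre track and an
# accompaniment track, then write the result as a playable WAV file
# (the "audio format conversion"). The sine tones are placeholders.
import math
import wave
from array import array

RATE = 44100  # samples per second

def tone(freq: float, seconds: float, amp: float = 0.4) -> array:
    """Stand-in for retrieved library content: a plain sine tone."""
    n = int(RATE * seconds)
    return array("h", (int(amp * 32767 * math.sin(2 * math.pi * freq * t / RATE))
                       for t in range(n)))

def mix(a: array, b: array) -> array:
    """Overlay two 16-bit tracks, clipping to the valid sample range."""
    n = max(len(a), len(b))
    out = array("h", [0] * n)
    for i in range(n):
        s = (a[i] if i < len(a) else 0) + (b[i] if i < len(b) else 0)
        out[i] = max(-32768, min(32767, s))
    return out

timbre_track = tone(440.0, 2.0)          # placeholder "timbre element content"
accompaniment_track = tone(220.0, 2.0)   # placeholder "accompaniment element content"
mixed = mix(timbre_track, accompaniment_track)

with wave.open("composition.wav", "wb") as f:   # format conversion to audio content
    f.setnchannels(1)
    f.setsampwidth(2)      # 16-bit PCM
    f.setframerate(RATE)
    f.writeframes(mixed.tobytes())
```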

Because the accompaniment library and the timbre library are provided, the server side only needs to select accompaniment element content and timbre element content from them and synthesize the final composition file. Melodic repetition, transposition, variation, interval or rhythm expansion and compression, the vertical and horizontal arrangement of harmony and pitch in counterpoint, the combination of timbres in orchestration, and parallelism, contrast, rondo, and imitation in musical form no longer need to be set or selected by the user: they are already embodied in the files of the accompaniment and timbre libraries, which only need to be matched and synthesized. In this way, non-professionals can also compose.

The accompaniment element content and timbre element content in the accompaniment library and the timbre library can be updated continuously: for example, back-office staff can prepare batches of accompaniment element content and timbre element content to add to the libraries, and a way for users to upload content themselves can also be provided, keeping the resources in the timbre library and the accompaniment library rich.

For each item of accompaniment element content and timbre element content, the corresponding style, mood, and rhythm may be set in advance. The files in the accompaniment library and the timbre library can then be clustered, for example: which accompaniment element content is happy, fast-paced rock and which is sad, slow-paced electronic music; and likewise which timbre element content is happy, fast-paced rock and which is sad, slow-paced electronic music. When matching, the corresponding sets of accompaniment element content and timbre element content can thus be matched directly against the input conditions, and accompaniment element content or timbre element content is then selected at random from the matched set. Since the files in the timbre and accompaniment libraries may keep growing, the final results become more and more diverse.
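A minimal sketch of this clustering-and-matching idea follows: entries are tagged in advance, matching filters on the tags, and the final pick is random, which is why identical inputs can yield different outputs. The library contents shown are invented for illustration.

```python
# Tag-based matching with a random final pick, as described above.
# Library entries and tag values are illustrative assumptions.
import random

ACCOMPANIMENT_LIBRARY = [
    {"path": "acc_rock_fast_01.mid", "style": "rock", "mood": "happy", "rhythm": "fast"},
    {"path": "acc_rock_fast_02.mid", "style": "rock", "mood": "happy", "rhythm": "fast"},
    {"path": "acc_elec_slow_01.mid", "style": "electronic", "mood": "sad", "rhythm": "slow"},
]

def match(library: list, style: str, mood: str, rhythm: str) -> dict:
    """Return one random entry from the set matching all three conditions."""
    candidates = [e for e in library
                  if e["style"] == style and e["mood"] == mood and e["rhythm"] == rhythm]
    if not candidates:
        raise LookupError("no library content matches the requested elements")
    return random.choice(candidates)

# Two calls with identical input may return different files:
print(match(ACCOMPANIMENT_LIBRARY, "rock", "happy", "fast")["path"])
print(match(ACCOMPANIMENT_LIBRARY, "rock", "happy", "fast")["path"])
```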

In the example above, the timbre element content, accompaniment element content, and so on are determined by matching against libraries such as the accompaniment library and the timbre library. In practical implementations, however, they may be determined in other ways, for example by machine learning: the selection items for a number of music elements sent by the client (for example, the specific style, mood, and rhythm selected by the user) are acquired and fed into a preset machine learning model as its input data, and the model determines the timbre element content, accompaniment element content, and so on. The content produced by the model may be retrieved from a database or synthesized by the model from the input data; which approach to choose can be based on actual processing capability and the like, and the present application does not limit it.
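As a schematic sketch only (the patent does not specify a model architecture), the machine-learning path could encode the selected options as a feature vector and hand it to a pre-trained model; the model class below is a stand-in stub, not a real implementation.

```python
# Schematic sketch of the machine-learning alternative: one-hot encode the
# selections and pass them to a pre-trained model. The model is a stub.
STYLES = ["ballad", "pop", "rock"]
MOODS = ["happy", "sad", "relaxed", "warm", "excited"]
RHYTHMS = ["fast", "medium", "slow"]

def one_hot(value: str, vocabulary: list) -> list:
    return [1.0 if v == value else 0.0 for v in vocabulary]

def encode(style: str, mood: str, rhythm: str) -> list:
    """Concatenated one-hot encoding of the three selections."""
    return one_hot(style, STYLES) + one_hot(mood, MOODS) + one_hot(rhythm, RHYTHMS)

class CompositionModel:
    """Stand-in for a pre-trained model that maps the feature vector either to
    library identifiers (retrieval) or directly to generated content."""
    def predict(self, features: list) -> dict:
        # A real model would run inference here; this stub only echoes shape info.
        return {"timbre": "timbre_id", "accompaniment": "accompaniment_id"}

model = CompositionModel()
print(model.predict(encode("pop", "happy", "fast")))
```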

The user may listen to, store, forward, or otherwise act on the generated audio content; which subsequent operation is used is not limited by the present application.

Because the composition approach of this application aims to offer the user a simple composition method and provide musical inspiration, a single synthesized composition file will not necessarily be to the user's liking; the user can compose repeatedly to obtain different composition files, or keep composing until a satisfying one is obtained.

In the timbre library and the accompaniment library described above, each composition element and each of its options corresponds to multiple items of timbre element content and accompaniment element content. Even when the user selects the same composition elements, the timbre element content and accompaniment element content finally matched will, with high probability, differ; even if the user sets the same composition elements twice, the resulting composition files will most likely differ, because the timbre and accompaniment databases are diverse and the matched files may differ each time. For example, suppose the selected options are "pop, happy, fast-paced": selecting these three options a first time produces a first composition result, selecting them a second time produces a second composition result, and the two results will probably differ, because "pop, happy, fast-paced" matches multiple items of timbre element content in the timbre library and multiple items of accompaniment element content in the accompaniment library. Hence the output can differ each time even for identical input.

Accordingly, in implementation, after the user enters a set of options, a first composition result is generated; after auditioning it, the user may tap "re-compose under the existing conditions" to compose again with the current options and obtain a second composition result. If the user wishes to re-enter the options, they can tap "re-select composition elements" after the audition, which triggers a fresh flow of selecting composition elements and generating a composition result file.

In one embodiment, considering that a composition sometimes needs lyrics and sometimes does not, an option can be presented before composition asking whether lyrics are wanted; if the user chooses to add lyrics, they can upload a lyric file. A human-voice composition element can then be offered when the composition elements are selected; for example, the user may choose male mid-range, male bass, female treble, and so on.

When the composition file is generated, the composition result, the lyric content, and the matched simulated human voice can be fused to obtain the final song file with lyrics. To match the simulated human voice, a simulated-human-voice library can be provided; it is a concept parallel to the timbre library and the accompaniment library above, i.e., a database the AI composition model matches against during composition: based on the composition elements selected by the user, timbre element content is matched from the timbre library, accompaniment element content from the accompaniment library, and a simulated human voice file from the simulated-human-voice library.
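The branching described above might be sketched as follows, with all helper values as illustrative stubs: when lyrics are present, a simulated-voice file is matched from the third library and fused with the composition; otherwise the plain composition path runs.

```python
# Sketch of the lyrics branch: with lyrics, match all three libraries and fuse;
# without lyrics, perform plain composition. All values are illustrative stubs.
from typing import Optional

def generate(selections: dict, lyrics: Optional[str] = None) -> str:
    timbre = f"timbre for {selections['style']}"               # from timbre library
    accompaniment = f"accompaniment for {selections['mood']}"  # from accompaniment library
    if lyrics:
        voice = f"{selections['voice']} rendering of lyrics"   # from simulated-voice library
        return f"song({timbre}, {accompaniment}, {voice})"     # AI song-writing path
    return f"composition({timbre}, {accompaniment})"           # plain composition path

print(generate({"style": "pop", "mood": "happy", "voice": "female treble"},
               lyrics="first line of the uploaded lyric file"))
```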

To make the composition process more engaging for the user and improve the user experience, scene content can be mapped onto each composition element. For example, for a music festival held in a particular place, the composition elements may be represented by local buildings or specialty foods: styles correspond to buildings (a garden for pop, a temple for ballad, etc.) and foods correspond to rhythms (soup dumplings for fast tempo, cakes for slow tempo, etc.).

The mappings of composition elements onto other symbolic content listed above are, however, merely exemplary; in practical implementations, different mappings can be set according to the actual application scenario, and the present application is not limited in this respect.

As shown in FIG. 3, the whole composition process may include: condition input, condition analysis, composition, audio synthesis, music generation, and secondary processing. Condition input can be the user's selection of style, mood, rhythm, human voice, and so on. Condition analysis analyzes the user's input conditions and, based on the AI composition model, matches accompaniment element content, timbre element content, and a simulated human voice file from the accompaniment, timbre, and simulated-human-voice libraries. Audio synthesis combines the matched accompaniment element content, timbre element content, and simulated human voice file into audio content. Music generation and secondary processing then process the audio content, for example noise reduction and optimization, to obtain the final composition result file or music result file, as shown in FIG. 4.
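Condensing the stages of FIG. 3 into one hypothetical driver function gives the following sketch; every stage is represented by a placeholder, since the point is only the order of the stages, not their internals.

```python
# Hypothetical end-to-end driver mirroring the FIG. 3 stages; all stage
# outputs are placeholder strings standing in for real processing.
def compose_pipeline(raw_selections: dict) -> str:
    formatted = {"style": raw_selections["style"],           # condition input + analysis
                 "mood": raw_selections["mood"],
                 "rhythm": raw_selections["rhythm"]}
    timbre = f"timbre<{formatted['style']}>"                 # match timbre library
    accompaniment = f"accompaniment<{formatted['rhythm']}>"  # match accompaniment library
    audio = f"mix({timbre}, {accompaniment})"                # audio synthesis
    return f"denoise_and_optimize({audio})"                  # secondary processing

print(compose_pipeline({"style": "pop", "mood": "happy", "rhythm": "fast"}))
```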

The composition method above is described below with reference to a specific scenario; it should be noted, however, that this specific embodiment is only meant to illustrate the present application better and should not be construed as limiting it.

As shown in FIG. 5, the front-end device, that is, the client device, displays a composition interface. After opening the corresponding composition software, the user can make selections by tapping through the interface shown in FIG. 5: style (e.g., pop, rock, ballad, electronic), mood (e.g., happy, excited, relaxed, warm, sad), and rhythm (e.g., fast, medium, slow). The rhythm can be set through a BPM value, where BPM (Beats Per Minute) is the number of beats occurring within one minute.
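Since rhythm can be expressed as a BPM value, a small sketch of that realization is given below; the BPM numbers chosen for each rhythm option are illustrative assumptions, while the beat length follows directly from the definition of beats per minute.

```python
# Illustrative mapping from rhythm options to BPM values (assumed numbers),
# and the beat duration implied by the BPM definition.
RHYTHM_TO_BPM = {"fast": 140, "medium": 100, "slow": 70}

def beat_seconds(bpm: int) -> float:
    """Duration of one beat in seconds: 60 seconds divided by beats per minute."""
    return 60.0 / bpm

bpm = RHYTHM_TO_BPM["fast"]
print(bpm, beat_seconds(bpm))  # 140 beats/min -> ~0.43 s per beat
```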

For the user, only the selection operation shown in FIG. 5 is required, with no need for much knowledge of music fundamentals such as melodic repetition, transposition, variation, and interval or rhythm expansion and compression; the vertical and horizontal arrangement of harmony and pitch in counterpoint; the combination of timbres in orchestration; or parallelism, contrast, rondo, and imitation in musical form. None of this requires user manipulation; it is all embodied in the timbre element content and the accompaniment element content. In this way, the user can compose anytime and anywhere through a mobile device such as a mobile phone, creating and generating musical content.

To make music creation more enjoyable and improve the user experience, the style, rhythm, mood, and other selection conditions in the AI composition, together with their subordinate options, can be mapped onto specific scenes or real objects: the data at the actual server interface stays unchanged, while the front end of the client device presents the selection conditions through other playful, themed selection interfaces. For example, at a music festival exhibition in Xi'an, "style" may correspond to well-known Xi'an landmark buildings, such as "pop" corresponding to the "Bell Tower", and "mood" may correspond to Xi'an specialty snacks, such as "happy" corresponding to "roujiamo (Chinese hamburger)". Such scene mappings make the composition process more entertaining.
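This front-end mapping can be as simple as a lookup table that translates the playful labels back into the unchanged server-side option values; the Xi'an entries below follow the text, and the shape of the table is an assumption.

```python
# Front-end scene mapping: playful labels resolve to the unchanged
# server-side (element, option) pairs. Entries beyond the text are assumed.
SCENE_TO_ELEMENT = {
    "Bell Tower": ("style", "pop"),    # landmark building -> style option
    "roujiamo":   ("mood", "happy"),   # local snack -> mood option
}

def translate_scene_choice(label: str) -> tuple:
    """Resolve a scene label into the (element, option) pair the server expects."""
    return SCENE_TO_ELEMENT[label]

print(translate_scene_choice("Bell Tower"))  # ("style", "pop")
```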

In implementation, if the user fills in lyrics while selecting the composition inputs, the corresponding lyric content can be performed with a simulated human voice and fused with the composition result to obtain a complete song; that is, AI composition becomes AI song writing. If the user does not fill in lyrics, or leaves the lyric item blank, a composition process is triggered rather than a song-writing process.

In the scheme of the example above, the user can complete a composition through simple operations, and the process is quick. This composition method requires no particular musical literacy, so almost anyone can compose in this way; meanwhile, for advanced users with a musical foundation, the creation process becomes simpler and cheaper.

FIG. 6 is a flow chart of one embodiment of the audio content generation method described herein. Although the present application provides the method steps or apparatus structures shown in the following embodiments or drawings, more or fewer steps or modular units may be included based on conventional or non-inventive effort. For steps or structures without a logically necessary causal relationship, the execution order of the steps or the module structure of the apparatus is not limited to that described in the embodiments or shown in the drawings. When applied in an actual device or end product, the method or module structure can be executed sequentially or in parallel according to the embodiments or drawings (for example, in a parallel-processor or multi-threaded environment, or even in a distributed processing environment).

Specifically, as shown in FIG. 6, an audio content generation method provided in an embodiment of the present application may include the following steps.

Step 601: acquiring selection items for a plurality of music elements sent by a client, wherein the music elements include at least one of the following: style, mood, and rhythm.

The selection items for style may include, but are not limited to: ballad, pop, and rock; the selection items for rhythm may include, but are not limited to: fast tempo, medium tempo, and slow tempo; and the selection items for mood may include, but are not limited to: relaxed, sad, warm, and excited.

It should be noted, however, that the music elements listed above and the options for each of them are merely exemplary and do not limit the present application. The music elements here correspond to the composition elements above.

Step 602: determining timbre element content and accompaniment element content according to the selection items for the plurality of music elements.

Considering that raw data from the client is not easy for the server to process, the selection items for the plurality of music elements may be formatted so that the server can handle the data centrally and effectively, yielding a mapping relationship between the music elements and their corresponding selection items.

Then, the timbre library is queried according to the formatting result to obtain timbre element content matching it, and the accompaniment library is queried according to the formatting result to obtain accompaniment element content matching it.

That is, an accompaniment library and a timbre library may be provided, storing multiple items of accompaniment element content and timbre element content; matching can then be performed against the timbre library and the accompaniment library based on the option the user selected for each music element, to obtain the matched accompaniment element content and timbre element content.

Step 603: synthesizing the timbre element content and the accompaniment element content to obtain audio content.

Step 604: transmitting the audio content to the client.

The accompaniment element content and timbre element content in the accompaniment library and the timbre library can be continuously enriched, i.e., continuously updated and improved. To this end, timbre element content and/or accompaniment element content can be received, the received timbre element content stored in the timbre library, and the received accompaniment element content stored in the accompaniment library, making the resulting compositions more diverse.

When generating audio content, not only a composition file but also a music file with lyrics may be generated; that is, the music elements above may also include a human voice, where the user can select male treble, female mid-range, and so on. To generate a music file, the user can upload a lyric file, and determining the timbre element content and the accompaniment element content according to the selection items for the plurality of music elements may include: matching to obtain the timbre element content, the accompaniment element content, and the simulated human voice file according to the selection items for the music elements.

The result of the synthesis operation alone is not yet directly playable audio; thus synthesizing the timbre element content and the accompaniment element content to obtain the audio content may include: performing composition synthesis on the timbre element content and the accompaniment element content, and then performing audio format conversion on the composition synthesis result to obtain the audio content. That is, a conversion step is added so that the composition result becomes audio content that can be played.

For a client device, an audio content generation method may include:

S1: providing a display interface, and displaying selection items for a plurality of music elements, wherein the music elements include at least one of the following: style, mood, and rhythm;

S2: receiving the user's selection result for the selection items of the plurality of music elements;

S3: transmitting the selection result to a server, wherein the server determines timbre element content and accompaniment element content according to the selection result, and synthesizes the timbre element content and the accompaniment element content to obtain audio content; and

S4: playing the audio content.

That is, the client device mainly provides the user with a selection interface and receives and plays the generated audio content.

If the entire execution flow runs on the client side, the method may include the following steps:

S1: acquiring selection items for a plurality of music elements selected by a user, wherein the music elements include at least one of the following: style, mood, and rhythm;

S2: determining timbre element content and accompaniment element content according to the selection items for the plurality of music elements;

S3: synthesizing the timbre element content and the accompaniment element content to obtain audio content; and

S4: playing the audio content.

The method embodiments provided in this application may be executed on a mobile terminal, a computer terminal, a server, or a similar computing device. Taking execution on the server side as an example, FIG. 7 is a hardware block diagram of a computer terminal for the audio content generation method according to an embodiment of the present invention. As shown in FIG. 7, the computer terminal 10 may include one or more processors 102 (only one is shown; the processor 102 may include, but is not limited to, a processing device such as a microprocessor (MCU) or a programmable logic device (FPGA)), a memory 104 for storing data, and a transmission module 106 for communication. Those skilled in the art will understand that the structure shown in FIG. 7 is only illustrative and does not limit the structure of the electronic device; for example, the computer terminal 10 may include more or fewer components than shown in FIG. 7, or have a different configuration.

The memory 104 may be used to store software programs and modules of application software, such as program instructions/modules corresponding to the audio content generating method in the embodiment of the present invention, and the processor 102 executes various functional applications and data processing by running the software programs and modules stored in the memory 104, that is, implements the audio content generating method of the application program. The memory 104 may include high speed random access memory, and may also include non-volatile memory, such as one or more magnetic storage devices, flash memory, or other non-volatile solid-state memory. In some examples, the memory 104 may further include memory located remotely from the processor 102, which may be connected to the computer terminal 10 via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.

The transmission module 106 is used to receive or transmit data via a network. Specific examples of the network may include a wireless network provided by the communication provider of the computer terminal 10. In one example, the transmission module 106 includes a Network Interface Card (NIC), which can connect to other network devices through a base station so as to communicate with the internet. In another example, the transmission module 106 may be a Radio Frequency (RF) module used to communicate with the internet wirelessly.

At the software level, as shown in FIG. 8, the audio content generating apparatus may include an obtaining module 801, a determining module 802, a synthesizing module 803, and a transmitting module 804, wherein:

the obtaining module 801 is configured to obtain selection items for a plurality of music elements sent by a client, wherein the music elements include at least one of the following: style, mood, and rhythm;

the determining module 802 is configured to determine timbre element content and accompaniment element content according to the selection items for the plurality of music elements;

the synthesizing module 803 is configured to synthesize the timbre element content and the accompaniment element content to obtain audio content; and

the transmitting module 804 is configured to transmit the audio content to the client.

In one embodiment, the determining module 802 may specifically format the selection items for the plurality of music elements to obtain a mapping relationship between the music elements and their corresponding selection items; query the timbre library according to the formatting result to obtain timbre element content matching it; and query the accompaniment library according to the formatting result to obtain accompaniment element content matching it.

In one embodiment, the audio content generating apparatus may further receive timbre element content and/or accompaniment element content, store the received timbre element content in the timbre library, and store the received accompaniment element content in the accompaniment library.

In one embodiment, the music elements may further include a human voice; correspondingly, matching to obtain the timbre element content and the accompaniment element content according to the selection items for the plurality of music elements may include: matching to obtain the timbre element content, the accompaniment element content, and the simulated human voice file according to the selection items for the music elements.

In one embodiment, the selection items for style may include: ballad, pop, and rock; the selection items for rhythm may include: fast tempo, medium tempo, and slow tempo; and the selection items for mood may include: relaxed, sad, warm, and excited.

In one embodiment, the synthesizing module 803 may specifically perform composition synthesis on the timbre element content and the accompaniment element content, and perform audio format conversion on the composition synthesis result to obtain the audio content.

In the example above, an audio content generation method, a server-side device, and a client-side device are provided. The user only needs to select music element options on the client device, and the back end completes the composition, i.e., generates the audio content, without the user needing much composition knowledge. This solves the problem that existing composition operations demand too much professional expertise and achieves the technical effect of simple, efficient composition.

Although the present application provides method steps as described in an embodiment or flowchart, additional or fewer steps may be included based on conventional or non-inventive efforts. The order of steps recited in the embodiments is merely one manner of performing the steps in a multitude of orders and does not represent the only order of execution. When an actual apparatus or client product executes, it may execute sequentially or in parallel (e.g., in the context of parallel processors or multi-threaded processing) according to the embodiments or methods shown in the figures.

The apparatuses or modules illustrated in the above embodiments may be implemented by a computer chip or an entity, or by a product with certain functions. For convenience of description, the above devices are divided into modules by function and described separately. The functions of the modules may be implemented in one or more pieces of software and/or hardware when implementing the present application; of course, a module realizing a certain function may also be implemented by a combination of several sub-modules or sub-units.

The methods, apparatuses, or modules described herein may be implemented by embedding computer-readable program code in a controller in any suitable manner. For example, the controller may take the form of a microprocessor or processor together with a computer-readable medium storing computer-readable program code (e.g., software or firmware) executable by the (micro)processor, logic gates, switches, Application Specific Integrated Circuits (ASICs), programmable logic controllers, or embedded microcontrollers; examples of such controllers include, but are not limited to, the following microcontrollers: ARC 625D, Atmel AT91SAM, Microchip PIC18F26K20, and Silicon Labs C8051F320. A memory controller may also be implemented as part of the control logic of a memory. Those skilled in the art also know that, besides implementing the controller purely as computer-readable program code, the method steps can be logically programmed so that the controller realizes the same functions in the form of logic gates, switches, application-specific integrated circuits, programmable logic controllers, embedded microcontrollers, and the like. Such a controller may therefore be regarded as a hardware component, and the means included in it for realizing various functions may also be regarded as structures within the hardware component, or even as both software modules for implementing the method and structures within the hardware component.

Some of the modules in the apparatus described herein may be described in the general context of computer-executable instructions, such as program modules, being executed by a computer. Generally, program modules include routines, programs, objects, components, data structures, classes, etc. that perform particular tasks or implement particular abstract data types. The application may also be practiced in distributed computing environments where tasks are performed by remote processing devices that are linked through a communications network. In a distributed computing environment, program modules may be located in both local and remote computer storage media including memory storage devices.

From the description of the embodiments above, it is clear to those skilled in the art that the present application can be implemented by software plus the necessary hardware. Based on this understanding, the technical solution of the present application, in essence or in the part contributing to the prior art, may be embodied in the form of a software product or realized in a data migration process. The computer software product may be stored in a storage medium such as a ROM/RAM, magnetic disk, or optical disc, and includes instructions for causing a computer device (which may be a personal computer, mobile terminal, server, network device, or the like) to perform the methods described in the embodiments, or in parts of the embodiments, of the present application.

The embodiments in the present specification are described in a progressive manner, and the same or similar parts among the embodiments are referred to each other, and each embodiment focuses on the differences from the other embodiments. All or portions of the present application are operational with numerous general purpose or special purpose computing system environments or configurations. For example: personal computers, server computers, hand-held or portable devices, tablet-type devices, mobile communication terminals, multiprocessor systems, microprocessor-based systems, programmable electronic devices, network PCs, minicomputers, mainframe computers, distributed computing environments that include any of the above systems or devices, and the like.

While the present application has been described through embodiments, those of ordinary skill in the art will appreciate that many variations and modifications of it are possible without departing from its spirit, and it is intended that the appended claims cover such variations and modifications.
