Method and system for generating audio or audio link

Document No. 600197; published 2021-05-04

Abstract: This technology, a method and system for generating audio or audio links, was devised by 王国李勇 on 2020-12-28. An embodiment of the invention provides a method for generating audio or audio links. The method comprises: cutting a text to generate a plurality of text segments; submitting the text segments, step by step in a push (stack) manner, to a voice dialogue platform to request a TTS service; and generating a plurality of audio links or audio files in a pop manner. An embodiment of the invention also provides a system for generating audio or audio links. The embodiments let users or companies generate audio streams or audio links from highly customizable text or article links, tailor a versatile ("sweet or salty") voice to each user, make it possible to build customized read-aloud software, and provide customizable AI voices for smart-home products. A trial-listening function is also provided during generation, which improves the user experience.

1. A method of generating audio or audio links, comprising:

cutting a text to generate a plurality of text segments;

submitting the plurality of text segments, step by step in a push (stack) manner, to a voice dialogue platform to request a TTS service; and

generating a plurality of audio links or a plurality of audio files in a pop manner.

2. The method of claim 1, wherein cutting the text to generate a plurality of text segments comprises:

cutting the text in response to personalized requirements of the user.

3. The method of claim 2, wherein cutting the text further comprises:

cutting the text according to punctuation marks.

4. The method of claim 1, wherein the text comes from content crawled by a crawler from a link uploaded by the user, and from text entered by the user.

5. A system for generating audio or audio links, comprising:

a text cutting program module, configured to cut a text to generate a plurality of text segments;

a service request program module, configured to submit the plurality of text segments, step by step in a push (stack) manner, to a voice dialogue platform to request a TTS service; and

a pop program module, configured to generate a plurality of audio links or a plurality of audio files in a pop manner.

6. The system of claim 5, wherein the text cutting program module is configured to:

cut the text in response to personalized requirements of the user.

7. The system of claim 6, wherein the text cutting program module is further configured to:

cut the text according to punctuation marks.

8. The system of claim 5, wherein the text comes from content crawled by a crawler from a link uploaded by the user, and from text entered by the user.

9. An electronic device, comprising: at least one processor, and a memory communicatively coupled to the at least one processor, wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the steps of the method of any of claims 1-4.

10. A storage medium on which a computer program is stored which, when being executed by a processor, carries out the steps of the method of any one of claims 1 to 4.

Technical Field

The invention relates to the field of intelligent voice, in particular to a method and a system for generating audio or audio links.

Background

Text-to-speech tools are typically used to convert text into audio. The user types or pastes some text into the software, selects one of the timbres it offers, and clicks the synthesize-audio control to generate the required audio.

In the process of implementing the invention, the inventor finds that at least the following problems exist in the related art:

For the sake of generality, text-to-speech tools mainly support generating an audio stream from manually entered or pasted content. They generally rely on text markup and synthesize audio serially, so they cannot generate audio links in a distributed manner and cannot offer trial listening while synthesis is still in progress.

Disclosure of Invention

The invention aims to solve at least the following problems of the prior art: serial synthesis is inefficient, synthesis and trial listening cannot proceed at the same time, and personalized audio generation cannot be provided to the user.

In a first aspect, an embodiment of the present invention provides a method for generating an audio or an audio link, where the method includes:

cutting a text to generate a plurality of text segments;

submitting the plurality of text segments, step by step in a push (stack) manner, to a voice dialogue platform to request a TTS service; and

generating a plurality of audio links or a plurality of audio files in a pop manner.

In a second aspect, an embodiment of the present invention provides an audio or audio link generation system, including:

a text cutting program module, configured to cut a text to generate a plurality of text segments;

a service request program module, configured to submit the plurality of text segments, step by step in a push (stack) manner, to a voice dialogue platform to request a TTS service; and

a pop program module, configured to generate a plurality of audio links or a plurality of audio files in a pop manner.

In a third aspect, an electronic device is provided, comprising: at least one processor, and a memory communicatively coupled to the at least one processor, wherein the memory stores instructions executable by the at least one processor to cause the at least one processor to perform the steps of the method for generating audio or audio links of any of the embodiments of the present invention.

In a fourth aspect, an embodiment of the present invention provides a storage medium, on which a computer program is stored, where the computer program is configured to, when executed by a processor, implement the steps of the audio or audio link generation method according to any embodiment of the present invention.

The beneficial effects of the embodiments of the invention are as follows: software implementing the method lets users or companies generate audio streams or audio links from highly customizable text or article links, tailors a versatile ("sweet or salty") voice to each user, makes it possible to build customized read-aloud software (for listening to articles or news), and provides customizable AI voices for smart-home products. A trial-listening function is also provided during generation, which improves the user experience.

Drawings

In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and those skilled in the art can also obtain other drawings according to the drawings without creative efforts.

Fig. 1 is a flowchart of a method for generating audio or audio links according to an embodiment of the present invention;

Fig. 2 is an overall flowchart of a method for generating audio or audio links according to an embodiment of the present invention;

Fig. 3 is a schematic structural diagram of a system for generating audio or audio links according to an embodiment of the present invention.

Detailed Description

In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

Fig. 1 is a flowchart of a method for generating an audio or audio link according to an embodiment of the present invention, including the following steps:

S11: cutting a text to generate a plurality of text segments;

S12: submitting the plurality of text segments, step by step in a push (stack) manner, to a voice dialogue platform to request a TTS service;

S13: generating a plurality of audio links or a plurality of audio files in a pop manner.

In this embodiment, following extensive preliminary research into related software and technology, a tool was designed from the user's perspective that converts a link or user-entered text into audio or an audio link. Various software can be built on this tool, such as a read-aloud feature for official-account articles or reading products for children.

In step S11, the user enters the text to be converted into audio into a tool implementing the method; on receiving the text, the tool cuts it into a plurality of text segments. Because the method targets scenarios such as official-account articles and children's reading, it is designed to handle large amounts of text. For example, an official-account article or a fairy tale may run to thousands of characters, and cutting it yields a large number of text segments.

For step S12, once the text has been cut, the text segments are sent concurrently to the tool's backend through the HTTP interface. On receiving the text, the backend connects to the TTS (Text To Speech) service of a voice dialogue platform (e.g., the cibys voice dialogue platform). The text segments are pushed onto a stack and submitted step by step to the platform's TTS service.
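The concurrent dispatch and push described for step S12 can be sketched as follows. This is a minimal sketch under stated assumptions, not the platform's actual interface: `RequestStack`, `dispatch_segments`, and the `receive` callable are hypothetical names, and the HTTP call is replaced by a direct function call so that the example runs offline. Each segment carries its index so that the original order can be restored later.

```python
import threading
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Optional, Tuple

Request = Tuple[int, str]  # (segment index, segment text)


class RequestStack:
    """Thread-safe stack of pending TTS requests (hypothetical backend structure)."""

    def __init__(self) -> None:
        self._items: List[Request] = []
        self._lock = threading.Lock()

    def push(self, index: int, text: str) -> None:
        with self._lock:
            self._items.append((index, text))

    def pop(self) -> Optional[Request]:
        # Returns None when no request is pending.
        with self._lock:
            return self._items.pop() if self._items else None

    def __len__(self) -> int:
        with self._lock:
            return len(self._items)


def dispatch_segments(segments: List[str],
                      receive: Callable[[int, str], None],
                      max_workers: int = 4) -> None:
    """Send every segment concurrently; `receive` stands in for the HTTP endpoint."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(receive, i, seg) for i, seg in enumerate(segments)]
        for future in futures:
            future.result()  # re-raise any error from a worker thread


stack = RequestStack()
dispatch_segments(["First sentence.", "Second sentence.", "Third sentence."], stack.push)
print(len(stack))  # number of requests now waiting for the TTS service
```

Because the segments arrive concurrently, their order on the stack is not guaranteed, which is why the index travels with each segment.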

For step S13, as the TTS service finishes processing each segment, audio links or audio files are generated step by step in a pop manner. Because users need to be able to listen on trial, an audio link is provided together with the audio: by clicking the link, the user can listen directly or download the audio segment. Throughout the TTS service process, segments are continuously pushed and popped, so audio is converted while trial listening is already available; the user does not have to wait until all of the text has been synthesized.
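The idea that a link becomes available as soon as one segment is done, without waiting for the rest, can be illustrated with a lazy generator. This is only a sketch: `synthesize_incrementally` and `fake_tts` are hypothetical names, the `example.com` links are made up, and the real TTS call of the voice dialogue platform is replaced by a stub. The source does not say whether the stack is strictly last-in-first-out; here the segments are pushed in reverse so that the first segment is popped first.

```python
from typing import Callable, Iterator, List, Tuple


def synthesize_incrementally(
    stack: List[Tuple[int, str]],
    tts: Callable[[str], str],
) -> Iterator[Tuple[int, str]]:
    """Pop one pending request at a time and yield (index, audio_link) as soon as it is ready."""
    while stack:
        index, text = stack.pop()
        yield index, tts(text)


calls: List[str] = []


def fake_tts(text: str) -> str:
    # Stand-in for the platform's TTS request; records the call and returns a made-up link.
    calls.append(text)
    return f"https://example.com/audio/{len(calls)}.mp3"


segments = ["One.", "Two.", "Three."]
# Push in reverse order so that segment 0 ends up on top of the stack.
stack = [(i, s) for i, s in reversed(list(enumerate(segments)))]

results = synthesize_incrementally(stack, fake_tts)
first = next(results)  # ready after a single TTS call; the other segments are still pending
print(first, len(calls))
```

The generator only calls the TTS stub when the caller asks for the next result, so the first link can be offered for trial listening while the remaining segments are still on the stack.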

Trial listening mainly works in a distributed manner: each cut text segment is matched with its corresponding audio segment. Finally, once all text segments have been converted into audio, the complete audio stream or audio link required by the user is returned through the HTTP interface. The overall flow is shown in Fig. 2.
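Matching each text segment with its audio and returning the complete result in reading order might look like the sketch below. The function name `assemble` and the index-keyed dictionary of links are assumptions for illustration; the source does not describe the data structures used.

```python
from typing import Dict, List, Tuple


def assemble(segments: List[str], links: Dict[int, str]) -> List[Tuple[str, str]]:
    """Pair every text segment with its audio link, in reading order."""
    missing = [i for i in range(len(segments)) if i not in links]
    if missing:
        # The complete result is only returned once every segment has audio.
        raise ValueError(f"segments not yet synthesized: {missing}")
    return [(segments[i], links[i]) for i in range(len(segments))]


segments = ["One.", "Two."]
# Links may arrive in any order; the index restores the original sequence.
links = {1: "https://example.com/audio/b.mp3", 0: "https://example.com/audio/a.mp3"}
playlist = assemble(segments, links)
print(playlist)
```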

In this embodiment, pushing and popping make the conversion of text into speech efficient, and a trial-listening function is available to the user while the TTS service is still running, so the user does not have to wait for all processing to finish, which improves the user experience.

As an implementation manner, in this embodiment, cutting the text to generate the plurality of text segments includes:

cutting the text in response to personalized requirements of the user.

This embodiment takes into account that users have their own personalized requirements when converting text to audio. For example, a tool implementing the method may first present the article to the user, who can then rework its text: correcting the reading of polyphonic characters, adding extra pauses, and setting the speech rate, a custom timbre, and the volume level. The tool provides all of these functions. Once the personalization is done, the user clicks the generate-audio control in the tool, which starts the text-cutting process.
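One way to carry these personalized settings along with each text segment is a small settings object. The sketch below is purely illustrative: `VoiceOptions`, `build_request`, every field name, and the value ranges are assumptions, since the source does not specify how the settings are encoded or what the TTS service accepts.

```python
from dataclasses import dataclass, field
from typing import Any, Dict


@dataclass
class VoiceOptions:
    """Per-user synthesis settings; every field name and range here is hypothetical."""
    voice: str = "default"   # custom timbre
    speed: float = 1.0       # speech-rate multiplier
    volume: int = 50         # volume level, 0 to 100
    pause_ms: int = 0        # extra pause appended after each segment
    pronunciations: Dict[str, str] = field(default_factory=dict)  # polyphone corrections

    def __post_init__(self) -> None:
        if not 0.5 <= self.speed <= 2.0:
            raise ValueError("speed must be between 0.5 and 2.0")
        if not 0 <= self.volume <= 100:
            raise ValueError("volume must be between 0 and 100")
        if self.pause_ms < 0:
            raise ValueError("pause_ms must not be negative")


def build_request(text: str, options: VoiceOptions) -> Dict[str, Any]:
    """Combine one text segment with the user's settings into a request payload."""
    # Only send the pronunciation corrections that actually occur in this segment.
    relevant = {w: r for w, r in options.pronunciations.items() if w in text}
    return {
        "text": text,
        "voice": options.voice,
        "speed": options.speed,
        "volume": options.volume,
        "pause_ms": options.pause_ms,
        "pronunciations": relevant,
    }


options = VoiceOptions(speed=1.2, pronunciations={"行长": "hang2 zhang3", "重庆": "chong2 qing4"})
payload = build_request("银行行长来了。", options)
print(payload)
```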

Cutting the text further comprises cutting the text according to punctuation marks. In this embodiment, punctuation marks are used as the criterion so that the text is divided sensibly; for example, the text may be cut at every period or at every second period. This can be adjusted to the actual situation.
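The punctuation-based cutting described above can be sketched as follows. This is a minimal sketch, not the patented implementation: the function name `cut_text`, the parameter `sentences_per_segment`, and the particular set of sentence-ending marks are assumptions made for illustration.

```python
import re
from typing import List

# Sentence-ending marks, Chinese and Western. The exact set is an assumption;
# note that a bare "." also splits decimals such as "3.14".
_SENTENCE = re.compile(r"[^。！？.!?]+[。！？.!?]*")


def cut_text(text: str, sentences_per_segment: int = 1) -> List[str]:
    """Cut text after sentence-ending punctuation, grouping N sentences per segment."""
    if sentences_per_segment < 1:
        raise ValueError("sentences_per_segment must be at least 1")
    sentences = _SENTENCE.findall(text)
    segments = []
    for start in range(0, len(sentences), sentences_per_segment):
        segment = "".join(sentences[start:start + sentences_per_segment]).strip()
        if segment:
            segments.append(segment)
    return segments


print(cut_text("今天天气很好。我们去公园吧！你觉得呢？", sentences_per_segment=2))
```

Setting `sentences_per_segment` to 1 or 2 corresponds to cutting at every period or at every second period.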

According to the embodiment, the personalized function is provided for the user, and the generated audio is more suitable for the requirements of the user. The generated audio may be suitable for use with article reading products or children reading products.

In one embodiment, the text comes from content crawled by a crawler from a link uploaded by the user, and from text entered by the user.

In this embodiment, because entering all of the text by hand would be too tedious, the method also provides a function for retrieving an article through a link. The user enters the article link in the tool, the backend receives the link, and a crawler crawls the content behind it. A series of fault-tolerance steps is applied after crawling, and the processed text is then displayed to the user in the tool.

In this embodiment, the user can generate an audio stream or audio links from highly customizable text or article links, which makes the tool more convenient and more efficient to use.
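A link-to-text step with simple fault tolerance could be sketched with the Python standard library as below. The source does not say which fault-tolerance steps are applied; the ones shown here (returning `None` on network or URL errors, replacing undecodable bytes, skipping `script` and `style` content, rejecting empty results) are assumptions, and `extract_text` and `fetch_article` are hypothetical names.

```python
from html.parser import HTMLParser
from typing import Callable, List, Optional
from urllib.request import urlopen


class _TextExtractor(HTMLParser):
    """Collect visible text, ignoring the contents of <script> and <style>."""

    SKIP = {"script", "style"}

    def __init__(self) -> None:
        super().__init__()
        self.parts: List[str] = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.parts.append(data.strip())


def extract_text(html: str) -> str:
    """Return the visible text of an HTML page, one text node per line."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.parts)


def fetch_article(url: str, opener: Callable = urlopen, timeout: float = 10.0) -> Optional[str]:
    """Download a page and return its text, or None if anything goes wrong."""
    try:
        with opener(url, timeout=timeout) as response:
            raw = response.read()
    except (OSError, ValueError):
        # URLError is a subclass of OSError; ValueError covers malformed URLs.
        return None
    text = extract_text(raw.decode("utf-8", errors="replace"))
    return text or None


page = "<html><head><style>p{}</style></head><body><h1>Title</h1><p>Hello.</p><script>x=1</script></body></html>"
print(extract_text(page))
```

Passing the opener as a parameter keeps the function testable without network access; in normal use the default `urlopen` is called.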

Fig. 3 is a schematic structural diagram of an audio or audio link generation system according to an embodiment of the present invention, which can execute the audio or audio link generation method according to any of the above embodiments and is configured in a terminal.

The audio or audio link generation system 10 provided by the present embodiment includes: a text cutting program module 11, a service request program module 12, and a pop program module 13.

The text cutting program module 11 is configured to cut a text to generate a plurality of text segments; the service request program module 12 is configured to submit the plurality of text segments, step by step in a push (stack) manner, to a voice dialogue platform to request a TTS service; the pop program module 13 is configured to generate a plurality of audio links or a plurality of audio files in a pop manner.

Further, the text cutting program module is configured to:

cut the text in response to personalized requirements of the user.

Further, the text cutting program module is further configured to:

cut the text according to punctuation marks.

Further, the text comes from content crawled by a crawler from a link uploaded by the user, and from text entered by the user.

An embodiment of the invention also provides a non-volatile computer storage medium storing computer-executable instructions that can execute the method for generating audio or audio links in any of the above method embodiments.

As one embodiment, the non-volatile computer storage medium of the present invention stores computer-executable instructions configured to:

cut a text to generate a plurality of text segments;

submit the plurality of text segments, step by step in a push (stack) manner, to a voice dialogue platform to request a TTS service; and

generate a plurality of audio links or a plurality of audio files in a pop manner.

The non-volatile computer-readable storage medium may be used to store non-volatile software programs, non-volatile computer-executable programs, and modules, such as the program instructions/modules corresponding to the methods in the embodiments of the present invention. One or more program instructions are stored in the non-volatile computer-readable storage medium and, when executed by a processor, perform the method of generating audio or audio links in any of the method embodiments described above.

The non-volatile computer-readable storage medium may include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application program required for at least one function; the storage data area may store data created according to the use of the device, and the like. Further, the non-volatile computer-readable storage medium may include high speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other non-volatile solid state storage device. In some embodiments, the non-transitory computer readable storage medium optionally includes memory located remotely from the processor, which may be connected to the device over a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.

An embodiment of the present invention further provides an electronic device, which includes: at least one processor, and a memory communicatively coupled to the at least one processor, wherein the memory stores instructions executable by the at least one processor to cause the at least one processor to perform the steps of the method for generating audio or audio links of any of the embodiments of the present invention.

The electronic device of the embodiments of the present application exists in various forms, including but not limited to:

(1) Mobile communication devices: these devices are characterized by mobile communication capabilities and are primarily intended to provide voice and data communication. Such terminals include smartphones, multimedia phones, feature phones, and low-end phones.

(2) Ultra-mobile personal computer devices: these devices belong to the category of personal computers, have computing and processing functions, and generally also support mobile internet access. Such terminals include PDA, MID, and UMPC devices, such as tablet computers.

(3) Portable entertainment devices: these devices can display and play multimedia content. They include audio and video players, handheld game consoles, e-book readers, smart toys, and portable in-vehicle navigation devices.

(4) Other electronic devices with data processing capabilities.

As used herein, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.

The above-described embodiments of the apparatus are merely illustrative, and the units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment. One of ordinary skill in the art can understand and implement it without inventive effort.

Through the above description of the embodiments, those skilled in the art will clearly understand that each embodiment can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware. With this understanding in mind, the above-described technical solutions may be embodied in the form of a software product, which can be stored in a computer-readable storage medium such as ROM/RAM, magnetic disk, optical disk, etc., and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the methods described in the embodiments or some parts of the embodiments.

Finally, it should be noted that: the above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.
