Configurable sound changing device

Document No.: 1253938. Published: 2020-08-21.

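Reading note: this technology, "Configurable sound changing device" (一种可配置的变声装置), was designed by 秦垠峰 and 闫冰 on 2020-04-29. Main content: the invention provides a configurable sound changing device and relates to the technical field of sound changing devices; it further relates to a method for generating the sound changing model used by the device. The configurable sound changing device comprises a mobile phone, sound changing equipment and a cloud server. The mobile phone and the sound changing equipment are in signal connection through a remote control interaction unit, and the mobile phone and the cloud server are in signal connection through an interaction system. A sound changing module is arranged in the sound changing equipment, and a model generation module and a user information module are arranged in the cloud server. This makes the sound changing effect more natural and the timbre easier to customize.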

1. A configurable sound changing device, comprising a mobile phone, sound changing equipment and a cloud server, wherein the mobile phone is in signal connection with the sound changing equipment through a remote control interaction unit, the mobile phone is in signal connection with the cloud server through an interaction system, a sound changing module is arranged in the sound changing equipment, and a model generation module and a user information module are arranged in the cloud server.

2. The configurable sound changing device according to claim 1, wherein: the remote control interaction unit comprises two remote control interaction modules arranged in the sound changing equipment and in the mobile phone respectively; the remote control interaction module in the sound changing equipment is used for receiving control instructions from the mobile phone, starting and stopping the sound changing module, adjusting the configuration of the sound changing module and transferring file data; and the remote control interaction module in the mobile phone is used for starting and stopping the sound changing equipment, adjusting its sound changing settings and transmitting the feature embedding vector of a customized target speaker to the sound changing equipment.

3. The configurable sound changing device according to claim 2, wherein: the interaction system comprises a cloud interaction module and a mobile phone interaction module arranged in the mobile phone and in the cloud server respectively; the cloud interaction module is used for managing user information, uploading recordings and downloading the generated target speaker feature embedding vectors; and the mobile phone interaction module is used for communicating with the mobile phone, registering, viewing and modifying user information, receiving recording files and transmitting target speaker feature embedding vectors.

4. The configurable sound changing device according to claim 3, wherein: the model generation module and the user information module are both electrically connected with the mobile phone interaction module; the model generation module is used for generating a target speaker feature embedding vector from a specified group of target speaker recording files; and the user information module is used for storing the user's personal information, uploaded recordings and list of customized timbre models.

5. The configurable sound changing device according to claim 4, wherein: the sound changing equipment further comprises an A/D and D/A converter, a digital signal processing chip, a central processing unit module, a memory and a storage; the A/D and D/A converter, the digital signal processing chip, the memory and the storage are in signal connection with the central processing unit module; the A/D and D/A converter is used for converting externally input analog signals into digital signals and for converting digital signals back into analog signals for output; and the central processing unit module, together with the memory and the storage, drives the whole equipment and runs the sound changing algorithm.

6. An operating method of a sound changing device, characterized by comprising the following steps:

S1, a user uploads, through the mobile phone application software, a recording file in the voice style of a target speaker and a recording file of the original speaker, wherein the recorded content of the original speaker is the same as that of the target speaker, and the duration is twenty-five to thirty-five minutes;

S2, the cloud server generates a sound changing model that converts to the target speaker, and the mobile phone application software connects wirelessly to the sound changing equipment and downloads the target speaker's sound changing model to it;

S3, one of the target speaker models is selected for use in the mobile phone application software, or the sound changing function of the sound changing equipment is started with a button on the equipment;

S4, the input and output connections of the sound changing equipment are set up;

S5, the sound changing equipment receives the voice spoken by the user and processes it with the selected sound changing model;

and S6, the sound changing equipment transmits the changed signal through the output connection.

7. A method for generating a sound changing model, characterized by comprising the following steps:

a, the original speaker's voice is passed through a sound changing model based on a deep neural network to obtain the changed voice;

b, short-time energy, zero-crossing rate and Mel cepstrum coefficient audio features are extracted from the changed voice and from the input voice of the target speaker, one extraction per 30 ms time window with a 10 ms slide;

c, the audio features obtained for each time window of the two voices are fed in sequence into the same pre-trained speaker feature encoder to obtain an original speaker feature embedding vector and a target speaker feature embedding vector respectively;

d, the original speaker feature embedding vector and the target speaker feature embedding vector are each input into a pre-trained decoder to obtain the restored voice features of the original speaker and the restored voice features of the target speaker;

e, a loss is generated by comparing the differences between the restored voice features of the original speaker and those of the target speaker, and the loss is back-propagated to the sound changing model to update its parameters;

f, steps a to e are iterated until the loss obtained in step e is smaller than a preset value or the number of iterations exceeds a preset number;

and g, finally, the trained sound changing model is taken out; it is the model that changes the timbre of the original speaker into the timbre of the target speaker.
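The iteration rule of steps a to g can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation described by the inventors: `train_until_converged`, `train_step`, `loss_threshold` and `max_iterations` are hypothetical names, `train_step` stands in for one pass through steps a to e and returns the loss of step e, and the toy step that simply halves a number exists only to make the sketch runnable.

```python
from typing import Callable, Tuple


def train_until_converged(
    train_step: Callable[[], float],
    loss_threshold: float,
    max_iterations: int,
) -> Tuple[int, float]:
    """Repeat train_step until the loss drops below the threshold
    or the iteration cap is reached (the stop rule of step f)."""
    if max_iterations < 1:
        raise ValueError("max_iterations must be at least 1")
    loss = float("inf")
    iterations = 0
    while iterations < max_iterations:
        loss = train_step()  # one pass of steps a to e (assumed interface)
        iterations += 1
        if loss < loss_threshold:
            break
    return iterations, loss


def make_toy_step(start: float = 1.0) -> Callable[[], float]:
    """Toy stand-in for a real training step: halves the loss on each call."""
    state = {"loss": start}

    def step() -> float:
        state["loss"] /= 2
        return state["loss"]

    return step


if __name__ == "__main__":
    print(train_until_converged(make_toy_step(), loss_threshold=0.1, max_iterations=100))
```

In a real system `train_step` would run the sound changing model, the encoder and the decoder and apply a gradient update; the loop itself only captures when training stops.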

Technical Field

The invention relates to the technical field of sound changing devices, and further relates to a method for generating a sound changing model for such a device.

Background

In recent years, rising service quality requirements in call centers and the growth of live streaming have placed demands on the voices of agents and anchors: a magnetic or sweet timbre can greatly improve a customer's impression. At the same time, artificial intelligence technology, and in particular the application of deep neural networks to speech processing, has matured, so that sound changing can be made more natural and timbres can be customized more easily.

Existing sound changing devices rely mainly on a digital signal processing (DSP) chip. This approach offers only a few fixed timbres, is difficult to configure and cannot produce custom timbres. It can only adjust simple audio characteristics such as sampling rate and loudness, so the sound changing effect is unnatural.

Disclosure of Invention

The invention aims to provide a configurable sound changing device that solves the problems of prior-art sound changing devices: they offer only a few fixed timbres, are difficult to configure, cannot produce custom timbres and give an unnatural sound changing effect.

To achieve this aim, the invention provides the following technical solution:

A configurable sound changing device comprises a mobile phone, sound changing equipment and a cloud server. The mobile phone and the sound changing equipment are in signal connection through a remote control interaction unit, and the mobile phone and the cloud server are in signal connection through an interaction system. A sound changing module is arranged in the sound changing equipment, and a model generation module and a user information module are arranged in the cloud server. With this arrangement the sound changing effect is more natural and the timbre is easier to customize.

Preferably, the remote control interaction unit comprises two remote control interaction modules arranged in the sound changing equipment and in the mobile phone respectively. The module in the sound changing equipment receives control instructions from the mobile phone, starts and stops the sound changing module, adjusts the configuration of the sound changing module and transfers file data. The module in the mobile phone starts and stops the sound changing equipment, adjusts its sound changing settings and transmits the customized target speaker feature embedding vector to the sound changing equipment.
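The division of work between the two remote control interaction modules can be sketched as a small command handler on the equipment side. This is a hedged illustration only: the source does not define a message format, so the `Command` values, the `VoiceChanger` class and its fields are all hypothetical names chosen for the example.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, List, Optional


class Command(Enum):
    """Hypothetical control instructions sent from the mobile phone."""
    START = "start"
    STOP = "stop"
    CONFIGURE = "configure"
    LOAD_EMBEDDING = "load_embedding"


@dataclass
class VoiceChanger:
    """Hypothetical equipment-side state managed by the interaction module."""
    running: bool = False
    settings: Dict[str, str] = field(default_factory=dict)
    embedding: Optional[List[float]] = None

    def handle(self, command: Command, payload=None) -> None:
        if command is Command.START:
            self.running = True
        elif command is Command.STOP:
            self.running = False
        elif command is Command.CONFIGURE:
            # Merge the new settings into the current configuration.
            self.settings.update(payload or {})
        elif command is Command.LOAD_EMBEDDING:
            # Store the customized target speaker feature embedding vector.
            self.embedding = list(payload)
        else:
            raise ValueError(f"unsupported command: {command}")


if __name__ == "__main__":
    device = VoiceChanger()
    device.handle(Command.LOAD_EMBEDDING, [0.1, -0.3, 0.7])
    device.handle(Command.CONFIGURE, {"output": "line-out"})
    device.handle(Command.START)
    print(device)
```

The phone-side module would be the sender of these commands; transport (for example a wireless link) is left out because the source does not specify it.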

Preferably, the interaction system comprises a cloud interaction module and a mobile phone interaction module arranged in the mobile phone and in the cloud server respectively. The cloud interaction module is used for managing user information, uploading recordings and downloading the generated target speaker feature embedding vectors. The mobile phone interaction module is used for communicating with the mobile phone, registering, viewing and modifying user information, receiving recording files and transmitting target speaker feature embedding vectors.

Preferably, the model generation module and the user information module are both electrically connected to the mobile phone interaction module. The model generation module is configured to generate a target speaker feature embedding vector from a specified group of target speaker recording files, and the user information module is configured to store the user's personal information, uploaded recordings and list of customized timbre models.

Preferably, the sound changing equipment further comprises an A/D and D/A converter, a digital signal processing chip, a central processing unit module, a memory and a storage. The A/D and D/A converter, the digital signal processing chip, the memory and the storage are in signal connection with the central processing unit module. The A/D and D/A converter converts externally input analog signals into digital signals and converts digital signals back into analog signals for output. The central processing unit module, together with the memory and the storage, drives the whole equipment and runs the sound changing algorithm.

An operating method of a sound changing device comprises the following steps:

S1, a user uploads, through the mobile phone application software, a recording file in the voice style of a target speaker and a recording file of the original speaker, wherein the recorded content of the original speaker is the same as that of the target speaker, and the duration is twenty-five to thirty-five minutes;

S2, the cloud server generates a sound changing model that converts to the target speaker, and the mobile phone application software connects wirelessly to the sound changing equipment and downloads the target speaker's sound changing model to it;

S3, one of the target speaker models is selected for use in the mobile phone application software, or the sound changing function of the sound changing equipment is started with a button on the equipment;

S4, the input and output connections of the sound changing equipment are set up;

S5, the sound changing equipment receives the voice spoken by the user and processes it with the selected sound changing model;

and S6, the sound changing equipment transmits the changed signal through the output connection.
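Step S1 implies a check on the uploaded recordings before a model is generated. The sketch below is one possible form of that check, assuming that the twenty-five to thirty-five minute range applies to each of the two recordings (the source does not state this explicitly). The function and constant names are hypothetical.

```python
from typing import List

MIN_SECONDS = 25 * 60  # twenty-five minutes
MAX_SECONDS = 35 * 60  # thirty-five minutes


def duration_in_range(duration_seconds: float) -> bool:
    """Return True when a recording lasts between 25 and 35 minutes inclusive."""
    return MIN_SECONDS <= duration_seconds <= MAX_SECONDS


def check_upload(original_seconds: float, target_seconds: float) -> List[str]:
    """Collect human-readable problems with the two uploaded recordings."""
    problems = []
    if not duration_in_range(original_seconds):
        problems.append("original speaker recording is outside 25-35 minutes")
    if not duration_in_range(target_seconds):
        problems.append("target speaker recording is outside 25-35 minutes")
    return problems


if __name__ == "__main__":
    print(check_upload(30 * 60, 10 * 60))
```

Checking that both recordings contain the same spoken content would need speech recognition or alignment and is outside the scope of this sketch.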

A method for generating the sound changing model comprises the following steps:

a, the original speaker's voice is passed through a sound changing model based on a deep neural network to obtain the changed voice;

b, short-time energy, zero-crossing rate and Mel cepstrum coefficient audio features are extracted from the changed voice and from the input voice of the target speaker, one extraction per 30 ms time window with a 10 ms slide;

c, the audio features obtained for each time window of the two voices are fed in sequence into the same pre-trained speaker feature encoder to obtain an original speaker feature embedding vector and a target speaker feature embedding vector respectively;

d, the original speaker feature embedding vector and the target speaker feature embedding vector are each input into a pre-trained decoder to obtain the restored voice features of the original speaker and the restored voice features of the target speaker;

e, a loss is generated by comparing the differences between the restored voice features of the original speaker and those of the target speaker, and the loss is back-propagated to the sound changing model to update its parameters;

f, steps a to e are iterated until the loss obtained in step e is smaller than a preset value or the number of iterations exceeds a preset number;

and g, finally, the trained sound changing model is taken out; it is the model that changes the timbre of the original speaker into the timbre of the target speaker.
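As an illustration of step b, the sketch below frames a signal with a 30 ms window and a 10 ms slide and computes short-time energy and zero-crossing rate for each frame. It assumes a 16 kHz sample rate and uses hypothetical function names; neither is specified in the source. Mel cepstrum coefficients are left out because they need a filter bank and a discrete cosine transform that do not fit in a short example.

```python
from typing import List, Sequence, Tuple


def frame_signal(
    samples: Sequence[float],
    sample_rate: int,
    window_ms: int = 30,
    hop_ms: int = 10,
) -> List[Sequence[float]]:
    """Split samples into overlapping frames; incomplete trailing frames are dropped."""
    window = sample_rate * window_ms // 1000
    hop = sample_rate * hop_ms // 1000
    return [samples[s:s + window] for s in range(0, len(samples) - window + 1, hop)]


def short_time_energy(frame: Sequence[float]) -> float:
    """Sum of squared sample values in one frame."""
    return sum(x * x for x in frame)


def zero_crossing_rate(frame: Sequence[float]) -> float:
    """Fraction of adjacent sample pairs whose signs differ (needs at least 2 samples)."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)


def extract_features(
    samples: Sequence[float], sample_rate: int = 16000
) -> List[Tuple[float, float]]:
    """Return one (energy, zero-crossing rate) pair per 30 ms frame."""
    return [
        (short_time_energy(f), zero_crossing_rate(f))
        for f in frame_signal(samples, sample_rate)
    ]


if __name__ == "__main__":
    # One second of an alternating +1/-1 signal at the assumed 16 kHz rate.
    signal = [1.0 if i % 2 == 0 else -1.0 for i in range(16000)]
    features = extract_features(signal)
    print(len(features), features[0])
```

Under these assumptions each frame holds 480 samples and frames start every 160 samples, so one second of audio yields (16000 - 480) / 160 + 1 = 98 frames.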

Compared with the prior art, the invention has the beneficial effects that:

1. In this scheme, the sound changing device can be customized with the timbre of any target speaker, and timbres can be switched and enabled at any time. Compared with traditional sound changing devices, which offer only fixed timbres, this greatly improves the configurability of sound changing.

2. Traditional sound changing technology changes the voice through linear transformation, simply altering audio characteristics such as sampling rate and loudness; the effect is unnatural and listeners can easily tell that the voice has been changed. This scheme instead uses the nonlinear transformation of a neural network model to fit the voice closely to the timbre of the target speaker, which gives a more natural sound changing effect.

Drawings

The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and together with the description serve to explain the principles of the invention and not to limit the invention. In the drawings:

FIG. 1 is a functional block diagram of the present invention;

FIG. 2 is a functional block diagram of a sound varying apparatus according to the present invention;

FIG. 3 is a diagram of the operation of the present invention;

FIG. 4 is a schematic structural diagram of the present invention.

Detailed Description

The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
