Control method and device of intelligent device, storage medium and electronic device

Document No.: 193365 Publication date: 2021-11-02

Reading note: This technology, "Control method and device of intelligent device, storage medium and electronic device" (智能设备的控制方法和装置、存储介质及电子设备), was designed by 郭凯 on 2021-06-24. Abstract: The invention discloses a control method and device of an intelligent device, a storage medium, and an electronic device. The method comprises: acquiring collected voice data of a target object while the intelligent device is awake; performing voiceprint recognition on the voice data to obtain voiceprint features of the target object; and, when the voiceprint features indicate that the identity role type of the target object is the target role type, controlling the intelligent device to switch its operation mode to a mode matching the target role type. The invention solves the technical problem that the operation mode of an intelligent device can be switched in only a single way.

1. A control method of an intelligent device is characterized by comprising the following steps:

acquiring collected voice data of a target object under the condition that the intelligent equipment is awakened;

performing voiceprint recognition on the voice data to obtain voiceprint characteristics of the target object;

and under the condition that the voiceprint characteristics indicate that the identity role type of the target object is the target role type, controlling the intelligent equipment to switch the operation mode to a mode matched with the target role type.

2. The method of claim 1, wherein the performing voiceprint recognition on the voice data to obtain the voiceprint feature of the target object comprises:

preprocessing the voice data to obtain voice frequency domain data;

and inputting the voice frequency domain data into a voice recognition model to obtain the voiceprint characteristics of the target object, wherein the voice recognition model is a voiceprint classification model obtained by training a plurality of sample labeled frequency domain data.

3. The method of claim 2, wherein before the pre-processing the voice data to obtain voice frequency domain data, further comprising:

obtaining a plurality of sample annotation data, wherein the plurality of sample annotation data comprises: first voice data labeled with a child tag and second voice data labeled with an adult tag;

performing time-frequency transformation on the plurality of sample labeling data to obtain a plurality of sample labeling frequency domain data, wherein the plurality of sample labeling frequency domain data comprises: child frequency domain data corresponding to the first voice data, and adult frequency domain data corresponding to the second voice data;

sequentially taking each sample labeled frequency domain data as current sample labeled frequency domain data, and executing the following operations until a convergence condition is reached;

inputting the current sample labeled frequency domain data into an initialized voice recognition model to obtain a voiceprint recognition result;

under the condition that the voiceprint recognition result is inconsistent with the labeling label of the current sample labeling frequency domain data, acquiring next sample labeling frequency domain data as the current sample labeling frequency domain data, and adjusting the model parameters in the initialized voice recognition model according to the voiceprint recognition result;

under the condition that the voiceprint identification result is consistent with the labeling label of the current sample labeling frequency domain data, updating a successful identification counting result;

under the condition that the successful identification counting result does not reach a first threshold value, acquiring next sample labeling frequency domain data as the current sample labeling frequency domain data;

determining that the convergence condition is reached if the successful recognition count result reaches the first threshold.

4. The method of claim 2, wherein the pre-processing the voice data to obtain voice frequency domain data comprises:

carrying out voice filtering processing on the voice data to obtain filtered voice data;

carrying out noise reduction processing on the filtered voice data to obtain noise-reduced voice data;

and performing time-frequency transformation on the voice data subjected to noise reduction to obtain the voice frequency domain data.

5. The method according to claim 1, further comprising, after the voiceprint recognition of the voice data to obtain the voiceprint feature of the target object:

and under the condition that the voiceprint features indicate that the target object is a child, determining that the identity role type of the target object is the target role type.

6. The method according to claim 1, further comprising, after the voiceprint recognition of the voice data to obtain the voiceprint feature of the target object:

determining that the identity role type of the target object is not the target role type if the voiceprint features indicate that the target object is an adult;

and controlling the intelligent equipment to switch the operation mode into a mode matched with the adult type.

7. A control device of an intelligent device, comprising:

the acquisition module is used for acquiring the acquired voice data of the target object under the condition that the intelligent equipment is awakened;

the recognition module is used for carrying out voiceprint recognition on the voice data to obtain the voiceprint characteristics of the target object;

and the switching module is used for controlling the intelligent equipment to switch the operation mode into a mode matched with the target role type under the condition that the voiceprint characteristics indicate that the identity role type of the target object is the target role type.

8. The apparatus of claim 7, wherein the identification module further comprises:

the processing unit is used for preprocessing the voice data to obtain voice frequency domain data;

and the recognition unit is used for inputting the voice frequency domain data into a voice recognition model to obtain the voiceprint characteristics of the target object, wherein the voice recognition model is a voiceprint classification model obtained by training a plurality of sample labeled frequency domain data.

9. The apparatus of claim 8, wherein the identification unit further comprises:

a first training unit, configured to obtain a plurality of sample labeling data, where the plurality of sample labeling data include: first voice data labeled with a child tag and second voice data labeled with an adult tag;

sequentially taking each sample labeled frequency domain data as current sample labeled frequency domain data, and executing the following operations until a convergence condition is reached;

inputting the current sample labeled frequency domain data into an initialized voice recognition model to obtain a voiceprint recognition result;

determining that the convergence condition is reached if the successful recognition count result reaches the first threshold.

10. The apparatus of claim 8, wherein the processing unit further comprises:

the first processing subunit is used for carrying out voice filtering processing on the voice data to obtain filtered voice data;

the second processing subunit is configured to perform noise reduction processing on the filtered voice data to obtain noise-reduced voice data;

and the third processing subunit is configured to perform time-frequency transformation on the noise-reduced voice data to obtain the voice frequency domain data.

11. The apparatus of claim 7, wherein the identification module further comprises:

a first determining unit, configured to determine that the identity role type of the target object is the target role type when the voiceprint feature indicates that the target object is a child.

12. The apparatus of claim 7, wherein the identification module further comprises:

a second determining unit, configured to determine that the identity role type of the target object is not the target role type if the voiceprint feature indicates that the target object is an adult;

and the first switching unit is used for controlling the intelligent equipment to switch the operation mode into a mode matched with the adult type.

13. A storage medium, in which a computer program is stored, wherein the computer program is arranged to perform the method of any of claims 1 to 6 when executed.

14. An electronic device comprising a memory and a processor, wherein the memory has stored therein a computer program, and wherein the processor is arranged to execute the computer program to perform the method of any of claims 1 to 6.

Technical Field

The invention relates to the technical field of intelligent equipment control, in particular to a control method and device of intelligent equipment, a storage medium and electronic equipment.

Background

Many smart devices on the market today provide different operation modes for different target user groups. For example, a smart television may be configured with a child mode and an adult mode for child and adult user groups respectively. In these two modes, the smart television can provide different ways of operating and push different content to the different user groups. As another example, a smart car may be set to a male mode or a female mode; in these two modes, the smart car can set up different in-car environments, for example by playing different styles of music and adjusting the display style of the in-car screen for different user groups.

Therefore, in a scenario where different types of users share the same smart device, the smart device needs to provide a function for switching the operation mode. At present, there is only a single way to switch the operation mode of a smart device: for example, the mode can only be set through touch operation on an operation interface. However, in some scenarios touch operation is not a suitable way to switch the operation mode. For example, switching the operation mode by touch while driving a smart car creates a safety hazard. The technical problem that the operation mode of a smart device can be switched in only a single way therefore urgently needs to be solved.

In view of the above problems, no effective solution has been proposed.

Disclosure of Invention

The embodiment of the invention provides a control method and device of intelligent equipment, a storage medium and electronic equipment, and aims to at least solve the technical problem that the switching mode of the operation mode of the intelligent equipment is single.

According to an embodiment of the present invention, there is provided a control method of an intelligent device, including: acquiring collected voice data of a target object under the condition that the intelligent equipment is awakened; carrying out voiceprint recognition on voice data to obtain voiceprint characteristics of a target object; and under the condition that the voiceprint characteristics indicate that the identity role type of the target object is the target role type, controlling the intelligent equipment to switch the operation mode into a mode matched with the target role type.

According to another aspect of the embodiments of the present invention, there is also provided a control apparatus for an intelligent device, including: the acquisition module is used for acquiring the acquired voice data of the target object under the condition that the intelligent equipment is awakened; the recognition module is used for carrying out voiceprint recognition on the voice data to obtain the voiceprint characteristics of the target object; and the switching module is used for controlling the intelligent equipment to switch the operation mode into a mode matched with the target role type under the condition that the voiceprint characteristics indicate that the identity role type of the target object is the target role type.

According to still another aspect of the embodiments of the present invention, there is also provided a computer-readable storage medium in which a computer program is stored, wherein the computer program is configured to execute the control method of the intelligent device when running.

According to another aspect of the embodiments of the present invention, there is also provided an electronic device, including a memory and a processor, where the memory stores a computer program, and the processor is configured to execute the control method of the intelligent device through the computer program.

In the embodiment of the invention, voiceprint recognition is performed on the voice data of the target object, and the operation mode of the intelligent device is then switched to the mode matching the identity role type indicated by the recognition result. The type of the target object is thus identified through its voiceprint features, which enriches the ways in which the operation mode of the intelligent device can be switched and solves the technical problem that the operation mode can be switched in only a single way.

Drawings

The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the invention and together with the description serve to explain the invention without limiting the invention. In the drawings:

FIG. 1 is a schematic diagram of an alternative control method for a smart device according to an embodiment of the present invention;

FIG. 2 is a schematic diagram of an alternative control method for a smart device according to an embodiment of the invention;

FIG. 3 is a schematic diagram of an alternative method of training a speech recognition model according to an embodiment of the present invention;

FIG. 4 is a schematic diagram of an alternative method of pre-processing speech data in accordance with embodiments of the invention;

FIG. 5 is a schematic diagram of yet another alternative control method for a smart device, according to an embodiment of the present invention;

FIG. 6 is a schematic structural diagram of an alternative control apparatus of an intelligent device according to an embodiment of the present invention;

FIG. 7 is a schematic structural diagram of an alternative control apparatus for a smart device according to an embodiment of the present invention;

FIG. 8 is a schematic structural diagram of yet another alternative control apparatus of an intelligent device according to an embodiment of the present invention.

Detailed Description

In order to make the technical solutions of the present invention better understood, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

It should be noted that the terms "first," "second," and the like in the description and claims of the present invention and in the drawings described above are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the invention described herein are capable of operation in sequences other than those illustrated or described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.

Referring to fig. 1, according to an aspect of an embodiment of the present invention, there is provided a control method of an intelligent device, including:

step S102, acquiring collected voice data of a target object under the condition that the intelligent equipment is awakened;

step S104, carrying out voiceprint recognition on the voice data to obtain the voiceprint characteristics of the target object;

and step S106, under the condition that the voiceprint characteristics indicate that the identity role type of the target object is the target role type, controlling the intelligent equipment to switch the running mode to a mode matched with the target role type.
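Steps S102 to S106 can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of the embodiment: the `SmartDevice` class, the `recognize_role` callable, the role names, and the mode names are all hypothetical, and the voiceprint recognizer is passed in as a stub.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass
class SmartDevice:
    """Hypothetical device state; field names are illustrative only."""
    awake: bool = False
    mode: str = "default"


def control_device(
    device: SmartDevice,
    voice_data: Optional[bytes],
    recognize_role: Callable[[bytes], str],
    target_role: str,
    role_to_mode: Dict[str, str],
) -> str:
    """Run S102 to S106 once and return the resulting operation mode."""
    # S102: voice data is only used while the device is awake.
    if not device.awake or voice_data is None:
        return device.mode
    # S104: voiceprint recognition (supplied by the caller as a stub).
    role = recognize_role(voice_data)
    # S106: switch only when the recognized role is the target role type.
    if role == target_role:
        device.mode = role_to_mode[target_role]
    return device.mode


# Usage with a stub recognizer that always reports "child".
tv = SmartDevice(awake=True)
result = control_device(
    tv, b"\x00\x01", lambda _: "child", "child", {"child": "child_mode"}
)
print(result)
```

Passing the recognizer in as a callable keeps the switching logic independent of how the voiceprint features are actually obtained.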

Optionally, the manner in which the smart device is awakened may be, but is not limited to, a manner of awakening by using a voice command, a manner of awakening by using a remote control, and a manner of awakening by touch, which is not limited herein. The condition that the smart device is already awake may refer to, but not limited to, a situation that the smart device is already in a certain operation mode and can receive a voice command, and may also refer to a situation that the smart device is already started and can receive a voice command but is not in a certain operation mode, which is not limited herein. In the case where the smart device is already in a certain operation mode, the operation mode is a default initial state operation mode, or may be an operation mode randomly confirmed from modes stored in the smart device, which is not limited herein.

Optionally, the collected voice data of the target object may be, but is not limited to, voice data of the target object pre-stored in the smart device (that is, voice data recorded during the previous operation and stored in the smart device), or voice data of the current target object acquired after the smart device is woken up. In the latter case, the voice data may be the voice data of the wake-up voice command collected when the target object wakes the device, or voice data recorded from a voice command that the target object is prompted to input after the device has been woken up, which is not limited herein.

It should be noted that the voiceprint features can refer to, but are not limited to, acoustic features related to the anatomical structure of the human voice production mechanism, such as the spectrum, cepstrum, formants, pitch, reflection coefficients, and the like.
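As an illustration of one of the listed features, pitch can be estimated from a short voice segment by picking the peak of its autocorrelation. The sketch below is only an example of that general technique; the function name, the 50 to 500 Hz search range, and the synthetic test tone are assumptions, and the source does not say how its features are computed.

```python
import math


def estimate_pitch_hz(samples, sample_rate, min_hz=50.0, max_hz=500.0):
    """Estimate the fundamental frequency by autocorrelation peak picking."""
    min_lag = int(sample_rate / max_hz)   # shortest period considered
    max_lag = int(sample_rate / min_hz)   # longest period considered
    best_lag, best_score = min_lag, float("-inf")
    for lag in range(min_lag, min(max_lag, len(samples) - 1) + 1):
        # Unnormalized autocorrelation of the signal with itself at this lag.
        score = sum(
            samples[i] * samples[i + lag] for i in range(len(samples) - lag)
        )
        if score > best_score:
            best_lag, best_score = lag, score
    return sample_rate / best_lag


# Synthetic 200 Hz tone sampled at 8 kHz (0.1 s of audio).
sample_rate = 8000
tone = [math.sin(2 * math.pi * 200 * n / sample_rate) for n in range(800)]
pitch = estimate_pitch_hz(tone, sample_rate)
print(round(pitch, 1))
```

Real speech is far less regular than a pure tone, so a practical system would apply this per frame and combine it with the other features named above.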

It is understood that the identity role type of the target object may be divided, without limitation, by age (the elderly, the middle-aged, and children), by sex (male and female), or by both age and sex; the specific division method is not limited herein. The smart device may have built-in operation modes corresponding to the different target role types, such as an elderly mode, a middle-aged mode, a child mode, a male mode, a female mode, a boy mode, a girl mode, and the like, matching the identity role types divided according to the different standards, which is not limited herein. The identity role type of the target object can be indicated by the voiceprint precisely because the voiceprint features of the role types described above differ significantly.

Optionally, the operation modes of the smart device that match different target roles may differ in display content, in the manner of operation, or in both, which is not limited herein. For example, a smart television may provide an adult mode and a child mode matching the adult and child types. The display content differs between the two: in the adult mode, movie resources are not actively filtered; in the child mode, movie resources unsuitable for children are actively filtered out and animation resources are mainly provided. Different manners of operation can also be set according to the different operating habits of adults and children: for example, the adult mode may rely mainly on touch operation, while the child mode relies mainly on voice operation, which is not limited herein.

Optionally, the operation mode of the smart device may be an operation mode set pre-built in the smart device, or an operation mode set stored in a cloud server, which is not limited herein; the operation mode of the intelligent device may be an operation mode obtained by human setting in advance, or an operation mode obtained by training according to the use habit of the user, which is not limited herein.

Optionally, in this embodiment, the device may be a device having the capability of transceiving data and control instructions, and may include but is not limited to at least one of the following: mobile phones (such as Android Mobile phones, iOS Mobile phones, etc.), notebook computers, tablet computers, palm computers, MID (Mobile Internet Devices), PAD, desktop computers, smart televisions, smart speakers, smart air conditioners, etc. The target client may be a video client, an instant messaging client, a browser client, an educational client, etc. Such networks may include, but are not limited to: a wired network, a wireless network, wherein the wired network comprises: a local area network, a metropolitan area network, and a wide area network, the wireless network comprising: bluetooth, WIFI, and other networks that enable wireless communication. The server may be a single server, a server cluster composed of a plurality of servers, or a cloud server. The above is merely an example, and this is not limited in this embodiment.

In an optional embodiment of the present invention, in the step S104, performing voiceprint recognition on the voice data to obtain a voiceprint feature of the target object, the method may further include:

step S202, preprocessing voice data to obtain voice frequency domain data;

step S204, inputting the voice frequency domain data into a voice recognition model to obtain the voiceprint characteristics of the target object, wherein the voice recognition model is a voiceprint classification model obtained by training a plurality of sample labeled frequency domain data.

It can be understood that the above-mentioned preprocessing of the voice data may be filtering the voice data, or denoising the voice data, or performing time-frequency transformation on the voice data, or a combination of the above-mentioned processing operations, which is not limited herein, and by preprocessing the voice data, noise in the data for voice recognition is reduced, thereby achieving a technical effect of improving recognition accuracy.

In this embodiment, the voice frequency domain data obtained by preprocessing the voice data is input into the speech recognition model, which achieves the technical effect of improving recognition accuracy.

Optionally, the speech recognition model used in step S204 may be trained through the following steps:

step S302, obtaining a plurality of sample labeling data, where the plurality of sample labeling data includes: first voice data labeled with a child tag and second voice data labeled with an adult tag;

step S304, performing time-frequency transformation on the plurality of sample labeling data to obtain a plurality of sample labeling frequency domain data, wherein the plurality of sample labeling frequency domain data comprises: child frequency domain data corresponding to the first voice data, and adult frequency domain data corresponding to the second voice data;

sequentially taking each sample labeled frequency domain data as current sample labeled frequency domain data, and executing the following operations until a convergence condition is reached;

step S306, inputting the current sample labeled frequency domain data into the initialized voice recognition model to obtain a voiceprint recognition result;

step S308, judging whether the voiceprint recognition result is consistent with the label of the current sample labeled frequency domain data; if it is inconsistent, adjusting the model parameters in the initialized voice recognition model according to the voiceprint recognition result, acquiring the next sample labeled frequency domain data as the current sample labeled frequency domain data, and then executing step S306 again; if it is consistent, executing step S310 to update the successful recognition count result;

step S312, judging whether the successful recognition count result reaches a first threshold; if it does not, acquiring the next sample labeled frequency domain data as the current sample labeled frequency domain data and then executing step S306 again; if it does, executing step S314 to determine that the convergence condition is reached.
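The loop in steps S306 to S314 can be sketched as follows. The source does not specify the model or the parameter update rule, so this example substitutes a one-feature threshold classifier with a perceptron-style update; `ThresholdModel`, `train`, the learning rate, the toy samples, and the `max_steps` safety cap are all assumptions made for illustration.

```python
from itertools import cycle


class ThresholdModel:
    """Toy stand-in for the voice recognition model (an assumption).

    A sample is labeled "child" when weight * feature + bias > 0.
    """

    def __init__(self):
        self.weight = 0.0
        self.bias = 0.0

    def predict(self, feature):
        return "child" if self.weight * feature + self.bias > 0 else "adult"

    def adjust(self, feature, label, lr=0.1):
        # Perceptron-style update toward the labeled class.
        sign = 1.0 if label == "child" else -1.0
        self.weight += lr * sign * feature
        self.bias += lr * sign


def train(model, samples, first_threshold, max_steps=10_000):
    """Return True once the success count reaches first_threshold."""
    success_count = 0
    for step, (feature, label) in enumerate(cycle(samples)):
        if step >= max_steps:                    # safety cap, not in the source
            return False
        result = model.predict(feature)          # S306
        if result != label:                      # S308: inconsistent
            model.adjust(feature, label)
            continue                             # next sample, back to S306
        success_count += 1                       # S310
        if success_count >= first_threshold:     # S312
            return True                          # S314: converged
    return False                                 # only reached with no samples


# Hypothetical one-dimensional features: positive for child, negative for adult.
samples = [(1.0, "child"), (-1.0, "adult"), (0.8, "child"), (-0.6, "adult")]
model = ThresholdModel()
converged = train(model, samples, first_threshold=10)
print(converged, model.predict(0.9), model.predict(-0.9))
```

As in the text, the success count here is cumulative across samples. Whether the count should reset after a failed recognition is not stated in the source, so the sketch does not reset it.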

Optionally, the recognition model may be configured to perform binary classification on the sample labeling data, which improves recognition efficiency.

Alternatively, the sample annotation data may be, but is not limited to being, from a public database. Because the sample marking data volume in the public database is large, the recognition model for recognizing the voice data is obtained by utilizing the public voice database for training, the training efficiency of the recognition model can be improved, and the technical effect of improving the recognition accuracy of the recognition model is realized.

Alternatively, the counting result may be the number of successfully recognized samples, or may be the current successful recognition rate, which is not limited herein. Optionally, the first threshold may be a fixed value set manually according to needs, may also be a fixed value obtained by training according to needs, and may also be a variable value set according to needs, which is not limited herein.

Through the embodiment, a recognition model for voiceprint recognition can be obtained through training, the model is obtained through training based on the existing voice sample labeling data, and the recognition accuracy is high. Meanwhile, voice recognition is carried out based on the recognition model obtained through training, and voice data of the target object does not need to be obtained in advance, so that desensitization of private information of the target object in the voice data is realized, and the privacy problem caused by leakage of the voice data of the target object is avoided.

In an alternative embodiment of the present invention, as shown in fig. 4, the step S202 may further include:

step S402, voice filtering processing is carried out on the voice data to obtain filtered voice data;

step S404, noise reduction processing is carried out on the filtered voice data to obtain noise-reduced voice data;

step S406, performing time-frequency transformation on the voice data subjected to noise reduction to obtain voice frequency domain data.

Optionally, the processing method for performing the human voice filtering processing may be to perform filtering processing according to the loudness of the speech data, for example, to perform filtering processing on speech data that is higher than a certain loudness threshold or lower than a certain loudness threshold. Or, filtering processing may be performed according to the frequency of the voice data, for example, filtering processing is performed on the voice data that is higher than a certain frequency threshold or lower than a certain frequency threshold. It is also possible to perform filtering processing according to the loudness and frequency of the speech data. The method of filtering the voice data is not limited herein.

Alternatively, the method of performing noise reduction processing on the filtered speech data may be to process the speech data by using audio filtering, and the audio filtering may be selected according to actual needs, which is not limited herein.
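Steps S402 to S406 can be sketched as a three-stage pipeline. The concrete choices below are assumptions, since the text leaves the methods open: loudness filtering is done per frame on the RMS level, noise reduction uses a three-point moving average, and the time-frequency transformation is a plain discrete Fourier transform written with the standard library only. All function names, frame sizes, and thresholds are hypothetical.

```python
import cmath
import math


def frame_rms(frame):
    """Root-mean-square level of one frame, used as a loudness measure."""
    return math.sqrt(sum(s * s for s in frame) / len(frame))


def filter_by_loudness(samples, frame_len, low, high):
    """S402 stand-in: keep only frames whose RMS lies within [low, high]."""
    kept = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        if low <= frame_rms(frame) <= high:
            kept.extend(frame)
    return kept


def moving_average(samples, width=3):
    """S404 stand-in: smooth the signal with a short moving average."""
    half = width // 2
    out = []
    for i in range(len(samples)):
        window = samples[max(0, i - half):i + half + 1]
        out.append(sum(window) / len(window))
    return out


def magnitude_spectrum(samples):
    """S406: magnitudes of the DFT bins from 0 up to the Nyquist bin."""
    n = len(samples)
    return [
        abs(sum(samples[t] * cmath.exp(-2j * math.pi * k * t / n)
                for t in range(n)))
        for k in range(n // 2 + 1)
    ]


def preprocess(samples, frame_len=64, low=0.05, high=2.0):
    voiced = filter_by_loudness(samples, frame_len, low, high)   # S402
    denoised = moving_average(voiced)                            # S404
    return magnitude_spectrum(denoised)                          # S406


# One silent frame followed by two frames of a tone with a 16-sample period.
silence = [0.0] * 64
tone = [math.sin(2 * math.pi * n / 16) for n in range(128)]
spectrum = preprocess(silence + tone)
peak_bin = max(range(len(spectrum)), key=spectrum.__getitem__)
print(len(spectrum), peak_bin)
```

The silent frame is dropped by the loudness gate, so the spectrum is computed over the 128 tone samples only and its peak falls in the bin that corresponds to the tone's 16-sample period.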

In the present embodiment, by preprocessing the voice data, the noise portion in the data used for voice recognition is reduced, thereby achieving the technical effect of improving the recognition accuracy.

In an optional embodiment of the present invention, in step S106, when the voiceprint feature indicates that the identity role type of the target object is the target role type, controlling the intelligent device to switch the operation mode to a mode matched with the target role type, may further include:

step S1, when the voiceprint feature indicates that the target object is a child, determining that the identity role type of the target object is the target role type.

Step S2, determining the identity role type of the target object is not the target role type under the condition that the voiceprint characteristics indicate that the target object is an adult; and controlling the intelligent device to switch the operation mode to a mode matched with the adult type.

In this embodiment, based on different voice recognition results, the intelligent device is controlled to switch the operation mode to a mode matched with the voice recognition results, so that the technical effect of enriching the switching mode of the operation mode of the intelligent device is achieved, and the technical problem that the switching mode of the operation mode of the intelligent device is single is solved.

An embodiment of the present invention will be described below with reference to fig. 5.

Step S502, the intelligent device is awakened by a user and awakening voice data is collected;

step S504, voice data are preprocessed to obtain voice frequency domain data;

step S506, using the speech recognition model obtained in the training stage to classify and recognize the frequency domain data;

step S508, determining the role type indicated by the recognition result: if the recognition result indicates that the role type of the target object is a child, performing step S510, in which the intelligent device switches the operation mode to a child mode; if the recognition result indicates that the role type of the target object is an adult, performing step S512, in which the intelligent device switches the operation mode to an adult mode.

It can be understood that, in this embodiment, the type of the target role may be identified from the wake-up voice data collected in step S502. Because the identification is based on the user's current wake-up voice data, the voice data of the user does not need to be stored in advance, which avoids leakage of the user's private data and enhances the protection of user privacy.

It can be understood that, in step S504, preprocessing the voice data includes: performing human voice filtering on the voice data to obtain filtered voice data; then performing noise reduction on the filtered voice data to obtain noise-reduced voice data; and finally performing time-frequency transformation on the noise-reduced voice data to obtain voice frequency domain data. Preprocessing reduces the noise in the data used for recognition, thereby improving the recognition accuracy.
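The three preprocessing stages can be sketched as below. The embodiment does not specify the filter, the noise reduction method or the transform parameters, so everything concrete here is an assumption: human voice filtering is interpreted as keeping a 300 Hz to 3400 Hz band with simple one-pole filters, noise reduction is a plain amplitude gate, and the time-frequency transformation is a Hann-windowed short-time DFT. All function names are hypothetical.

```python
import cmath
import math
from typing import List


def band_pass(samples: List[float], rate: int,
              low_hz: float, high_hz: float) -> List[float]:
    """Crude voice-band filter: one-pole high-pass, then one-pole low-pass."""
    dt = 1.0 / rate
    rc_high = 1.0 / (2 * math.pi * low_hz)
    rc_low = 1.0 / (2 * math.pi * high_hz)
    a_high = rc_high / (rc_high + dt)
    a_low = dt / (rc_low + dt)
    out: List[float] = []
    hp_prev = x_prev = lp_prev = 0.0
    for x in samples:
        hp = a_high * (hp_prev + x - x_prev)   # removes content below low_hz
        lp = lp_prev + a_low * (hp - lp_prev)  # removes content above high_hz
        out.append(lp)
        hp_prev, x_prev, lp_prev = hp, x, lp
    return out


def noise_gate(samples: List[float], threshold: float) -> List[float]:
    """Zero every sample whose magnitude is below the threshold."""
    return [x if abs(x) >= threshold else 0.0 for x in samples]


def stft_magnitude(samples: List[float], frame_len: int,
                   hop: int) -> List[List[float]]:
    """Magnitude spectrum of each Hann-windowed frame (naive DFT)."""
    frames: List[List[float]] = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        frame = samples[start:start + frame_len]
        windowed = [
            x * (0.5 - 0.5 * math.cos(2 * math.pi * n / (frame_len - 1)))
            for n, x in enumerate(frame)
        ]
        spectrum = []
        for k in range(frame_len // 2 + 1):
            acc = sum(w * cmath.exp(-2j * math.pi * k * n / frame_len)
                      for n, w in enumerate(windowed))
            spectrum.append(abs(acc))
        frames.append(spectrum)
    return frames


def preprocess(samples: List[float], rate: int) -> List[List[float]]:
    """Filter, denoise, then transform; all parameter values are assumed."""
    filtered = band_pass(samples, rate, low_hz=300.0, high_hz=3400.0)
    denoised = noise_gate(filtered, threshold=0.01)
    return stft_magnitude(denoised, frame_len=64, hop=32)


# Usage: a 1 kHz test tone sampled at 8 kHz.
tone = [math.sin(2 * math.pi * 1000 * n / 8000) for n in range(256)]
spectrogram = preprocess(tone, rate=8000)
```

The naive DFT is chosen only to keep the sketch dependency-free; a practical implementation would more likely use an FFT routine.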

It can be understood that, in step S506, the speech recognition model obtained in the training stage may be a speech classification model trained by machine learning on a public speech database, and may be used to distinguish adults from children. Training the classification model on a public speech database improves the recognition accuracy of the model. Meanwhile, because training yields a binary classification model, the recognition speed of the model is improved.

In the above embodiment, the voice data of the target object is recognized, and the operation mode of the intelligent device is then switched to the mode matched with the identity role type indicated by the recognition result. The type of the target object is thus recognized from its voiceprint features, which enriches the ways in which the operation mode of the intelligent device can be switched and solves the technical problem that the switching manner of the operation mode is single.

It should be noted that, for simplicity of description, the above-mentioned method embodiments are described as a series of acts or combination of acts, but those skilled in the art will recognize that the present invention is not limited by the order of acts, as some steps may occur in other orders or concurrently in accordance with the invention. Further, those skilled in the art should also appreciate that the embodiments described in the specification are preferred embodiments and that the acts and modules referred to are not necessarily required by the invention.

According to another aspect of the embodiments of the present invention, a control device of the intelligent device is further provided. As shown in fig. 6, the apparatus includes:

an acquisition module 601, configured to acquire collected voice data of the target object when the intelligent device has been woken up;

a recognition module 602, configured to perform voiceprint recognition on the voice data to obtain a voiceprint feature of the target object;

and a switching module 603, configured to control the intelligent device to switch the operation mode to a mode matched with the target role type when the voiceprint feature indicates that the identity role type of the target object is the target role type.

Optionally, as shown in fig. 7, the recognition module 602 may further include:

a processing unit 702, configured to preprocess the voice data to obtain voice frequency domain data;

and a recognition unit 704, configured to input the voice frequency domain data into a voice recognition model to obtain a voiceprint feature of the target object, where the voice recognition model is a voiceprint classification model obtained by training on a plurality of sample labeled frequency domain data.

Optionally, the recognition unit 704 may further include a training unit, which may be configured to:

performing time-frequency transformation on a plurality of sample labeled data to obtain a plurality of sample labeled frequency domain data, wherein the plurality of sample labeled frequency domain data include: child frequency domain data corresponding to first voice data, and adult frequency domain data corresponding to second voice data;

sequentially taking each sample labeled frequency domain data as the current sample labeled frequency domain data, and executing the following operations until a convergence condition is reached:

inputting the current sample labeled frequency domain data into an initialized voice recognition model to obtain a voiceprint recognition result;

when the voiceprint recognition result is inconsistent with the label of the current sample labeled frequency domain data, adjusting model parameters in the initialized voice recognition model according to the voiceprint recognition result, and acquiring the next sample labeled frequency domain data as the current sample labeled frequency domain data;

when the voiceprint recognition result is consistent with the label of the current sample labeled frequency domain data, updating a successful recognition count;

when the successful recognition count has not reached a first threshold, acquiring the next sample labeled frequency domain data as the current sample labeled frequency domain data;

when the successful recognition count reaches the first threshold, determining that the convergence condition is reached.
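The training procedure above can be sketched as a loop. The model type, the parameter update rule and the way samples are revisited are not specified in the embodiment, so this sketch assumes a perceptron-style linear classifier, cycles through the samples repeatedly, and keeps the successful recognition count as a running total that is never reset. The names `predict`, `train`, `first_threshold`, `learning_rate` and `max_steps` are hypothetical, and the toy feature vectors are made up for illustration.

```python
from itertools import cycle
from typing import List, Sequence, Tuple

CHILD, ADULT = 1, 0  # labels of the sample labeled frequency domain data


def predict(weights: Sequence[float], bias: float,
            features: Sequence[float]) -> int:
    """Voiceprint recognition result of a linear model (assumed model type)."""
    score = sum(w * x for w, x in zip(weights, features)) + bias
    return CHILD if score > 0 else ADULT


def train(samples: Sequence[Tuple[Sequence[float], int]],
          first_threshold: int,
          learning_rate: float = 0.1,
          max_steps: int = 10_000) -> Tuple[List[float], float]:
    """Train until the successful recognition count reaches first_threshold."""
    weights = [0.0] * len(samples[0][0])  # initialized model
    bias = 0.0
    success_count = 0
    for step, (features, label) in enumerate(cycle(samples)):
        if step >= max_steps:  # safety stop, not part of the embodiment
            raise RuntimeError("convergence condition not reached")
        if predict(weights, bias, features) != label:
            # Inconsistent with the label: adjust parameters, take next sample.
            direction = 1.0 if label == CHILD else -1.0
            weights = [w + learning_rate * direction * x
                       for w, x in zip(weights, features)]
            bias += learning_rate * direction
            continue
        # Consistent with the label: update the successful recognition count.
        success_count += 1
        if success_count >= first_threshold:
            break  # convergence condition reached
    return weights, bias


# Toy two-dimensional features, invented for illustration only.
data = [([0.2, 0.9], CHILD), ([0.9, 0.2], ADULT),
        ([0.1, 0.8], CHILD), ([0.8, 0.1], ADULT)]
weights, bias = train(data, first_threshold=20)
```

Whether the count should restart after a misclassification is left open by the text; resetting it would make the convergence condition stricter.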

Optionally, as shown in fig. 8, the processing unit 702 may further include:

a first processing subunit 802, configured to perform human voice filtering on the voice data to obtain filtered voice data;

a second processing subunit 804, configured to perform noise reduction on the filtered voice data to obtain noise-reduced voice data;

and a third processing subunit 806, configured to perform time-frequency transformation on the noise-reduced voice data to obtain voice frequency domain data.

Optionally, the recognition module 602 further includes:

a first determining unit, configured to determine that the identity role type of the target object is the target role type when the voiceprint feature indicates that the target object is a child.

Optionally, the recognition module 602 further includes:

a second determining unit, configured to determine that the identity role type of the target object is not the target role type when the voiceprint feature indicates that the target object is an adult;

and a first switching unit, configured to control the intelligent device to switch the operation mode to a mode matched with the adult type.

It can be understood that the apparatus according to the embodiment of the present invention has the beneficial effects corresponding to the control method of the intelligent device described above, and details are not described here.

According to a further aspect of an embodiment of the present invention, there is also provided a storage medium comprising a stored program, wherein the program is arranged to perform the steps in any of the above method embodiments when executed.

Optionally, in this embodiment, the storage medium may be configured to store a computer program for executing the following steps:

S1, acquiring the collected voice data of the target object when the intelligent device has been woken up;

S2, performing voiceprint recognition on the voice data to obtain the voiceprint feature of the target object;

and S3, controlling the intelligent device to switch the operation mode to a mode matched with the target role type when the voiceprint feature indicates that the identity role type of the target object is the target role type.

Optionally, in this embodiment, those skilled in the art will understand that all or part of the steps in the methods of the foregoing embodiments may be implemented by a program instructing hardware associated with the terminal device, where the program may be stored in a computer-readable storage medium, and the storage medium may include: flash disks, read-only memories (ROMs), random access memories (RAMs), magnetic disks, optical disks, and the like.

According to a further aspect of an embodiment of the present invention, there is also provided an electronic apparatus for implementing the control method of the intelligent device, the electronic apparatus including a memory in which a computer program is stored and a processor configured to execute the steps in any one of the method embodiments described above by the computer program.

Optionally, in this embodiment, the electronic apparatus may be located in at least one network device of a plurality of network devices of a computer network.

Optionally, in this embodiment, the processor may be configured to execute the following steps by a computer program:

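S1, acquiring the collected voice data of the target object when the intelligent device has been woken up;

S2, performing voiceprint recognition on the voice data to obtain the voiceprint feature of the target object;

and S3, controlling the intelligent device to switch the operation mode to a mode matched with the target role type when the voiceprint feature indicates that the identity role type of the target object is the target role type.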

The integrated unit in the above embodiments, if implemented in the form of a software functional unit and sold or used as a separate product, may be stored in the above computer-readable storage medium. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes several instructions for causing one or more computer devices (which may be personal computers, servers, network devices, etc.) to execute all or part of the steps of the method according to the embodiments of the present invention.

In the above embodiments of the present invention, the descriptions of the respective embodiments have respective emphasis, and for parts that are not described in detail in a certain embodiment, reference may be made to related descriptions of other embodiments.

In the several embodiments provided in the present application, it should be understood that the disclosed client may be implemented in other manners. The above-described embodiments of the apparatus are merely illustrative, and for example, the division of the units is only one type of division of logical functions, and there may be other divisions when actually implemented, for example, a plurality of units or components may be combined or may be integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, units or modules, and may be in an electrical or other form.

The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.

In addition, functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.

The foregoing is only a preferred embodiment of the present invention. It should be noted that those skilled in the art can make various improvements and refinements without departing from the principle of the present invention, and these improvements and refinements should also be regarded as falling within the protection scope of the present invention.
