Apparatus, system and method for directing voice input in a control device


Note: This technology, "Apparatus, system and method for directing voice input in a control device," was created by Arsham Hatambeiki on 2019-01-02. Abstract: A system and method for controlling a controllable device residing in an environment includes a device adapted to receive voice input. The system and method establish a noise threshold for the environment in which the device operates, receive a voice input at the device, determine a noise level of the environment at the time the device receives the voice input, compare the determined noise level to the established noise threshold, and automatically send one or more commands to the controllable device so as to cause the controllable device to transition from a first volume level to a second volume level less than the first volume level when the determined ambient noise level is greater than the established ambient noise threshold.

1. A method for controlling a controllable device residing in an environment, using a device adapted to receive voice input, the method comprising:

establishing a noise threshold for the environment in which the device operates;

receiving a voice input at the device;

determining a noise level of the environment at the time the device receives the voice input;

comparing the determined noise level to the established noise threshold; and

when the comparison indicates that the determined ambient noise level is greater than the established ambient noise threshold, causing one or more commands to be automatically sent to the controllable device to cause the controllable device to transition from a first state having a first volume level to a second state having a second volume level, the second volume level being less than the first volume level.

2. The method of claim 1, wherein the device is adapted to support remote control functionality, and wherein the one or more commands are sent directly by the device to the controllable device.

3. The method of claim 2, wherein the one or more commands comprise a volume mute command.

4. The method of claim 2, wherein the one or more commands comprise one or more volume down commands.

5. The method of claim 2, wherein the one or more commands comprise a power off command.

6. The method of claim 1, further comprising: determining, within a predetermined time period after receiving the voice input, that the device has not received further voice input and, in response, causing one or more commands to be automatically sent to the controllable device to cause the controllable device to transition from the second state back to the first state.

7. The method of claim 6, wherein the device is adapted to support remote control functionality, and wherein the one or more commands for transitioning the controllable device from the first state to the second state and transitioning the controllable device from the second state back to the first state are both transmitted directly by the device to the controllable device.

8. The method of claim 1, further comprising: determining, within a predetermined time period after receiving the voice input, that the device is not expected to receive further voice input and, in response, causing one or more commands to be automatically sent to the controllable device to cause the controllable device to transition from the second state back to the first state.

9. The method of claim 8, wherein the device is adapted to support remote control functionality, and wherein the one or more commands for transitioning the controllable device from the first state to the second state and transitioning the controllable device from the second state back to the first state are both transmitted directly by the device to the controllable device.

10. The method of claim 1, further comprising: using a known operating state of the controllable device to determine whether to automatically issue one or more commands to the controllable device to transition the controllable device from a first state having a first volume level to a second state having a second volume level, the second volume level being less than the first volume level.

11. The method of claim 1, further comprising: using a derived operating state of the controllable device to determine whether to automatically issue one or more commands to the controllable device to transition the controllable device from a first state having a first volume level to a second state having a second volume level, the second volume level being less than the first volume level.

12. The method of claim 1, further comprising: establishing a voice input threshold for the device; determining a voice level of the received voice input; comparing the determined voice level to the voice input threshold; and when the comparison indicates that the determined voice level is greater than the established voice input threshold, causing one or more outputs generated in response to the voice input to have a first volume level that is greater than a normally used output level.

13. The method of claim 12, wherein the device includes a speaker for outputting one or more outputs generated in response to the voice input.

14. The method of claim 1, further comprising: establishing a voice input threshold for the device; determining a voice level of the received voice input; comparing the determined voice level to the voice input threshold; and when the comparison indicates that the determined voice level is greater than the established voice input threshold, causing one or more outputs generated in response to the voice input to have a first volume level that is less than a normally used output level.

15. The method of claim 14, wherein the device includes a speaker for outputting one or more outputs generated in response to the voice input.

16. The method of claim 1, further comprising: associating a time of day with the established noise threshold, and comparing the determined noise level to the established noise threshold only when the voice input is received during the associated time of day.

17. The method of claim 1, wherein the device comprises a remote control application, wherein the remote control application is provided with a set of command codes adapted to command functional operations of the controllable device, and wherein the device uses the set of command codes to automatically send, via a transmitter associated with the device, one or more commands to the controllable device to transition the controllable device from a first state having a first volume level to a second state having a second volume level, the second volume level being less than the first volume level.

18. The method of claim 1, wherein the device comprises a voice control application, wherein the voice control application is provided with at least one protocol for sending voice commands to at least one other device adapted to receive voice input, and wherein the device uses the at least one protocol to automatically send, via a transmitter associated with the device, one or more voice commands to the at least one other device to transition the controllable device from a first state having a first volume level to a second state having a second volume level, the second volume level being less than the first volume level.

19. The method of claim 18, wherein the at least one other device comprises the controllable device.

20. The method of claim 12, wherein a measured distance to a source of the voice input is utilized in determining the voice level of the received voice input.

21. The method of claim 14, wherein a measured distance to a source of the voice input is utilized in determining the voice level of the received voice input.
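
To make the claimed control flow concrete before turning to the background, the following is a minimal Python sketch of the method of claims 1 and 6: it ducks a controllable device's volume when voice input arrives in a noisy environment and restores it after a quiet period. The threshold value, command strings, and helper names are illustrative assumptions, not part of the claims.

```python
import time

NOISE_THRESHOLD_DB = 60.0   # assumed established ambient noise threshold
QUIET_PERIOD_S = 5.0        # assumed predetermined quiet period

class VoiceInputDevice:
    """Sketch of the device of claim 1: ducks a controllable device's
    volume while voice input is captured in a noisy environment."""

    def __init__(self, send_command):
        self.send_command = send_command  # sends a command to the controllable device
        self.ducked = False
        self.last_voice_input = None

    def on_voice_input(self, ambient_noise_db: float) -> None:
        """Called when a voice input is received, with the environment's
        noise level determined at that time."""
        self.last_voice_input = time.monotonic()
        # Compare the determined noise level to the established threshold.
        if ambient_noise_db > NOISE_THRESHOLD_DB and not self.ducked:
            # Transition to the second, quieter state, e.g. via a mute
            # or one or more volume-down commands (claims 3 and 4).
            self.send_command("volume_mute")
            self.ducked = True

    def poll(self) -> None:
        """Claim 6: if no further voice input arrives within the
        predetermined period, restore the first state."""
        if (self.ducked and self.last_voice_input is not None
                and time.monotonic() - self.last_voice_input > QUIET_PERIOD_S):
            self.send_command("volume_restore")  # hypothetical restore command
            self.ducked = False
```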

Background

Dedicated remote controls that receive voice input via a key-activated microphone to control the operation of a consumer electronic device are well known in the art, and typically each such remote control can control the operation of one type of consumer electronic device. Such a dedicated remote control is intended primarily to control the functional operations of consumer electronic devices sold under the same consumer brand name. For example, a dedicated Apple brand Siri remote control for an Apple TV brand video streamer may use voice input, carried over a proprietary communication protocol, to control volume, play, pause, rewind, stop, and similar operations. These dedicated remote controls are typically effective only for the associated products with which they are bundled and cannot be used with other voice-controlled consumer electronic devices. It is therefore desirable to provide a control device that allows a user to use voice commands to control a plurality of different types of voice-controlled consumer electronic devices from a plurality of consumer brand names.

Disclosure of Invention

Described below are examples of apparatus, systems, and methods that allow voice commands to be used to control the operation of a plurality of different voice-controlled consumer electronic devices. More particularly, a control device is provided for routing signals to two or more voice-controlled consumer electronic devices from different consumer brand names (hereinafter "smart devices") based on voice input from a user.

In one example, a control device is used to provide formatted voice data to two or more smart devices. The functional components of the control device include: an electronic storage medium having processor-readable code embodied therein and storing a plurality of device profiles, wherein each device profile includes a formatting protocol for formatting voice commands received from a user in accordance with a protocol used by a particular smart device; a first communication interface for sending formatted voice commands to at least one of the two or more smart devices; a microphone for receiving voice input from a user; and a processor, coupled to the electronic storage medium, the communication interface, and the microphone, for executing the processor-readable code.

In operation, the processor-readable code causes the processor of the control device to: receive a first voice command through the microphone; determine a first smart device to which the first voice command is directed; identify, in the electronic storage medium, a first formatting protocol associated with the first smart device; format the voice command into a formatted voice command according to the first formatting protocol; and transmit the formatted voice command to the first smart device through the communication interface.
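
A brief sketch of this receive-determine-format-transmit pipeline may be helpful; the types and helper names below are assumptions for illustration and do not appear in the source.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class DeviceProfile:
    """Hypothetical profile pairing a smart device with the formatting
    protocol its voice interface expects and a transport to reach it."""
    device_id: str
    format_command: Callable[[bytes], bytes]   # the formatting protocol
    transport: Callable[[str, bytes], None]    # e.g. a Wi-Fi or RF4CE send

def handle_voice_command(audio: bytes, target_id: str,
                         profiles: dict[str, DeviceProfile]) -> None:
    # Identify the formatting protocol associated with the target device,
    profile = profiles[target_id]
    # format the raw voice command accordingly,
    formatted = profile.format_command(audio)
    # and transmit it over the interface that device listens on.
    profile.transport(profile.device_id, formatted)
```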

Also described is a method for providing formatted voice data to two or more smart devices, the method being performed by a control device working in cooperation with the smart devices, wherein a processor of the control device receives a first voice command from a user through a microphone. The processor then determines the first smart device to which the voice command is directed and identifies a first formatting protocol stored in the electronic storage medium and associated with the first smart device. The processor then formats the received voice command in accordance with the first formatting protocol and sends the formatted voice command to the first smart device using the communication interface. Similarly, the processor may identify a second smart device and send a formatted voice command to that smart device; in some embodiments the formatting protocol is a proprietary protocol, a VoIP protocol, or the like.

Also described is a method for providing formatted voice data to two or more smart devices, the method being performed by a control device working in cooperation with the smart devices, wherein a processor of the control device receives, via a communication interface, identifications of one or more smart devices from smart devices coupled to the control device. In the method, each device identification may include an associated predetermined wake word. The processor of the control device may store the smart device identifications in a coupled electronic storage medium, and may receive at least a wake word and a voice command from the user via a microphone coupled to the processor of the control device. The processor then determines the smart device identification, stored in the electronic storage medium, whose predetermined wake word matches the received wake word, and sends the voice command to the target smart device using the communication interface. In some implementations, the wake word may be an alphanumeric consumer brand name, an alphanumeric code, a user command, etc., as desired for a particular application.
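
As an illustration of the wake-word matching just described, consider the following sketch; the registry contents and helper names are hypothetical.

```python
# Hypothetical wake-word registry of stored device identifications.
REGISTRY = {
    "tv":   "device-id-tv",
    "roku": "device-id-roku",
    "xbox": "device-id-xbox",
}

def route_by_wake_word(utterance: str, send) -> None:
    """Match the leading wake word against the stored identifications
    and forward the remaining voice command to the matching device."""
    wake_word, _, command = utterance.strip().partition(" ")
    device_id = REGISTRY.get(wake_word.lower())
    if device_id is None:
        raise LookupError(f"no smart device registered for wake word {wake_word!r}")
    send(device_id, command)

# Forwards "play" to the device registered under the "roku" wake word.
route_by_wake_word("Roku play", send=lambda device, cmd: print(device, cmd))
```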

Another described method for providing formatted voice data to two or more smart devices is performed by a control device working in cooperation with the smart devices. The method comprises: receiving, by a processor of the control device, a voice command from a user via a microphone; in response, sending, by the processor of the control device, an HDMI input status request (e.g., a request to detect an active source/sink port, a request for a communication bus status, etc.) to a connected smart device via the communication interface; and receiving, by the processor of the smart device, the HDMI input status request via the smart device's communication interface, thereby causing the processor of the smart device to detect the active HDMI input, i.e., the input carrying the signal from the device currently being presented by the smart device, determine a device identification associated with the active HDMI input, and send that smart device identification back to the control device via the smart device's communication interface. The processor of the control device then receives the smart device identification via its own communication interface and formats the voice command according to a formatting protocol, stored in the electronic storage medium of the control device, that is associated with the device identification.
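
The following is a sketch of the two sides of this HDMI-based determination, under assumed names; real devices would typically discover the active input via HDMI signaling (e.g., CEC), which is not modeled here.

```python
# Hypothetical mapping, held by the smart device (e.g. the TV), from
# HDMI inputs to the device identifications stored for them.
HDMI_INPUT_MAP = {
    "hdmi1": "apple-tv-streamer",
    "hdmi2": "roku-streamer",
}

def handle_input_status_request(active_input: str) -> str:
    """Smart-device side: resolve the currently presented HDMI input
    to a stored device identification."""
    return HDMI_INPUT_MAP[active_input]

def on_voice_command(audio: bytes, query_tv, format_for, send) -> None:
    """Control-device side: ask the TV which input is active, then
    format the voice command for, and send it to, that device."""
    device_id = query_tv()          # the HDMI input status request
    send(device_id, format_for(device_id, audio))
```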

A system for providing formatted voice data to two or more smart devices is also described, in which a smart device cooperates with a coupled remote server and a control device. The system may include a smart device having processor-readable code that causes the smart device to: receive a first voice command from the control device through a communication interface; format the voice command according to a first formatting protocol; send the formatted voice command to a remote server via the communication interface, wherein a processor of the remote server receives the formatted voice command and uses it to determine the first device to which the voice command is directed; receive, from the remote server via the communication interface, the determination of the first device for which the voice command is intended; and send, via the communication interface, the formatted voice command to the intended smart device.

The objects, advantages, features, characteristics and relationships of the subject system and method can be better understood through the following detailed description and drawings, which give illustrative examples and indicate the various ways in which the principles of the invention claimed below may be employed.

Drawings

For a better understanding of various aspects of the described systems and methods, reference may be made to the illustrative examples shown in the drawings, in which:

FIG. 1 is a block diagram illustrating a prior art system for providing voice data to a smart device using a dedicated remote control.

FIG. 2 is a block diagram illustrating another system for providing formatted voice data to two or more smart devices using a control device.

FIG. 3 is an illustrative example of yet another system for providing formatted voice data to two or more smart devices using a control device and an associated application.

FIG. 4 is a functional block diagram of the example control device shown in FIG. 2 for providing formatted voice data to two or more smart devices.

FIG. 5 is a functional block diagram of the example smart device shown in FIG. 2, as used in a system for providing formatted voice data to two or more smart devices.

FIG. 6 is a flow diagram illustrating an example method for implementing voice control of two or more smart devices.

FIG. 7 is a flow diagram illustrating an example method for controlling the sound level produced by a device in an environment in connection with receiving voice input.

FIG. 8 is a flow diagram illustrating an example method for controlling a device's sound output based on received voice input levels.

Detailed Description

Described below are examples of devices, systems, and methods for controlling two or more smart devices using voice commands, and more particularly a control device that receives voice input from a user, identifies the particular smart device for which the voice input is intended, and formats the voice input into digital signals understandable by that smart device.

FIG. 1 illustrates a system known in the art, wherein a dedicated remote control is operated to control the functional operations of a first smart device having the same consumer brand as the dedicated remote control, primarily by using voice commands, and is not intended to communicate with or control a second smart device of a different consumer brand by using voice commands. The example shows two dedicated remote controls, namely a Comcast brand Xfinity voice remote control 102 and an Apple brand Siri voice remote control 104, which are used to operate associated smart devices, such as a Comcast brand set-top box 108 (hereinafter "STB") and an Apple TV brand streamer 110, respectively, by using voice commands. Some consumer devices are internet-enabled in that they can send and receive content to and from sources located within range of a local data network (e.g., a wireless LAN) and/or send and receive content to and from sources located at remote locations over the internet. Each dedicated remote control communicates with its associated smart device via wireless signals 120 and 122, respectively, where the wireless signals 120 and 122 differ from each other. Typically, STB108 and Apple TV brand streamer 110 are connected to smart TV106 (hereinafter "TV") over HDMI cables 112, may also be connected to a wireless router 114, and may use signals 118 to communicate with an internet cloud-based voice processing service 116 (e.g., a Comcast brand voice service or an Apple brand Siri voice service): voice commands received by a dedicated remote control are sent to the same-brand smart device, which sends the voice data to the associated voice processing service for interpretation. For example, a user of the Comcast brand Xfinity voice remote control 102 presses a microphone key, causing the remote control to start recording the user's voice, e.g., "watch ESPN" or "show me a children's movie", with the recording ending when the key is released. The remote control then compresses the recording and transmits it over a low-bandwidth link to STB108 using the RF4CE wireless protocol. STB108 then sends the recording through router 114 to the Comcast brand voice service, which performs natural language processing (NLP) to interpret the recording and determine a corresponding command, and then sends the command back to STB108 to perform the corresponding operation. In this system, the Comcast brand Xfinity voice remote control 102 does not support sending voice commands to smart devices of different consumer brands, such as Apple TV brand streamer 110. The operation of cloud-based voice processing services is well known to those skilled in the art and is not described in detail herein.

FIG. 2 illustrates an exemplary system according to the teachings herein, and includes a control device 202, such as a standalone Amazon brand Echo device (hereinafter "Echo") or a similar device with voice input capabilities, which may include, but is not limited to, a tablet, a PDA, a cell phone with an associated remote-control-type application, a smart watch, a computer, a wearable control device, a remote control, or an intermediate device intended to control two or more smart devices. Additionally, it is contemplated that the control device 202 may take the form of a smart digital assistant as described in U.S. application No. 15/799,393, which is incorporated herein by reference, wherein the digital assistant is supplemented with one or more of the functions described herein. As will be appreciated from the following description, the control device 202 is adapted to transmit one or more different wireless signals, such as signals 120, 122 (also shown in FIG. 1) and/or signals 212 and 214, for receipt by a corresponding plurality of intended target devices. Such transmissions may use communication protocols such as voice over IP (VoIP), IP, Smart WAVE (S-WAVE), Wi-Fi, Bluetooth Low Energy (BLE), RF4CE, ZigBee, Z-Wave, infrared, 6LoWPAN, Thread, Wi-Fi-ah, 2G, 3G, 4G, NB-IoT, 5G, NFC, RFID, SigFox, etc., as needed to communicate commands to two or more smart devices. In this example, the control device 202 is configured to receive a voice command from a user and to send a formatted version of the voice command to one or more of the smart devices 108, 110, 204, and 206, as determined by the control device 202. In some implementations, the voice command is not formatted. It should also be understood that, in addition to transmitting formatted versions of voice commands as described herein, the control device 202 may be equipped with conventional remote control functionality by which one or more commands selected from one or more command code sets may be transmitted to a controlled device to control its functional operations, such as volume and power operations.

The smart devices may include consumer electronic devices in an end user's home, such as TV106, STB108, and Apple TV brand streamer 110 (each shown in FIG. 1), Xbox brand gaming system 204, and Roku brand streamer 206. Although shown as TV106, Xbox brand gaming system 204, STB108, Roku brand streamer 206, and Apple TV brand streamer 110, it should be understood that the smart devices may include, but are not limited to, various televisions, VCRs, DVRs, DVD players, cable or satellite converter set-top boxes, amplifiers, CD players, game consoles, home lighting, smart wireless hubs, curtains, fans, HVAC systems, personal computers, wearable health monitoring equipment, or generally any consumer product capable of communicating with a control-type device (e.g., an Echo or a smartphone) and/or with other smart devices, in some embodiments over a local or wide area network 216 via a wireless router 114 typically associated with wireless signals 118. Such smart devices are typically connected to the TV106 through HDMI cables 112, or may be connected wirelessly, and may operate to transmit data to a coupled remote server 208, which is coupled to a database 210, and/or to the voice processing service 116 shown in FIG. 1.

The user may operate the control device 202 by pressing a soft key or mechanical key on the control device 202 that activates at least a coupled microphone, allowing the user's voice to be recorded and/or streamed and transmitted to one or more coupled smart devices (hereinafter referred to individually or collectively as "smart devices"). In one example, the control device 202 may be part of an audio-based context recognition system and, in some embodiments, part of a context command routing system that includes at least one smart device coupled to the server 208, wherein the system determines the user's intent to perform an action and determines the smart device to which the voice command should be routed for execution. In some implementations, the determination of user intent is performed by a home entertainment system that includes a coupled sensing interface to automate the system's response to events occurring in a media viewing area, such as the user's living room. Such a determination of user intent may be performed in the manner described in U.S. patent No. 9,137,570, which is hereby incorporated by reference in its entirety.

In one example, the control device 202 may operate by continuously listening for an audio-based context (i.e., a context based on an audio signal generated by a user speaking a voice command) and sending the audio-based context (hereinafter "voice command") via a communication interface to a smart device, which forwards the voice command to a coupled server 208. The server 208 automatically performs an audio-based context recognition operation to determine the contextual command routing and/or at least a portion of a classification identifying the smart device for which the audio-based context is intended.

In this example, a smart device, such as the TV106, coupled to the server 208 receives the intended device determination information directly from the server 208 via a first communication interface (e.g., a Wi-Fi receiver) and uses the intended device determination information to determine the smart device to which the voice command is directed. The TV106 sends voice commands to the identified smart device via a second communication interface (e.g., RF4CE transmitter) to execute commands (e.g., turn on, turn off, increase volume, decrease volume, change channel to channel X, etc.).

In another example, the control device 202 receives the intended device determination information from the server 208 through the first communication interface and sends the voice command to the identified smart device through the second communication interface, whereupon the command is executed on the smart device.

In yet another example, the server 208 determines not only the intended device determination information but also the user's intent, to determine the contextual command routing. The command itself is then transmitted over the wide area network 216 to the target smart device, to the control device 202, or to the smart device that forwarded the voice command.

In one example, the first smart device receives the intended device determination information from the server 208 via the first communication interface, uses the device determination information to determine the intended smart device and sends a command to the intended smart device to execute the command.

In one example, the first smart device receives the intended device determination information from the server 208 and executes the command locally.

In another example, a first smart device may scan for connected smart devices on the local area network and may query each smart device for status information used in determining the intended device determination information and sending commands to the intended smart device.

In another example, the first smart device receives the intended device determination information from the server 208 and sends the device determination information to a second smart device, wherein the second smart device uses the identification information to determine the intended smart device and sends the voice command to that smart device via a second communication interface for execution.

In one example, the smart device sends voice commands to an associated voice processing service provider to perform natural language processing or the like to determine a corresponding command, which is then sent to the smart device to perform the command operation.
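
The routing variants above share a common core, summarized in the following sketch; recognize(), the profile table, and send() are stand-ins for the server 208 or voice processing service 116 and the device-specific transports, not names from the source.

```python
# A composite sketch of the routing variants above.
def route_voice_command(audio: bytes, recognize, profiles: dict, send) -> None:
    # recognize() plays the role of server 208 / voice processing
    # service 116: it maps an utterance to (device_id, command).
    device_id, command = recognize(audio)
    # Look up how the intended device expects to be addressed.
    profile = profiles[device_id]
    # Deliver over the device-appropriate interface (e.g. Wi-Fi, RF4CE).
    send(profile["interface"], device_id, command)
```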

In one example, the control device 202 records and/or streams a wake word such as "TV", "Google", "Alexa", "Xbox", "Game", or "STB", and a command such as "open", "play", "stop", or the like, to a smart device via the communication interface. The wake word is generally intended to identify the smart device and, in some embodiments, to alter the power state of the smart device, for example from standby to full power. In one example, the control device 202 uses the wake word to determine which smart device to send the wake word to, and in one example, the control device 202 receives the command immediately after receiving the wake word.

In another example, the control device 202 sends the wake word and the command to the server 208 via the wide area network 216, wherein the smart device identification is determined by a processor of the server 208, and wherein the server 208 sends the voice command to the identified smart device.

In another example, the control device 202 receives identifications of intended smart devices from a smart device coupled to the control device 202, wherein each identification includes an associated wake word, and the control device 202 stores this information in an electronic storage medium. The control device 202 then receives at least a wake word from the user and uses it to determine the intended smart device, sending the wake word or a voice command to the smart device associated with that wake word.

As an example, the control device 202 may send at least a wake word to the TV106. The TV106 uses the wake word to determine the smart device identification associated with the received wake word, and uses that identification to determine the corresponding smart device for which the wake word is intended. The TV106 then sends the wake word and the associated voice command to the identified smart device to execute the command.

In another example, smart devices operating in conjunction with the control device 202 and the server 208 are configured during a setup process (e.g., a learning operation) in which each smart device located in the user's home and detected is registered with a predetermined voice command spoken by the user, associating each voice command with the smart device that supports it. For example, the user can issue predetermined voice commands such as "play music", "pause a movie", "start recording", and the like by using the control device 202. In this example, the control device 202 sends the voice command to the smart device being configured, and that smart device receives from the server 208, using the database 210 accessible to the server 208, an instruction corresponding to the voice command and an identification of the smart device for which the command is intended.

For example, the voice command "play music" may be associated, via the server 208, with a smart device that supports streaming music (e.g., the detected Apple TV brand streamer 110). Similarly, the voice command "pause movie" may be associated with the Roku brand streamer 206 via the server 208, and "start recording" may be associated with STB 108. Thereafter, when the user speaks the voice command "play music" into the control device 202, the system causes the Apple TV brand streamer 110 to perform the operation of streaming music.
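
A minimal sketch of this learning operation follows, with hypothetical identifiers standing in for the detected devices and the database 210.

```python
# Hypothetical registry backing the learning operation (database 210).
COMMAND_REGISTRY: dict[str, str] = {}

def register(predetermined_command: str, device_id: str) -> None:
    """Associate a predetermined spoken command with the detected
    smart device that supports it."""
    COMMAND_REGISTRY[predetermined_command] = device_id

def dispatch(spoken_command: str, send) -> None:
    """Route a later utterance to the device registered for it."""
    send(COMMAND_REGISTRY[spoken_command], spoken_command)

register("play music", "apple-tv-streamer-110")
register("pause movie", "roku-streamer-206")
register("start recording", "stb-108")
dispatch("play music", send=lambda device, cmd: print(device, cmd))
```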

In yet another example, the control device 202 receives a voice command that automatically causes an input status request (e.g., a request to detect an active source/sink port, a request for a communication bus status, etc.) to be sent by the control device 202 to the TV106, wherein the TV106 performs operations to detect the active input among a plurality of possible inputs and to determine an associated device identification selected from a plurality of device identifications stored in an electronic storage medium of the TV106. The TV106 then uses that device identification to send the voice command to the identified smart device.

In one example, the control device 202 receives configuration information from the TV106, which is coupled to the remote server 208, using identifications of smart devices selected from a plurality of coupled smart devices located in the end user's home. In this example, the configuration information includes a plurality of smart device profiles (hereinafter "device profiles") provided by the server 208 to the TV106. For example, the server 208 receives from the TV106 identifications of a plurality of smart devices located in the end user's home, where the TV106 operates to detect other connected smart devices and provide this information to the remote server 208. The server 208, in turn, analyzes the information to determine a device profile for each detected smart device, stores each device profile in the database 210, and sends the device profiles to the TV106, which then sends the configuration to the control device 202. The control device 202 may receive the configuration information via a setup process from a coupled TV106 that includes a universal control engine 200 (hereinafter "UCE"), as described in other examples below.

Further, any of the smart devices shown in FIG. 2 may operate in a coordinated manner, for example with a smart device acting as a master and the server 208 as a slave, or vice versa, to send one or more device profiles to the control device 202 or to another coupled smart device. The device profiles may be stored locally in an electronic storage medium associated with the control device 202 or in an electronic storage medium of a smart device.

It should be understood that although the user's voice commands are described as being recorded, the voice commands may be streamed by the control device 202 in real time, may be partially streamed, or may be temporarily stored in an electronic storage medium of the control device 202. Further, while the determination operation is described as a cross-referencing operation, it should be understood that the server 208 may use other methods to determine relationships, such as a predetermined operation mapping, an index, or a pairing table, and may use more than one method.

FIG. 3 is an example of a system using a control device 202 having an associated control application and coupled to a server 208 for providing voice commands to two or more smart devices. Control-type applications (hereinafter "applications") are well known in the art and are therefore not described herein. In this example, the control device 202 may initiate operations by using an application with an appropriate application interface 300, wherein the control device 202 may determine to which smart device each voice command is directed, may format the voice command according to that smart device, and may determine the transmission technology over which the formatted voice command will be sent to the intended smart device. For example, the user may press a soft key presented on the user interface of the control device 202, activating the microphone of the control device 202. The user may then speak a voice command, which is received by the control device 202 and then processed to determine which smart device the voice command is directed to. Next, the voice command is formatted into a digital signal that the particular smart device can understand. It should also be understood that activation of a volume control key, channel control key, or power key on the control device 202 depicted in FIG. 3 may result in transmission of a conventional remote control command to a controlled device to cause the controlled device, e.g., a TV, to perform a corresponding functional operation, e.g., mute its sound.

In one example, an application may listen for voice commands using an associated microphone and, when a voice command is received, send a request to the smart devices to perform local operations to dynamically scan the local area network for connected smart devices and query each smart device for status information, such as media content currently available on a particular smart device, supported commands, and the like. For example, the TV106 may initiate a query to one or more smart devices (e.g., STB108 and Apple TV brand streamer 110), where each smart device transmits information to the TV106 in real time regarding the activities performed on it. Such activity information may include currently available media content, such as a television program or movie being viewed on the Apple TV brand streamer 110, a photograph being viewed, an active application and its content displayed on STB108, the current volume level in use, supported commands, and possibly information such as an identification of the last user operation or command performed by each smart device. In some examples, the activity information may be partially or fully displayed on a display coupled to a smart device, or may be provided by a first smart device to a second smart device for display.

In another example, the activity information may be displayed on a display coupled to the control device 202, wherein the activity information contains an activatable link that, when activated by the user via an application installed on the control device 202, causes the smart device to execute a corresponding command, such as "play", "stop", or the like.

In one example, after the control device 202 sends the voice command to a dedicated smart device whose task is to relay voice commands from the control device 202 to a voice processing service or cloud service, the voice processing service 116 or cloud server 302 determines the smart device for which the voice command is intended. The identification and/or other information for the intended smart device is then received back at the same smart device, which provides it to the application. Alternatively, the application may send the voice command directly to the voice processing service 116 or an associated cloud service 302 through the wireless router 114, or directly over a cellular network, eliminating the need for a smart device to relay this information to and/or from a remote server. The voice processing service 116 or the cloud service 302 may then send the information/instructions directly back to the control device 202.

In one example, the application may include instructions that may be used with a cloud service 302, such as "If This Then That" (hereinafter "IFTTT") style instructions that automate one or more predefined IFTTT operations, causing the IFTTT service to send one or more predefined operations to one or more smart devices (e.g., the TV106, which is coupled to the IFTTT service through UCE 200). Such operations may be pre-populated at the cloud service 302 by using workflow tools, or may be populated to the IFTTT service by an application during a setup operation.

In one example, an application sends requests to the smart devices, either continuously or at predetermined time intervals, to scan for connected smart devices on the local area network and to query each smart device for status information.

It should be appreciated that although described as a standalone application, one or more coupled applications installed on one or more smart devices may cooperate to set up the control device 202, the cloud service 302, or the TV106 to provide formatted voice commands to two or more smart devices. In addition, one or more applications may cooperate to respond to requests issued by the smart devices or the control device 202 to scan the local area network for connected smart devices and to query each smart device for status information. In some examples, these applications may be synchronized through the use of a setup agent resident in a smart device or in the control device 202. Further details may be found in U.S. application serial No. 14/277,968, which is incorporated herein by reference in its entirety.

FIG. 4 illustrates a functional block diagram 400 of one example of a control device (e.g., the control device 202 shown in FIG. 2) for providing formatted voice commands to two or more smart devices. In this example, the control device 202 includes a processor 402, an electronic storage medium 404, a communication interface 406, a user interface 408, at least one transceiver 410, and at least one transmitter 412.

The processor 402 may be configured to provide general operation of the control device by executing processor-executable instructions (e.g., executable code) stored in the electronic storage medium 404. The processor 402 typically comprises a general-purpose microprocessor, although any of a variety of microprocessors, microcomputers, and/or microcontrollers may alternatively be used, depending on factors such as computing power, cost, size, and the like.

Electronic storage media 404 includes one or more information storage devices, such as ROM, RAM, flash memory, or other types of electronic, optical, or electromechanical storage devices, or any combination thereof. The electronic storage medium 404 may be used to store instructions executable by the processor to operate the control device 202. It will also be appreciated that some or all of the illustrated electronic storage media may be physically incorporated into the same IC chip as the processor device 402.

As will be understood by those skilled in the art, some or all of the electronic storage media 404 may store a plurality of device profiles, where each device profile includes a formatting protocol for formatting voice commands in accordance with the protocol used by a particular smart device, and may store a plurality of wake words and/or voice commands associated with one or more device profiles. For example, a first device profile may specify a format of one or more digital signals for voice operation of TV106 (e.g., causing TV106 to change channels, inputs, volume, etc.), while a second device profile may specify a format of one or more digital signals for voice operation of STB108 (e.g., changing channels, controlling volume, etc.).
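
As a rough illustration of such storage, a device profile might be laid out as follows; the field names and values are assumptions for this sketch, not the patent's actual schema.

```python
# Hypothetical layout of device profiles in the electronic storage
# medium 404: each profile binds a device to a formatting protocol,
# a transport, and the wake words and commands associated with it.
DEVICE_PROFILES = {
    "tv-106": {
        "wake_words": ["tv"],
        "formatting_protocol": "brand-x-voice-v1",
        "transport": "wifi",
        "supported_commands": ["change_channel", "change_input", "volume"],
    },
    "stb-108": {
        "wake_words": ["stb"],
        "formatting_protocol": "rf4ce-voice",
        "transport": "rf4ce",
        "supported_commands": ["change_channel", "volume"],
    },
}
```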

The communication interface 406 includes one or more data interface circuits, such as well-known Ethernet, Wi-Fi, RF4CE, Bluetooth, or USB circuits, that allow wireless communication between the control device 202 and the smart devices and, in some embodiments, between the control device 202 and the wireless router 114 and the server 208 in communication with it via the wide area network 216. In one embodiment, the communication interface 406 includes one or more data interface circuits, such as at least one transceiver 410 and at least one transmitter 412, that allow communication with coupled smart devices. In this embodiment, a first transceiver 410 may support a first wireless protocol for communicating with a first smart device and a second transceiver 410 may support a second wireless protocol for communicating with a second smart device, so as to provide formatted voice data to each smart device.

The user interface 408 includes user input devices for allowing a user to control the operation of the control device 202. The user input typically includes at least one or more soft keys or mechanical keys for allowing a user to enter commands or information into the control device 202. In one example, the user interface 408 includes a microphone coupled to the processor 402 for receiving the user's voice commands and converting them into electronic signals, as is well known in the art.

It should be understood that the functional blocks may be coupled to one another in various ways other than the one shown in FIG. 4, and that, for clarity, not all of the functional blocks necessary for operation of the control device 202 are shown, such as a power supply, a microphone, one or more accelerometers, a multi-axis gyroscope, and various other transceivers and transmitters, each supporting a different wireless protocol.

FIG. 5 shows a functional block diagram 500 of one example of a smart device, such as the TV106, STB108, or Apple TV brand streamer 110 (each shown in FIG. 1), the Xbox brand gaming system 204, or the Roku brand streamer 206 (each shown in FIG. 2). Such a smart device may be controlled by voice commands, may be speaker-independent (i.e., may respond to multiple voices), and may respond to multiple commands at once. In some implementations, the smart device may identify and/or authenticate the speaker (i.e., the user) via local operations, and may send received voice input to the voice processing service 116 via the wide area network 216 or over a cellular network.

In this example, the smart device includes a processor 502, an electronic storage medium 504, a communication interface 506, a user interface 508, and a transceiver 510. It should be understood that the functional blocks may be coupled to one another in ways other than the one shown in FIG. 5, and that, for clarity, not all of the functional blocks required for operation of the smart device are shown, such as a power supply and various other transceivers and transmitters, each supporting a different wireless protocol.

The processor 502 is configured to provide general operation of the smart device by executing processor-executable instructions (e.g., executable code) stored in the electronic storage medium 504. The processor 502 typically includes a general-purpose microprocessor, such as an Intel Core i7 brand or AMD K10 brand microprocessor, although any of a variety of microprocessors, microcomputers, and/or microcontrollers may alternatively be used, selected based on factors such as computing power, cost, size, and the like.

Electronic storage media 504 includes one or more information storage devices, such as ROM, RAM, flash memory, or other types of electronic, optical, or electromechanical storage devices, or any combination thereof. The electronic storage medium 504 may be used to store processor-executable instructions for operation of the smart device. It should also be understood that some or all of the illustrated electronic storage media may be physically incorporated within the same IC chip as the processor device 502.

As will be appreciated by those skilled in the art, some or all of the electronic storage media 504 may store instructions or data specific to each type of smart device to be controlled. For example, the instructions for the TV106 may include instructions to receive television programs via the communication interface 506 and display one of the television programs on the display according to commands received from the control device 202.

Other instructions cause the smart device to receive input, such as a wake word or a voice command, from the control device 202, where the processor 502 uses the voice command to determine the smart device identification associated with it. The smart device then transmits the device identification to the control device 202 or to a coupled smart device over the wide area network 216.

Still other instructions cause the smart device to receive instructions from the control device 202 that cause the processor 502 to initiate a detection process, such as a request to detect the active source/sink port on an audio-video communication bus (e.g., a bus status request), thereby detecting the active HDMI input. The smart device then determines which smart device is connected to the active HDMI input and sends that device identification to the control device 202 or to a coupled smart device, which then uses the device identification to send a voice command to the identified smart device. In some examples, the smart device sends the device determination to the server 208 for use in determining the contextual command routing, and the server 208 then sends the voice command to the identified smart device. In another example, the server 208 sends the voice command to the smart device that determined which device is connected to the active HDMI input, to execute the command or forward it to the device connected to that input.

The communication interface 506 includes one or more data interface circuits, such as a transceiver 510 and Ethernet, Wi-Fi, RF4CE, Bluetooth, or USB circuits, that allow digital communication between the smart device and other coupled smart devices, between the smart device and the control device 202 via the local area network provided by the wireless router 114, and between the smart device and the server 208 via the wide area network 216. In this example, the transceiver 510 may support a wireless protocol for receiving voice commands from the control device 202, and may decode, compress, or perform other operations necessary to send the voice commands to the voice processing service 116.

The user interface 508 includes user input and/or user output devices for allowing a user to control the operation of the smart device. The user input typically includes one or more buttons, keys, a touch screen display, etc., for allowing a user to enter commands or information into the smart device. The user output devices typically include display screens, touch screen displays, lights, speakers, and the like, for presenting media content to the user as desired.

It should be understood that the functional blocks may be coupled to one another in various ways other than the one shown in FIG. 5, and that, for clarity, not all of the functional blocks necessary for operation of the smart device are shown, such as a power supply and various other transceivers and transmitters, each supporting a different wireless protocol.

FIG. 6 is a flow diagram of an example method for implementing voice control of two or more smart devices. The method is implemented by a processor 402 located within the control device 202 by executing processor-executable instructions stored in an electronic storage medium 404. It should be understood that in some examples, not all of the steps shown in fig. 6 are performed, and the order in which the steps are performed may be different. It should also be understood that some minor method steps known to those of ordinary skill in the art have been omitted for clarity.

At block 600, a user of the control device 202 speaks a voice command to the control device 202 through the user interface 408. In one example, a user first presses a key on the control device 202 to activate a microphone on the control device 202.

At block 602, a voice command is received by the processor 402 via the user interface 408, and the voice command is typically stored in the electronic storage medium 404.

At block 604, the processor 402 determines the smart device for which the voice command is intended. In one example, the processor 402 evaluates the voice command and determines that it is intended for a particular smart device, in this example the TV106. The determining operation is performed according to one or more of the examples described above.

In another example, the processor 402 sends the voice command in a predetermined format for receipt by a predetermined one of the smart devices. In this example, the processor 402 is preconfigured to communicate with one of the smart devices and transmits voice commands in a format understood by that predetermined smart device, which may be different from the smart device for which the voice command is intended. The predetermined smart device receives the voice command and forwards it to the remote server 208. The remote server 208, in turn, processes the voice command to determine the smart device type or the identification of the particular smart device for which the voice command is intended. For example, the server 208 may interpret the voice command and extract one of a plurality of predetermined commands, such as "increase volume", "decrease volume", "change channel", "television on (off)", "Roku on (off)", and the like. Based on the interpretation, the server 208 identifies at least the smart device type for which the voice command is intended. For some voice commands, such as "television on (off)" or "Roku on (off)", where the identification of a particular smart device is included in the voice command, determining the target smart device is simply a matter of interpreting the voice command to extract the smart device in question. For other voice commands, such as "increase volume", "decrease volume", or "change channel", the server 208 may identify keywords in the voice command and determine the likely intended smart device by associating the keywords with the smart device types stored by the server 208. For example, if the server 208 determines that the word "volume" was spoken, the server 208 may determine that the voice command is intended for the TV106 or STB 108. The server 208 then returns the identity of the intended smart device to the control device 202, either directly via the wireless router 114 or via the predetermined smart device.
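
The keyword heuristic described above might look like the following sketch; the keyword-to-device-type mapping is illustrative only.

```python
# Hypothetical server-side mapping from spoken keywords to the types
# of smart devices they plausibly address.
KEYWORD_TO_DEVICE_TYPE = {
    "volume":     ["tv", "stb"],
    "channel":    ["tv", "stb"],
    "roku":       ["roku"],
    "television": ["tv"],
}

def likely_targets(transcript: str) -> list[str]:
    """Return device types plausibly addressed by the transcript."""
    targets: list[str] = []
    for word in transcript.lower().split():
        for device_type in KEYWORD_TO_DEVICE_TYPE.get(word, []):
            if device_type not in targets:
                targets.append(device_type)
    return targets

print(likely_targets("increase volume"))   # -> ['tv', 'stb']
```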

At block 606, the control apparatus 202 receives an identification of the intended smart device from the server 208 via the transceiver 410, and the transceiver 410 provides the identification to the processor 402.

At block 608, the processor 402 receives the identification and may determine the particular smart device for which the voice command is intended based on the identified smart device type. For example, the server 208 may have identified a television as the target of the voice command. The processor 402 then determines, based on an earlier setup process, the particular brand and/or model of television being used, as well as operational capabilities such as supported voice commands, wake words, pre-installed applications, content being viewed, supported wireless protocols, user preferences, and the like.

The setup process includes sending a signal to the discovered smart devices causing those smart devices to send their device information (e.g., EDID, CEC, vendor name, device type, device status, installed applications, current media content being played on the device, media content logos, InfoFrames, SSDP, mDNS, IP mDNS service lists, supported wireless protocols (e.g., VoIP, IP, Smart WAVE (S-WAVE), Wi-Fi, Bluetooth Low Energy (BLE), RF4CE, ZigBee, Z-Wave, infrared), etc.) to the requesting smart device or devices. For example, the setup process may be used to identify smart devices having the same operational capabilities. Among such devices, a user may prefer to watch television programs on a particular smart device and may set user preferences accordingly. In this example, the user preferences are aggregated into a device profile. More details on the detection of such devices can be found in U.S. patent Nos. 8,812,629, 8,558,676, 8,659,400, 8,830,074, 8,896,413, 9,215,394, 9,437,105, 9,449,500, and 9,019,435, all of which are incorporated herein by reference in their entirety.

At block 610, the processor 402 formats the voice command stored in the electronic storage medium 404 into a data format according to the formatting protocol associated with the identified smart device.

At block 612, the processor 402 sends the formatted voice command to the identified smart device via the transmitter/transceiver 410 and/or 412. The formatting protocol used to format the voice command may additionally specify the transport protocol over which the data is to be sent. For example, the formatting protocol stored in the electronic storage medium 404 and associated with the TV106 may indicate that the wireless data must be sent via the RF4CE transmitter. In this case, the processor 402 routes the formatted voice command to the RF4CE transmitter and causes the RF4CE transmitter to send the formatted voice command to the TV106.
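
A sketch of this profile-driven transport selection follows; the Transmitter class and transport names are stand-ins for the transmitter/transceiver circuits 410 and 412.

```python
class Transmitter:
    """Stand-in for a physical transmitter circuit (e.g. 410/412)."""
    def __init__(self, name: str) -> None:
        self.name = name
    def send(self, device_id: str, payload: bytes) -> None:
        print(f"[{self.name}] -> {device_id}: {len(payload)} bytes")

TRANSMITTERS = {"rf4ce": Transmitter("RF4CE"), "wifi": Transmitter("Wi-Fi")}

def transmit(device_id: str, payload: bytes, profile: dict) -> None:
    # The formatting protocol names the transport the device listens on.
    TRANSMITTERS[profile["transport"]].send(device_id, payload)

transmit("tv-106", b"\x01volume_up", {"transport": "rf4ce"})
```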

As described in part with reference to FIG. 2, the device profiles include overall smart device information for a plurality of smart devices, such as the data identified during the setup process performed by the smart devices, which typically includes metadata, attributes, and user-set preferences for each smart device, the formatting protocols used to format voice commands according to the protocol used by a particular smart device, supported network or communication protocols, voice command code structures or formats, voice services or operational capabilities, status information, etc., all of which may be stored in the database 210 and accessed by the server 208.

These supported commands and operational capabilities define a set of verbs and grammars that can be associated with the device.

In one example, in response to receiving the device profile from the server 208, the TV 106 can transmit the device profile, in a format used by the second smart appliance, to a second coupled smart appliance via the communication interface 506 for use in controlling the configuration of the control device 202. For example, a first smart appliance may send a device profile received from the server 208 to a second smart appliance acting as an intermediary, or may send one or more device profiles to the control device 202 for storage in the electronic storage medium 404.

In yet another example, the cloud service 302 provides the device profile to the control device 202 using the wide area network 216. The control device 202 may then store the device profile locally in the electronic storage medium 404.

In one example, the device profile may be received by the processor 402 in a raw format and reconstructed by the processor 402 into a particular data structure by executing processor-readable code comprising a set of steps for creating the data structure. In one example, the data structure is an array. In another example, the data structure is a list. In yet another example, the data structure is a combination of one or more data types needed by the processor 402 to perform the data reconstruction operation.
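
A minimal sketch of such a reconstruction follows, assuming (purely for illustration) a JSON wire format, which the text above does not specify:

```python
# Sketch: reconstruct a raw device profile into a working data structure.
# JSON is an assumed serialization; the source does not name a wire format.
import json

def reconstruct_profile(raw_profile: bytes) -> list:
    decoded = json.loads(raw_profile)
    if isinstance(decoded, list):
        return decoded        # profile already list/array shaped
    return [decoded]          # normalize a single mapping into an array
```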

In another example, the processor 402 may perform local operations to cross-reference the discovered device information with device metadata stored in the electronic storage medium 404, or such cross-referencing operations may be performed in cooperation with the server 208 for each identified smart device. In this example, the server 208 determines information that is in some respects equal or similar to the discovered smart device information by using the database 210 to generate or aggregate data into a device profile or device fingerprint. The device metadata includes smart device attributes, such as EDID, CEC, device type, supported features, etc., which may be complementary to the discovered device information, and typically includes a plurality of other device-related information for a plurality of smart devices, such as cloud-based device services (e.g., services provided by the manufacturer of the device), associated voice processing services, functions, preferred communication methods, supported network or communication protocols, command code structures or formats, etc.

Additionally, the device profile may include signals having the same structure as signals sent by an original remote control of the smart device (e.g., the Comcast brand Xfinity voice remote control 102 for operating the Comcast brand STB 108, or the Apple brand Siri voice remote control 104 for operating the Apple TV brand streamer 110, each as shown in FIG. 1) for sending voice commands to the smart device. These signals may be sent by the control device 202 to the first coupled smart device through the communication interface 406, using signals similar to those from the original remote control of the same smart device manufacturer. Similarly, the control apparatus 202 may transmit a different signal to a second smart device by using a signal similar to that of the original remote control associated with the second smart device.

In another example, the processor 402 may dynamically generate the device profile in real time by using the discovered device information and/or by performing an online search via the communication interface 406 and the wide area network 216 to obtain relevant smart device metadata (e.g., from the internet or another cloud-based server). When the operation is complete, the device profile may be stored in the electronic storage medium 404, or may be stored in an electronic storage medium of a coupled server or of the cloud service 302.

In another example, the device profile is provided to the TV 106 (which includes the UCE 200) by the server 208 via the communication interface 506, or to the TV 106 by the cloud service 302 using the wide area network 216.

In any example, each device profile includes a formatting protocol for formatting voice commands in accordance with the protocol used by a particular smart device, and the functions for collecting identifying smart device information from smart devices located in the end user's home may be performed by a Universal Control Engine (UCE) 200, as described in U.S. patent No. 9,215,394, which is incorporated herein by reference in its entirety. In one example, when a smart device including the UCE 200 is initially powered on, an automatic setup process may be initiated to identify or detect smart devices on the same local network as the smart device containing the UCE 200. Alternatively, the setup process may be initiated by a key press on the control apparatus 202, or by a voice command recognized and acted upon by the smart device. Such a setup process is described in U.S. patent No. 9,307,178, also incorporated herein by reference in its entirety.

In one example, the control apparatus 202 includes an electronic storage medium 404 having processor-readable code therein and storing a plurality of smart device profiles, wherein each device profile includes a formatting protocol for formatting voice commands in accordance with the protocol used by a particular smart device, and wherein the server 208 provides the device profiles to the smart devices. In this example, the control apparatus 202 receives a first voice command from the end user via the microphone, which is used by the control apparatus 202 to determine the first smart device to which the first voice command is directed. The control apparatus 202 then identifies, in the electronic storage medium, a first formatting protocol associated with the first smart device, formats the voice command into a formatted voice command conforming to the first formatting protocol, and sends the formatted voice command to the first smart device.

For example, a user may press a microphone key and speak one or more words or sounds to select a particular smart device, such as the Apple TV brand streamer 110. The control device 202 determines a device profile associated with that smart appliance based on the voice command and identifies the correct signal or protocol for communicating with the Apple TV brand streamer 110. The control device 202 formats the voice command into the same format used by the Apple brand Siri voice remote control 104, and then sends the formatted voice command to the Apple TV brand streamer 110. The control apparatus 202 may subsequently receive a second voice command and similarly determine the second smart device (e.g., the STB 108) for which the second voice command is intended, and then transmit the correspondingly formatted voice data to the STB 108. In some examples, the device profile includes a definition of how to send voice data to the smart appliance, and the control device 202 may perform local operations to determine which smart appliance is associated with a voice command and may determine one or more methods of sending the voice command.
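
The flow just described might be sketched end to end as follows; the target-selection heuristic, profile layout, and callables are illustrative assumptions rather than the actual implementation:

```python
# Hedged end-to-end sketch: determine the target device for a captured voice
# command, format the command per that device's stored protocol, and send it.
def determine_target_device(transcript: str, profiles: dict) -> str:
    # Placeholder heuristic: pick the first profile whose name appears in the transcript.
    for name in profiles:
        if name.lower() in transcript.lower():
            return name
    raise LookupError("no matching smart device for this command")

def handle_voice_command(transcript: str, audio: bytes, profiles: dict) -> None:
    target = determine_target_device(transcript, profiles)
    profile = profiles[target]
    formatted = profile["format"](audio)   # assumed callable mimicking the original remote's format
    profile["send"](formatted)             # assumed callable bound to the profile's transport
```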

In one example, the control apparatus 202 may listen for a voice command and, upon receiving the voice command, transmit a request to the smart devices to perform a local operation to scan for the smart devices connected to the local area network and to dynamically query status information from each smart device. Such status information includes installed and/or supported applications, the power state (i.e., on/off) of the smart device, the current media status (e.g., playing a particular song or viewing a particular video stream), supported commands and/or scripts, etc. In some examples, the status information may be used by the smart device that executes the query, or by the server 208, to define the context of a command.

In one example, the status information may be obtained by a first smart device performing a signal sniffing operation (e.g., an audio signal listening operation) to determine media currently being played at a particular location. In this example, the smart device contains the necessary hardware and programming to perform the signal sniffing operation. Signal sniffing operations are well known in the art and will not be described herein.

In one example, the smart device receives code or scripts for a connected smart device from the server 208 coupled to the database 210, wherein the database 210 includes a plurality of smart device codes and/or scripts for communicating with connected smart devices, and wherein the codes and/or scripts are used to identify a context and the intended smart device. For example, when the user says "pause," the server 208 will, depending on the context, give priority to the smart device that is currently playing a song.
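
As a non-limiting illustration, such context-based targeting might look like the following sketch, in which the status fields and device names are assumptions:

```python
# Sketch: route a context-sensitive verb (e.g. "pause") to whichever smart
# device is currently playing media. The status layout is an assumption.
from typing import Optional

def resolve_context_target(verb: str, device_status: dict) -> Optional[str]:
    if verb in ("pause", "stop", "resume"):
        for device, status in device_status.items():
            if status.get("media_state") == "playing":
                return device
    return None

status = {"STB 108": {"media_state": "idle"},
          "Apple TV streamer 110": {"media_state": "playing"}}
print(resolve_context_target("pause", status))  # Apple TV streamer 110
```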

For example, the TV 106 may initiate a query to one or more smart devices, such as the STB 108 and the Apple TV brand streamer 110, where each smart device sends information to the TV 106 in real time regarding the activities being performed on that smart device. Such activity information may include the current media content available, such as a television program or movie being viewed on the Apple TV brand streamer 110, a photograph being viewed, an active application and its content displayed on the STB 108, and supported commands, and may include information identifying the last user operation or command executed by each smart device. In some examples, the activity information may be partially or fully displayed on a display coupled with the smart device, or may be provided by a first smart device to a second smart device for display.

In another example, the activity information may be displayed on a display coupled to the control device 202, where the activity information contains an activatable link that, when activated by a user using an application installed on the control device 202, may cause the smart device to execute a corresponding command, such as "play", "stop", or the like.

In another example, the control apparatus 202 receives configuration information from the remote server 208 using identifications of smart devices from a plurality of coupled smart devices. For example, the server 208 receives from the TV 106 identifications of a plurality of smart devices located in the end user's home. In this example, one or more smart devices perform operations to detect other connected smart devices and provide this information to the remote server 208. The server 208, in turn, analyzes the information to determine a device profile for each detected smart device. The server 208 stores the device profile for each detected smart device in the database 210 and sends the device profile for each detected smart device directly to the control apparatus 202. In some examples, the device profile includes one or more supported commands and operational capabilities defining a set of verbs and grammars associated with the smart device, and may include a second definition of how to send voice data to the smart device.

In yet another example, the control device 202 receives configuration information from the cloud service 302 via the wide area network 216. In this example, one or more smart devices perform operations to detect other connected smart devices and provide this information to the cloud service 302. The cloud service 302 then analyzes the information to determine a device profile for each detected smart device and sends the device profile for each detected smart device to the control device 202.

In another example, a device having a microphone, such as an Amazon brand Echo brand device or a similar smart appliance coupled to a voice-controlled intelligent personal assistant service (e.g., the Amazon brand Alexa brand service), receives voice commands and sends them to the control device 202; such a device may itself serve as a home automation hub and be communicatively coupled with the voice processing service 116, the cloud service 302, or the server 208. In this example, the control device 202 sends a voice command to the Echo device, mimicking the signal of an Amazon brand Alexa brand voice remote control. The Echo device sends the voice command to the Alexa brand service, and the Alexa brand service, in a collaborative process with the server 208 and the cloud service 302 (e.g., an IFTTT service), automatically performs one or more predefined IFTTT operations, causing the one or more predefined operations to be sent by the IFTTT service to one or more smart devices (e.g., the TV 106) coupled to the IFTTT service using the UCE 200. The one or more operations are received by the TV 106 via the communication interface 506 to be performed by the processor 502. Such operations may be pre-populated at the cloud service 302, may be pre-populated in the control device 202 via an application associated with the IFTTT service, and/or may be populated during setup of the IFTTT service in association with the Alexa brand service and the TV 106. IFTTT services and operations are well known to those skilled in the art and therefore are not described herein.

For example, if the user says "Alexa, I want to play Xbox," the control device 202 determines from the voice command that the user wishes to use the Echo device and identifies the correct signal or protocol for communicating with the Echo device. The Echo device then sends the voice command to the Alexa brand service, which is coupled to the cloud service 302 (e.g., an IFTTT service) or a similar service. The IFTTT service determines a predefined operation by using an applet or recipe and provides the determined operation to the server 208 via the internet. The server 208 receives the one or more operations from the IFTTT service via the internet for provision to the TV 106. The TV 106 receives the operations via the communication interface 506, and the processor 502 performs one or more of the received operations.

For example, the phrase "Alexa, I want to play Xbox" may be a predefined phrase associated with a series of predefined operations via the IFTTT service and the cloud service 302. Such operations may include automatically changing the HDMI input on the TV 106, turning on the Xbox brand game system 204, setting the volume to a predetermined level, dimming the lights to a predetermined level, and so forth. One or more operations may take the form of one or more IFTTT applets, may be combined into a workflow to perform multiple operations simultaneously, or may be performed at predetermined intervals. It should be understood that, unless stated to the contrary, one or more of the operations described may be received and/or performed by the TV 106, or may be received and/or performed by one or more coupled smart devices.
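
By way of illustration only, the expansion of a predefined phrase into a workflow of operations might be sketched as follows; the phrase key, operation tuples, and device labels are assumptions, not actual IFTTT applet definitions:

```python
# Sketch: a predefined phrase expands into an ordered workflow of operations,
# in the spirit of the IFTTT example above. All entries are illustrative.
PHRASE_WORKFLOWS = {
    "alexa, i want to play xbox": [
        ("TV 106", "set_input", "HDMI2"),
        ("Xbox game system 204", "power", "on"),
        ("TV 106", "set_volume", 12),
        ("lights", "dim", 30),
    ],
}

def operations_for_phrase(phrase: str) -> list:
    return PHRASE_WORKFLOWS.get(phrase.strip().lower(), [])

for device, command, argument in operations_for_phrase("Alexa, I want to play Xbox"):
    print(device, command, argument)   # each tuple would be dispatched to its device
```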

It should be understood that, as shown in FIG. 2, the control device 202 may include a number of other functions, such as motion sensing and gesture recognition, and may include the ability to display images (e.g., logos, alphanumeric text, etc.). Such a control device may cooperate with one or more applications to control the smart appliances, as shown in FIG. 3. Further, the control apparatus 202 may cooperate with one or more smart devices, each of which includes a computing client (e.g., in a client-server model), to configure and/or control the smart devices. In some examples, microphones are coupled to one or more devices located in different rooms of a user's home, where each device with a microphone is coupled to the control device 202 via a local or wide area network 216 for sending voice commands to the control device 202.

In some cases, as shown in FIGS. 7 and 8, a speech processing service associated with the control device 202 (whether resident on the device itself, provided by an internet cloud-based processing service, or the like) may perform a loudness analysis to determine the loudness of any speech being provided to the control device 202 and/or the loudness of the environment in which the control device 202 operates, i.e., the loudness of any background noise. As a non-limiting example, the loudness sensing components and functionality described in U.S. patent No. 9,847,096 may be used for this purpose. In this manner, a loudness analysis may be performed to determine a loudness estimate that indicates a level of the control voice input and/or of the background noise received with the control voice input. Based on the loudness estimate, the system may determine that the control voice input is being provided loudly (e.g., the loudness estimate for the voice input is greater than a predetermined voice input threshold), that the control voice input is being provided softly (e.g., the loudness estimate for the voice input is less than a predetermined voice input threshold), that the control voice input is being provided normally (e.g., the loudness estimate for the voice input is within the predetermined voice input thresholds), that the environment is noisy (e.g., the loudness estimate for the environment is greater than a predetermined environment threshold), that the environment is quiet (e.g., the loudness estimate for the environment is less than a predetermined environment threshold), and/or that the environment is normal (e.g., the loudness estimate for the environment is within the predetermined environment thresholds).
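
A minimal sketch of this threshold-band classification follows, assuming scalar loudness estimates in dB; the numeric threshold values are invented for illustration:

```python
# Sketch: classify voice input and ambient loudness against threshold bands.
# The numeric thresholds are illustrative assumptions only.
def classify_voice_input(loudness_db: float, low: float = 40.0, high: float = 65.0) -> str:
    if loudness_db > high:
        return "loud"
    if loudness_db < low:
        return "soft"
    return "normal"

def classify_environment(loudness_db: float, low: float = 35.0, high: float = 55.0) -> str:
    if loudness_db > high:
        return "noisy"
    if loudness_db < low:
        return "quiet"
    return "normal"
```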

Using the loudness information so determined, it is contemplated that the control device 202 may be further adapted to perform additional functions. For example, when the control device 202 includes a speaker for outputting information, playing music, etc. (as described in U.S. application No. 15/799,393), the control device 202 may use the determined loudness to automatically adjust the level at which information, music, etc. is output through the speaker. In this regard, when it is determined that the environment is noisy (or is becoming noisy), the control device 202 may automatically increase the level of information, music, etc. output via the speaker (preferably such that the dB output of the speaker rises only slightly, so that the output may be correctly heard by the listener given the noise level in the environment); when it is determined that the environment is quiet (or is becoming quiet), the level of information, music, etc. output via the speaker may be reduced; when it is determined that a person speaking has raised (or is raising) their voice, the level of information, music, etc. output via the speaker may be automatically raised; and when it is determined that the person speaking has lowered (or is lowering) their voice, the level of information, music, etc. output through the speaker may be automatically lowered.

It is also contemplated that the determined loudness information may be used by the system to automatically issue one or more commands to control the loudness level associated with the environment. For example, when it is determined that the environment is noisy (or is becoming noisy) and the user is attempting to provide speech to the control device 202, the system may attempt to reduce the noise generated within the environment by sending one or more commands (whether voice commands or conventional remote control commands) to one or more controlled devices. In this manner, upon determining that the environment is noisy (or is becoming noisy), the system may automatically send commands to a sound source (e.g., a television) to mute, power down, or otherwise reduce the output volume level of the sound source. This control may be performed immediately after the control device 202 hears the device activation or trigger word, which has the advantage of reducing the noise generated within the environment before the user speaks the command proper, since command input typically requires better sound quality for the system to understand the command.

Further, the system may automatically determine the particular device to be controlled in this manner based on system state information. Thus, if the system knows, based on received status information, that the television is currently in an on state, the system can automatically select the television as the device to be controlled. Similarly, if the system knows that the control device 202 was last used to turn on the television, control the volume of the television, etc., the system can infer (e.g., when status information cannot actually be received from a connected device) that the television is the device producing sound, and can automatically select the television as the device to be controlled. As noted, the control of one or more selected devices may be performed by the system using any of the control steps already described herein, alone or in combination. It will also be appreciated that the controlled device may be automatically returned to its prior state by the system (e.g., re-powered, un-muted, returned to its prior volume) when the system determines that the user is unlikely to issue further voice communications (e.g., after a predetermined amount of time has elapsed since the last command was received, or when the last received command indicates a complete request, meaning that no further voice input is expected within a given amount of time, etc.). Furthermore, if the device does not receive a recognizable command within a predetermined period of time after receiving the trigger command and immediately issuing a command to lower the sound level of the controllable device, the system may operate to automatically cause one or more further commands to be issued to further lower the sound level of the controllable device until a voice input command is received and recognized, a time limit expires (whereupon the controllable device may be returned to its original state), and so on.
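
A hedged sketch of this auto-quiet behavior follows; the state layout, fallback rule, and timing are assumptions made for illustration:

```python
# Sketch: on hearing the trigger word in a noisy environment, mute the likely
# sound source, then restore it once no further voice input is expected.
import time

def quiet_environment_for_voice(devices: dict, last_controlled: str) -> str:
    # Prefer a device whose status information reports it powered on...
    for name, state in devices.items():
        if state.get("power") == "on":
            target = name
            break
    else:
        # ...otherwise infer the sound source from the last device controlled.
        target = last_controlled
    devices[target]["muted"] = True        # stand-in for sending a mute command
    return target

def restore_after_silence(devices: dict, target: str, timeout_s: float = 5.0) -> None:
    time.sleep(timeout_s)                  # no further voice input within the window
    devices[target]["muted"] = False       # stand-in for sending an un-mute command
```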

It will also be appreciated that additional conditions may be utilized to adaptively change the manner in which the system responds to user commands and queries or otherwise outputs sound. For example, a contextual parameter (e.g., time of day) may be used to automatically recognize a mode (e.g., nighttime/sleep time), such that the output audio level of the control device 202 may be adjusted, or inhibited from adjustment, accordingly. Likewise, different loudness thresholds may be established for different times of day, etc., for use as described above.
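
For example, the time-of-day adaptation mentioned above might be sketched as follows; the hours and threshold values are assumptions:

```python
# Sketch: select a different ambient-noise threshold at night than by day.
from datetime import datetime

def ambient_threshold_for(now: datetime) -> float:
    # Quieter threshold during an assumed nighttime/sleep window (22:00-07:00).
    return 30.0 if now.hour >= 22 or now.hour < 7 else 55.0

print(ambient_threshold_for(datetime(2019, 1, 2, 23, 15)))  # 30.0
```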

In other cases, the determined loudness level of the speaker and/or the environment may also take into account a measure of the distance between the audio source and the control device 202, to provide improved flexibility in responding to commands and the like. For example, if the speaker's command does not sound loud but the speaker is measured to be near the control device 202, the determined loudness level may indicate that the user is whispering. However, if the speaker's command does not sound loud but the speaker is measured to be far from the control device 202, the determined loudness level may indicate that the user is speaking normally or even shouting. Thus, in such cases, the determined loudness level may be the sound level determined by the system as described above, increased or decreased depending on the measured distance to the audio source.
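
A minimal sketch of such distance compensation follows, assuming a free-field inverse-square propagation model, which the text above does not prescribe:

```python
# Sketch: estimate loudness at the source by compensating the measured level
# for distance. The propagation model and reference distance are assumptions.
import math

def loudness_at_source_db(measured_db: float, distance_m: float, reference_m: float = 1.0) -> float:
    # Free-field inverse-square law: roughly +20 dB per tenfold increase in distance,
    # so a quiet-sounding command from far away is treated as louder at its source.
    return measured_db + 20.0 * math.log10(max(distance_m, reference_m) / reference_m)

print(loudness_at_source_db(50.0, 1.0))   # 50.0 -> near speaker, possibly whispering
print(loudness_at_source_db(50.0, 10.0))  # 70.0 -> far speaker, speaking normally or shouting
```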

Absolute and/or relative distances may be utilized in adjusting the determined loudness level. To this end, a camera, image sensor, light sensor, etc. provided to the control device 202 may be utilized to determine, accurately or approximately, how far from the control device 202 the speaker is located, and/or whether the speaker has moved relatively closer to or farther from the control device 202 while speaking a command. Likewise, the distance measurement may be performed using a microphone array consisting of two or more microphones, and a single microphone with some level of processing may also be used to estimate the distance to the speaker. Of course, other known means for measuring the distance between objects (such as those found in laser measuring devices and the like) may be provided to the control apparatus 202 for this purpose.

It should also be understood that the measured distance to the command speaker may also be used to adjust the loudness level of any output that may be generated by the control device 202, to ensure that the response is output at an appropriate level, e.g., at a level sufficient for a distant intended recipient to hear, or at a lower level so as not to overwhelm an intended recipient at a close distance.

Although described as a microphone for receiving voice commands, it should be understood that a microphone includes any transducer-type device that converts sound into an electrical signal, that one or more microphones may be included in each device, and that the devices may be coupled to one another, to the control device 202, and to the smart appliances.

Although described as an active HDMI input, it should be understood that an active input includes any active source/sink port or communication bus state on an audio-video device that may be connected, wired or wirelessly, to the smart device that originated the status request.

It should also be understood that the control device 202 may be partially configured at the factory with one or more pre-installed device profiles. When initially powered on, the control device 202 may be configured to automatically communicate with a predetermined smart appliance, such as the STB 108, for example when the control device 202 and the STB 108 are paired with each other out of the box. Similarly, an automatic pairing operation may be performed when the end user presses a first key on the control device 202 to initiate communication with the STB 108.

While various concepts have been described in detail, those skilled in the art will appreciate that various modifications and alternatives to those concepts may be developed in light of the overall teachings of the disclosure. Furthermore, although described in the context of functional modules and illustrated using block diagram format, it should be understood that, unless otherwise indicated, one or more of the described functions and/or features may be integrated within a single physical device and/or software module or one or more functions and/or features may be implemented within a separate physical device or software module. It should also be understood that a detailed discussion of the actual implementation of each module is not necessary in order to enable an understanding of the present invention. Rather, given the disclosure herein of the nature, function, and interrelationship of the various functional modules in the system, the actual implementation of such modules will be well within the routine skill of the engineer. Accordingly, those skilled in the art will be able to practice the invention set forth in the claims using no more than routine experimentation. It is also to be understood that the specific concepts disclosed are merely illustrative of and not restrictive on the broad invention, which is to be given the full breadth of the appended claims and any and all equivalents thereof.

All patents cited herein are incorporated by reference in their entirety.
