Method, device and computer-readable storage medium with instructions for processing a speech input, motor vehicle with speech processing and user terminal

Document number: 1246810    Publication date: 2020-08-18

Reading note: This technology, "Method, device and computer-readable storage medium with instructions for processing a speech input, motor vehicle with speech processing and user terminal", was created by R. Woik on 2018-11-26. Its main content comprises methods, apparatus and computer-readable storage media having instructions for processing speech input. In a first step, a speech input of a user is received (10). The speech input is then preprocessed (11) for at least one of two or more available speech processing services. Finally, the preprocessed speech input is delivered (12) to one or more of the available speech processing services.

1. A method for processing a speech input (SE), the method having the steps of:

- receiving (10) a speech input (SE) of a user;

- preprocessing (11) the speech input (SE) for at least one of two or more available speech processing services (50_1, ..., 50_n), wherein, during the preprocessing (11) of the speech input (SE), one or more preprocessed speech inputs (SE_1, ..., SE_n) are generated in that a signal word (SW_1, ..., SW_n) is added to the speech input (SE) in each case; and

- delivering (12) the preprocessed speech input (SE_1, ..., SE_n) to one or more of the available speech processing services (50_1, ..., 50_n).

2. The method according to claim 1, wherein, when preprocessing (11) the speech input (SE) for several or for each of the two or more available speech processing services (50_1, ..., 50_n), a preprocessed speech input (SE_1, ..., SE_n) is generated in that the associated signal word (SW_1, ..., SW_n) is added to the speech input (SE) in each case, and wherein the associated preprocessed speech input (SE_1, ..., SE_n) is delivered (12) to each of the selected speech processing services (50_1, ..., 50_n).

3. The method of claim 2, further comprising the steps of:

- receiving (13) responses (AN_1, ..., AN_n) of the speech processing services (50_1, ..., 50_n);

- analyzing (14) the received responses (AN_1, ..., AN_n); and

- outputting (15) at least one of the responses (AN_1, ..., AN_n) retained after the analysis.

4. The method according to claim 3, wherein the user is queried if, when analyzing (14) the received responses (AN_1, ..., AN_n), two or more responses (AN_1, ..., AN_n) are classified as plausible.

5. The method according to claim 3 or 4, wherein, during the analysis (14), responses (AN_1, ..., AN_n) are suppressed which indicate that the preprocessed speech input (SE_1, ..., SE_n) could not be successfully processed by one of the available speech processing services (50_1, ..., 50_n).

6. The method according to claim 1, wherein the preprocessing (11) of the speech input (SE) comprises:

- analyzing the speech input (SE) with respect to its content;

- assigning the speech input (SE) to one of the available speech processing services (50_1, ..., 50_n); and

- generating a preprocessed speech input (SE_1, ..., SE_n) by adding a signal word (SW_1, ..., SW_n) belonging to the assigned speech processing service (50_1, ..., 50_n) to the speech input (SE).

7. The method according to claim 6, wherein, when analyzing the speech input (SE), keywords (KW) identified in the speech input (SE) are compared with a database (23) in which assignments between relevant keywords (KW) and speech processing services (50_1, ..., 50_n) are stored.

8. The method according to claim 6 or 7, wherein the user is provided with the possibility of correcting the assignment of the speech input (SE) to one of the available speech processing services (50_1, ..., 50_n).

9. The method according to one of the preceding claims, wherein, during the preprocessing (11) of the speech input (SE), a signal word (SW_1, ..., SW_n) possibly present in the speech input (SE) and belonging to one of the available speech processing services (50_1, ..., 50_n) is first removed.

10. The method according to claim 1, wherein, during the preprocessing (11) of the speech input (SE), a signal word (SW_1, ..., SW_n) present in the speech input (SE) and belonging to one of the available speech processing services (50_1, ..., 50_n) is detected and the speech input (SE) is assigned to the corresponding speech processing service (50_1, ..., 50_n).

11. The method according to one of the preceding claims, wherein the speech input (SE) is re-synthesized during the preprocessing (11) of the speech input (SE).

12. A computer-readable storage medium having instructions which, when executed by a computer, cause the computer to carry out the steps of the method according to one of claims 1 to 11 for processing a speech input (SE).

13. An apparatus (20) for processing a speech input (SE), the apparatus having:

-an input (21) which is set up to receive (10) a speech input (SE);

- a preprocessing module (22) for preprocessing (11) the speech input (SE) for at least one of two or more available speech processing services (50_1, ..., 50_n), wherein, during the preprocessing (11) of the speech input (SE), one or more preprocessed speech inputs (SE_1, ..., SE_n) are generated in that a signal word (SW_1, ..., SW_n) is added to the speech input (SE) in each case; and

- an interface (27) which is set up to deliver the preprocessed speech input (SE_1, ..., SE_n) to one or more of the available speech processing services (50_1, ..., 50_n).

14. Motor vehicle (40) with speech processing, characterized in that the motor vehicle (40) has a device (20) according to claim 13 or is set up to carry out a method according to one of claims 1 to 11 for processing speech inputs (SE).

15. A user terminal device with speech processing, characterized in that the user terminal device has a device (20) according to claim 13 or is set up to carry out a method according to one of claims 1 to 11 for processing speech input (SE).

Technical Field

The present invention relates to a method, apparatus, and computer-readable storage medium having instructions for processing speech input. The invention also relates to a motor vehicle and a subscriber terminal with speech processing, in which the method according to the invention or the device according to the invention is used.

Background

With a speech processing system, a user can conduct a partly or fully automated dialog in largely natural speech via a speech interface. Such speech processing systems are known, for example, from the field of telephone services. In such applications, the entire speech processing is performed by a computer system on the provider side.

Another field of application of speech processing systems is "Smart Home" devices, that is, devices for the intelligent home. The generic term Smart Home covers technical methods and systems intended to achieve a higher quality of living, improved safety and more efficient use of energy. They are based on networked, remotely controllable devices and processes that can be automated. Some of these devices allow voice-based interaction with an intelligent personal assistant. Since high-quality speech processing requires high computing power, in such devices it is predominantly carried out by a computer system on the provider side of the intelligent personal assistant. Only limited speech recognition, for the purpose of activating the speech processing, is performed by the device on the user side.

Furthermore, device-integrated speech processing systems are also increasingly being used, for example in navigation systems in motor vehicles that can be controlled by voice input, or in hands-free systems in motor vehicles by means of which functions of the motor vehicle can be operated. Most of these systems work locally.

Against this background, DE 102014017384 A1 describes a method for operating a motor vehicle operating device in which at least one recognition result for a speech input of a user is determined by means of a speech recognition system and output in the form of a result list. If the user then makes a second speech input, it is checked whether the user is repeating or modifying the content of his first speech input because he did not find the desired recognition result in the result list.

DE 102014201676 A1 describes a method for controlling a speech dialog of a speech system. First, a first utterance of a user of the speech system is received. A first list of possible results is then determined on the basis of the first utterance. The elements of the first list are then analyzed to determine an ambiguity of the elements. Finally, based on a partial orthography and the ambiguity, a speech prompt is generated for the user so that the user can resolve the ambiguity.

Precisely for use in motor vehicles, it is desirable to increase the reliability of speech processing. The driver should be distracted from driving as little as possible; in particular, he should not have to devote part of his attention to interacting with the speech processing because a speech input was not understood.

A first approach for improving the reliability of speech processing is based on the consideration of context information.

For example, DE 102015213722 A1 describes a method for operating a speech recognition system in a vehicle. When a speech input of a user is detected, data regarding the context of the speech input is additionally detected. First, a recognized text of the speech input is generated and a semantic analysis of the recognized text is performed. Based on the semantic analysis and the data regarding the context of the speech input, a recognition quality of the recognized text is determined and one speech model is selected from a plurality of speech models. This speech model is used for further text recognition and further semantic analysis. The procedure is repeated iteratively until sufficient recognition quality is achieved, and a function is then executed according to the last semantic analysis.

Another solution for improving the reliability of speech processing uses speech processing by an external computer system in addition to local speech processing.

For example, EP 2909833 B1 describes a method for speech recognition in a motor vehicle. Speech input is received from a user, and at least part of it is communicated to an on-board speech recognition system inside the vehicle, which generates a first recognition result. A processor unit also communicates the speech input, in whole or in part, to an off-board speech recognition system external to the vehicle, which transmits a second recognition result to the processor unit. Context information can be taken into account during speech recognition. The speech text is then determined by an analysis unit on the basis of the first and second recognition results.

US 2015/0058018 A1 describes a method for recognizing a speech input comprising natural speech and at least one word from a domain-specific vocabulary. In a first speech processing pass, a first portion and a second portion of the speech input are identified, the first portion containing natural speech and the second portion containing the at least one domain-specific word; the natural speech contained in the first portion is also processed in this pass. In a second speech processing pass, the second portion with the at least one domain-specific word is processed.

Manufacturers of user terminal devices such as smartphones, tablets, laptops or PCs have been using their own speech processing systems for some time. Examples of this are Apple Siri [1], Microsoft Cortana [2] or Google Allo [3]. These systems learn user behavior individually and optimize their responses through constant use. Extensions such as Amazon Echo [4, 5] make it possible to control smart home solutions by voice. In some cases, smartphones with speech processing systems have already been integrated into automobiles.

For example, DE 102014209992 A1 describes a vehicle interface module that can communicate with a user's mobile device and with a vehicle. To this end, the vehicle interface module includes a wireless transceiver for communicating with the mobile device and a vehicle transceiver for communicating with a vehicle data bus. The processor of the vehicle interface module receives, via the vehicle transceiver, signals from the vehicle data bus that are initiated by user input into the vehicle computing system. The processor determines whether such a signal requests the activation of a speech recognition session on the mobile device. If this is the case, the mobile device is requested by means of the wireless transceiver to start a speech recognition session.

DE 102012218938 A1 describes a method for identifying and triggering services for a voice-based interface of a mobile device. The method includes receiving a speech recognition result that represents the content of a speech input in the mobile device. The desired service is determined by processing the speech recognition result using a service identification grammar. A user service request is determined by processing a portion of the speech recognition result using a service-specific grammar. The user service request is issued and a service response is received. An audio message is generated from the service response and presented to the user through a speaker.

For the future, it would be desirable to expand the voice services integrated in motor vehicles, with increased use of speech processing in the backend. To this end, vehicle manufacturers are expected to set up or provide their own backend systems.

Current speech processing systems can be activated by a user in different ways, wherein these speech processing systems can also provide a plurality of possibilities for activation in parallel.

In a first variant, the user must press a button in order to activate speech input. After actuation of the button, the system first gives acoustic feedback, for example in the form of a signal tone or a speech output. The user can then speak a voice command, which is detected and processed by the system.

In a second variant, the speech processing system is activated by the user speaking a signal word, which is detected and analyzed by the system. The signal word is not necessarily a single word; it can also be a sequence of words. After successful recognition of the signal word, the system usually first gives acoustic feedback. As with the first variant, a tone or a speech output can be used for this purpose. The user can then speak a voice command, which is detected and processed by the system. Since the speech processing system is woken from a sleep state by speaking the signal word, the terms "wake-up phrase" or "wake phrase" have also become established as alternative names for the signal word.

According to a third variant, the user speaks the signal word together with the speech input or voice command in a single sentence. In this case, no acoustic feedback from the system is given after the recognition of the signal word.

If one now considers the case in which, in addition to the vehicle manufacturer's own speech processing, speech processing from other providers is also available in the vehicle, and the integration of a mobile user terminal together with its speech processing is provided as well, the question arises of how the user can address the different services. One approach is to address the different speech processing systems via specific buttons or signal words. Pressing a button on the multifunction steering wheel then starts, for example, the speech processing of a smartphone, while the signal word "Hello Volkswagen" addresses the speech processing of the vehicle manufacturer, in which case speech recognition is performed in the vehicle or also partially or completely in an external system. The signal word "Hello Computer", by contrast, addresses the speech processing of another provider.

The disadvantage of this approach is that the user must know which functionality he wants in order to decide which speech assistant to address, and must also know how to address the corresponding speech assistant.

Disclosure of Invention

The object of the invention is to provide an improved solution for processing speech inputs.

This object is achieved by a method having the features of claim 1, by a computer-readable storage medium having instructions according to claim 12 and by a device having the features of claim 13. Preferred embodiments of the invention are the subject matter of the dependent claims.

According to a first aspect of the invention, a method for processing a speech input comprises the steps of:

- receiving a speech input of a user;

- preprocessing the speech input for at least one of two or more available speech processing services, wherein, during the preprocessing, one or more preprocessed speech inputs are generated in that a signal word is added to the speech input in each case; and

- delivering the preprocessed speech input to one or more of the available speech processing services.

According to another aspect of the invention, a computer-readable storage medium contains instructions which, when executed by a computer, cause the computer to perform the steps of:

- receiving a speech input of a user;

- preprocessing the speech input for at least one of two or more available speech processing services, wherein, during the preprocessing, one or more preprocessed speech inputs are generated in that a signal word is added to the speech input in each case; and

- delivering the preprocessed speech input to one or more of the available speech processing services.

The term computer is to be understood broadly here. In particular, it also encompasses control devices and other processor-based data processing devices.

According to another aspect of the present invention, an apparatus for processing a speech input has:

-an input configured to receive a speech input;

- a preprocessing module which is set up to preprocess the speech input for at least one of two or more available speech processing services, wherein, during the preprocessing, one or more preprocessed speech inputs are generated in that a signal word is added to the speech input in each case; and

- an interface which is set up to deliver the preprocessed speech input to one or more of the available speech processing services.

In the solution according to the invention, the user's speech input is first preprocessed before being passed to at least one of the several available speech processing services. The preprocessing ensures that the speech input is assigned to the appropriate speech processing service, or that the appropriate service is addressed correctly. The user can thus simply speak, without having to consider which speech processing service he must address and how he can activate it.

According to one aspect of the invention, when the speech input is preprocessed for several or for each of the two or more available speech processing services, a preprocessed speech input is generated in that the associated signal word is added to the speech input in each case. The corresponding preprocessed speech input is then delivered to each of the selected speech processing services. In this approach, the original speech input is provided with the appropriate signal word for each selected speech processing service and then passed to the corresponding service. The advantage of this approach is that only very simple preprocessing, requiring little computing power, is needed.
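As a rough sketch of this fan-out aspect (not the patented implementation itself; the service names and wake phrases below are invented for illustration), the preprocessing could look as follows:

```python
# Hypothetical fan-out preprocessing: each available speech processing
# service receives its own copy of the speech input, prefixed with the
# signal word ("wake phrase") that service expects.

SERVICES = {
    "vehicle_assistant": "Hello Volkswagen",   # invented example wake phrases
    "assistant_a": "Hey Assistant A",
    "assistant_b": "OK Assistant B",
}

def preprocess_fanout(speech_input: str, services: dict) -> dict:
    """Generate one preprocessed input SE_i per service by prepending
    the signal word SW_i belonging to that service."""
    return {name: f"{signal_word}, {speech_input}"
            for name, signal_word in services.items()}

preprocessed = preprocess_fanout("please increase the temperature", SERVICES)
```

Each value of `preprocessed` would then be delivered to the corresponding service; no content analysis of the input is required, which is why this variant needs so little computing power.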

According to one aspect of the invention, the method according to the invention comprises as further steps:

- receiving responses of the speech processing services;

- analyzing the received responses; and

- outputting at least one of the responses retained after the analysis.

After the preprocessed speech input has been delivered to and processed by the selected speech processing services, the received responses are analyzed by a response filter. The response filter delivers the meaningful or plausible responses, that is, the responses with the highest hit probability, to the user. The intelligence here lies in evaluating the different responses of the external speech processing services by means of the response filter. The advantage of filtering the responses is that the user is not confronted with meaningless or impossible responses, which increases user acceptance of the approach.

According to one aspect of the invention, the user is queried if, when analyzing the received responses, two or more responses are classified as plausible. It can happen that several plausible responses are received. In this case it is sensible to ask the user which response corresponds best to the response he desired for the underlying speech input. In this way, the system can learn from semantically similar associations and evaluate future responses better.
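A minimal sketch of such a disambiguation step, under the assumption that the user query can be modelled as a callback (all names are invented for the example):

```python
# Hypothetical disambiguation: when several responses are judged
# plausible, ask the user which one fits, and remember the choice so
# that similar future inputs can be ranked better.

preference_log = {}  # speech input -> service chosen by the user

def choose_response(speech_input: str, plausible: dict, ask_user):
    """plausible maps service name -> response text. With one candidate,
    return it directly; otherwise query the user and log the choice."""
    if len(plausible) == 1:
        return next(iter(plausible.values()))
    chosen_service = ask_user(list(plausible))  # e.g. a spoken dialog
    preference_log[speech_input] = chosen_service
    return plausible[chosen_service]
```

The `preference_log` stands in for whatever learning mechanism the system uses to adapt its future evaluation of responses.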

According to one aspect of the invention, responses are suppressed in the analysis which indicate that the preprocessed speech input could not be successfully processed by one of the addressed speech processing services. Typically, if a speech input cannot be processed, the response of the speech processing service follows certain rules. For example, the response may begin with "I did not understand ...". Such responses can therefore be filtered quite simply, so that they are not even subjected to a plausibility check. In this way, the computing power required for analyzing the received responses can be reduced.
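A possible sketch of such a response filter, assuming failure responses can be recognized by fixed prefixes (the patterns below are invented examples):

```python
# Hypothetical response filter: suppress answers matching known
# "could not process" patterns before any plausibility check is run.

FAILURE_PREFIXES = (
    "i did not understand",
    "sorry, i cannot help with that",   # invented example patterns
)

def filter_responses(responses: list) -> list:
    """Drop responses that merely signal a processing failure."""
    return [r for r in responses
            if not r.strip().lower().startswith(FAILURE_PREFIXES)]
```

Because the check is a cheap string comparison, failure responses never reach the more expensive plausibility analysis.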

According to one aspect of the invention, the pre-processing of the speech input comprises the steps of:

- analyzing the speech input with respect to its content;

- assigning the speech input to one of the available speech processing services; and

- generating a preprocessed speech input by adding the signal word belonging to the assigned speech processing service to the speech input.

In this approach, the speech input is processed as follows: first, semantic recognition is performed and, for example, the topic of the speech input is determined. An appropriate speech processing service is then determined on the basis of the topic. Subsequently, the signal word required for this speech processing service is added to the speech input, and the speech input preprocessed in this way is passed to the speech processing service. Although this approach requires more intelligent and therefore more computationally intensive preprocessing, it has the advantage that the user receives only a single response, so no further analysis of the received responses is required.

According to one aspect of the invention, when analyzing a speech input, keywords identified in the speech input are compared with a database in which the assignments between relevant keywords and speech processing services are stored. By using such a keyword database, the speech input can be assigned to a speech processing service in a simple manner. For example, the keyword "purchase" can be assigned to a first speech processing service, the keyword "weather" to a second speech processing service, and the keyword "warmer" to a third speech processing service, for example the vehicle's own speech processing, which adjusts the air conditioning based on the speech input.
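A minimal sketch of such a keyword-based assignment, with an invented keyword database mirroring the example above:

```python
# Hypothetical keyword database: keywords found in the speech input
# are looked up to assign the input to one speech processing service.

KEYWORD_DB = {
    "purchase": "shopping_service",
    "weather": "weather_service",
    "warmer": "vehicle_climate",   # vehicle's own climate voice control
}

def assign_service(speech_input: str, keyword_db: dict):
    """Return the first service whose keyword occurs in the input,
    or None if no keyword matches."""
    words = speech_input.lower().split()
    for keyword, service in keyword_db.items():
        if keyword in words:
            return service
    return None
```

A real system would combine this lookup with semantic analysis; the literal word match here only illustrates the database role.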

According to one aspect of the invention, the user is provided with the possibility of correcting the assignment of the speech input to one of the available speech processing services. In the content-based analysis of a speech input, an erroneous decision may be made, so it is sensible for the user to be able to intervene and make corrections. Based on the corrections made, the decision basis for the assignment can be dynamically adapted so that the same request is assigned correctly the next time. In this way, the system is capable of learning.

According to one aspect of the invention, during the preprocessing of a speech input, a signal word that may be present in the speech input and belongs to one of the available speech processing services is first removed. It can happen that users habitually use a signal word that does not match the specific speech input. So that the speech input can still be processed meaningfully, it is helpful to first remove such a signal word during preprocessing.
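This stripping step could be sketched as follows (the signal words are invented examples; matching is deliberately kept to a simple prefix comparison):

```python
# Hypothetical signal-word stripping: remove a wake phrase the user
# habitually prepends, so the input can be re-routed afterwards.

SIGNAL_WORDS = ["Hello Volkswagen", "Hey Assistant A"]  # invented examples

def strip_signal_word(speech_input: str, signal_words: list) -> str:
    """Remove a leading signal word (and a following comma/space), if any."""
    text = speech_input.strip()
    for sw in signal_words:
        if text.lower().startswith(sw.lower()):
            return text[len(sw):].lstrip(", ").strip()
    return text
```

After stripping, the cleaned input can be assigned to the appropriate service and given the correct signal word.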

According to one aspect of the invention, during the preprocessing of a speech input, a signal word present in the speech input and belonging to one of the available speech processing services is detected. The speech input is then assigned to the corresponding speech processing service. This approach starts from the assumption that the signal word spoken by the user in the speech input is correct. On this basis, the speech input can be passed to the corresponding speech processing service without further processing.

According to one aspect of the invention, the speech input is re-synthesized during preprocessing. For example, redundant filler words can be removed, or the speech input can be rephrased so that it is recognized more reliably by the corresponding speech processing service. The speech input "I am cold" can thus be passed to the vehicle's own speech processing as, for example, "Hello Volkswagen, please increase the temperature in my vehicle". Of course, the speech input can be re-synthesized differently for different speech processing services.
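A very simple sketch of such re-synthesis, assuming a fixed rewrite table per target service (all entries are invented; a real system would use semantic analysis rather than a literal lookup):

```python
# Hypothetical re-synthesis: rewrite a colloquial utterance into a
# phrasing the target service recognizes reliably. The mapping table
# and service names are invented for illustration.

REWRITE_RULES = {
    ("i am cold", "vehicle_assistant"):
        "Hello Volkswagen, please increase the temperature in my vehicle",
    ("i am cold", "smart_home"):
        "Hey Smart Home, please turn up the heating",
}

def resynthesize(speech_input: str, service: str) -> str:
    """Return the service-specific rewritten input, or the original
    input unchanged if no rule applies."""
    return REWRITE_RULES.get((speech_input.lower(), service), speech_input)
```

Note that the same utterance maps to different phrasings depending on the target service, matching the idea that re-synthesis can differ per service.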

The method according to the invention or the device according to the invention is particularly advantageously used in a vehicle, in particular a motor vehicle. Furthermore, the method according to the invention or the device according to the invention can also be used in user terminals, for example in smart phones, smart home devices, PCs and laptops, etc.

Drawings

Other features of the invention will be apparent from the subsequent description and the appended claims, taken in conjunction with the accompanying drawings.

FIG. 1 schematically illustrates a method for processing speech input;

FIG. 2 schematically illustrates the processing of a received reply;

FIG. 3 shows a first embodiment of a device for processing speech input;

FIG. 4 shows a second embodiment of a device for processing speech input;

FIG. 5 schematically shows a motor vehicle in which the solution according to the invention is implemented;

FIG. 6 schematically shows a system design of a first variant of the solution for processing speech input according to the invention;

FIG. 7 schematically shows a system design of a second variant of the solution for processing speech input according to the invention; and

FIG. 8 illustrates some examples of speech input by a user and the associated preprocessed speech input.

Detailed Description

For a better understanding of the principles of the invention, embodiments thereof are explained in more detail below with reference to the accompanying drawings. It will be readily understood that the invention is not limited to these embodiments and that the features described can also be combined or modified without departing from the scope of protection of the invention as defined in the appended claims.

Fig. 1 schematically shows a method for processing a speech input. In a first step, a speech input of a user is received 10. The speech input is then preprocessed 11 for at least one of the two or more available speech processing services. Finally, the pre-processed speech input is delivered 12 to one or more of these available speech processing services.

In a first variant, when the speech input is preprocessed for several or for each of the two or more available speech processing services, a preprocessed speech input can be generated in that the associated signal word is added to the speech input in each case. The associated preprocessed speech input is then delivered to each of the selected speech processing services. The responses of these speech processing services are subsequently received 13 and analyzed 14. Finally, at least one of the responses remaining after the analysis is output 15. This is shown schematically in fig. 2. The user can be queried if two or more responses are classified as plausible during the analysis of the received responses. In addition, responses indicating that the preprocessed speech input could not be successfully processed by one of the available speech processing services can be suppressed during the analysis.

In a second variant, the speech input can be analyzed with respect to its content, for example by comparing keywords recognized in the speech input with a database in which the assignments between relevant keywords and speech processing services are stored. Based on the result of the analysis, the speech input is assigned to one of the available speech processing services. Finally, a preprocessed speech input is generated by adding the signal word belonging to the assigned speech processing service to the speech input. In this case, the user can be given the possibility of correcting the assignment of the speech input to one of the available speech processing services.

In a third variant, a signal word present in the speech input and belonging to one of the available speech processing services can be detected. The speech input is then assigned to a corresponding speech processing service.

Preferably, the user can specify which type of pre-processing is used or influence the characteristics of the pre-processing.

In the first two variants, signal words which may be present in the speech input and which belong to one of the available speech processing services can first be removed if necessary. In all variants, the speech input may be re-synthesized for delivery to the speech processing services.

Provision may also be made for the user to first have to activate the speech processing by a suitable measure, for example by pressing a button, or to explicitly confirm the forwarding to a speech processing service. In this way it can be ruled out that, for example, a conversation between vehicle occupants unintentionally triggers an action by the speech processing services.

Fig. 3 shows a simplified schematic diagram of a first embodiment of a device 20 for processing speech input. The device 20 has an input 21, via which a speech input of a user can be received, for example from a microphone or another audio source, and a memory 26 in which received speech inputs can be stored. A preprocessing module 22 preprocesses the speech input for at least one of two or more available speech processing services 50_1, ..., 50_n. The preprocessing of the speech input can take place as described above in connection with fig. 1. For this purpose, the device 20 can have a database 24 in which the assignments between relevant keywords and the speech processing services 50_1, ..., 50_n are stored. The preprocessed speech input is delivered to one or more of the available speech processing services 50_1, ..., 50_n via an interface 27. Responses of the speech processing services 50_1, ..., 50_n are also received via the interface 27; these responses can be analyzed by a response filter 23. The analysis of the responses can be performed as set out above in connection with fig. 2.

The preprocessing module 22, the response filter 23 and the database 24 can be controlled by a monitoring unit 25. Via a user interface 28, the settings of the preprocessing module 22, the response filter 23 or the monitoring unit 25 can be changed if necessary, or queries can be put to the user and answered by him. The contents of the database 24 can also be edited via the user interface 28. The data accumulated in the device 20 can be stored in the memory 26 if required, for example for later analysis or for use by the components of the device 20. The preprocessing module 22, the response filter 23 and the monitoring unit 25 can be implemented as dedicated hardware, for example as integrated circuits. However, they can of course also be combined in part or in whole, or be implemented as software running on a suitable processor, for example a CPU or GPU. The input 21 and the interface 27 can be implemented as separate interfaces or as a combined bidirectional interface.

Fig. 4 shows a simplified schematic diagram of a second embodiment of a device 30 for processing speech input. The device 30 has a processor 32 and a memory 31. The device 30 is, for example, a computer or a control device. In the memory 31 there are stored instructions which, when executed by the processor 32, cause the device 30 to carry out the steps according to one of the described methods. The instructions stored in the memory 31 are therefore embodied as a program which can be executed by the processor 32, said program implementing the method according to the invention. The device 30 has an input 33 for receiving audio data, for example from a microphone or other audio source. Data generated by the processor 32 is provided via an output 34. These data may also be stored in the memory 31. The input 33 and output 34 may be combined into a bidirectional interface.

The processor 32 may include one or more processor units, such as a microprocessor, a digital signal processor, or a combination thereof.

The memories 26, 31 of the described embodiments can have both volatile and non-volatile storage areas and can comprise a wide variety of storage devices and storage media, for example hard disks, optical storage media or semiconductor memories.

Fig. 5 schematically shows a motor vehicle 40 in which the solution according to the invention is implemented. The motor vehicle 40 has an operating device 41, for example an infotainment system with a touchscreen and a voice operating option. A microphone 42 is arranged in the motor vehicle 40 for detecting speech input.

The motor vehicle 40 also has a device 20 for processing speech input. The device 20 may also be integrated into the operating device 41. Further components of the motor vehicle 40 are an automatic climate control device 43 and a navigation system 44, which can be operated by a user, in particular by voice input. By means of the data transmission unit 45, a connection to a provider of external voice processing services can be established if required, for example via a mobile radio network. To store data, a memory 46 is present. Data exchange between the various components of the motor vehicle 40 takes place via the network 47. A response to the user's voice input may be output through speaker 48.

The way in which the solution according to the invention works will now be explained in more detail with reference to Figs. 6 to 8, using its use in a motor vehicle as an example.

Fig. 6 schematically shows the system design of a first variant of the solution according to the invention for processing a speech input SE. The device 20 for processing speech input detects a speech input SE of a user by means of a microphone 42 arranged in the motor vehicle 40. The preprocessing module 22 of the device 20 preprocesses the speech input SE for a series of speech processing services 50_1, ..., 50_n. In the process, the speech input SE can be synthesized anew if necessary. In this example, the speech processing services 50_1, ..., 50_n are, in particular, a service 50_1 of the manufacturer of the motor vehicle 40, a smart home solution 50_2 and a shopping application 50_3. A generic service is shown as the last speech processing service 50_n. Here, the manufacturer's service 50_1 reacts to the signal word "Hello Volkswagen", the smart home solution 50_2, a smart personal assistant, reacts to the signal word "Hey Pia", the shopping application 50_3 reacts to the signal word "Computer", and the generic service 50_n reacts to the signal word "Hello xyz". The resulting preprocessed speech inputs SE_1, ..., SE_n are transmitted by means of the data transmission unit 45 of the motor vehicle 40 via a data network 60 to the desired speech processing services 50_1, ..., 50_n. The answers AN_1, ..., AN_n of the speech processing services 50_1, ..., 50_n are received via the data network 60 and the data transmission unit 45 and passed to the response filter 23 of the device 20. The response filter analyzes the received answers AN_1, ..., AN_n and outputs at least one of the answers remaining after the analysis to the user as a speech output SA via the loudspeaker 48 of the motor vehicle 40. Here, the response filter 23 preferably passes on only meaningful answers of the speech processing services 50_1, ..., 50_n. For example, in response to the initial speech input "I am cold.", the answer "I do not understand you." returned by the smart home solution 50_2 and the shopping application 50_3 is intercepted by the response filter 23. In contrast, the answer "I have adjusted the temperature in the vehicle up by two degrees." of the service 50_1 of the manufacturer of the motor vehicle 40 is passed on by the response filter 23.
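The fan-out behavior of this first variant can be sketched as follows. This is a minimal illustration, not the claimed implementation: the service names, wake words and the "do not understand" test used by the filter are illustrative assumptions, and real services would be reached over the data network rather than simulated locally.

```python
# Illustrative wake-word table; the actual services and signal words
# are examples from the description, not a fixed part of the method.
SIGNAL_WORDS = {
    "manufacturer": "Hello Volkswagen",
    "smart_home": "Hey Pia",
    "shopping": "Computer",
    "generic": "Hello xyz",
}

def preprocess_for_all(speech_input: str) -> dict:
    """Generate one preprocessed input per service by prepending its signal word."""
    return {svc: f"{word}, {speech_input}" for svc, word in SIGNAL_WORDS.items()}

def filter_answers(answers: dict) -> list:
    """Keep only meaningful answers; intercept 'not understood' replies (assumed heuristic)."""
    return [a for a in answers.values() if "do not understand" not in a.lower()]

inputs = preprocess_for_all("I am cold.")

# Simulated answers for the example from the text:
answers = {
    "manufacturer": "I have adjusted the temperature in the vehicle up by two degrees.",
    "smart_home": "I do not understand you.",
    "shopping": "I do not understand you.",
}
remaining = filter_answers(answers)
```

Only the manufacturer's answer survives the filter and would be output as speech output SA.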

Fig. 7 schematically shows the system design of a second variant of the solution according to the invention for processing a speech input SE. The system design largely corresponds to that of Fig. 6, but a different approach is implemented for the preprocessing of the speech input SE. The preprocessing module 22 analyzes the speech input SE with respect to its content. For this purpose, the preprocessing module 22, or a module provided in addition to it, compares keywords recognized in the speech input SE with a database 24 in which assignments between relevant keywords and the speech processing services 50_1, ..., 50_n are stored. Based on the result of this analysis, the speech input SE is assigned to one of the speech processing services 50_1, ..., 50_n, in this case the service 50_1 of the manufacturer of the motor vehicle 40. Finally, a preprocessed speech input SE_1 is generated by adding the corresponding signal word to the speech input SE. Here too, the speech input SE can be synthesized anew. The preprocessed speech input SE_1 is transmitted as before by means of the data transmission unit 45 via the data network 60 to the assigned speech processing service 50_1. Finally, the answer AN_1 of the speech processing service 50_1 is received via the data network 60 and the data transmission unit 45 and output to the user as a speech output SA via the loudspeaker 48. For example, the initial speech input "I am cold." is forwarded to the service 50_1 of the manufacturer of the motor vehicle 40 in the form "Hello Volkswagen, please raise the temperature in my vehicle!". The user then receives the answer "I have adjusted the temperature in the vehicle up by two degrees.". Correspondingly, the initial speech input "Turn on the heating at home!" is forwarded to the smart home solution 50_2 in the form "Hey Pia, turn on the heating at home!". The user then receives, for example, the answer "I have turned the heating on."
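The keyword-based routing of this second variant can be sketched as a simple lookup. The keyword-to-service assignments and wake words below are illustrative assumptions standing in for the contents of the database 24; a real system would use proper keyword recognition rather than substring matching.

```python
# Assumed keyword-to-service assignments, standing in for database 24.
KEYWORD_DB = {
    "cold": "manufacturer",
    "temperature": "manufacturer",
    "at home": "smart_home",
    "need": "shopping",
}

# Illustrative wake words for the example services.
SIGNAL_WORDS = {
    "manufacturer": "Hello Volkswagen",
    "smart_home": "Hey Pia",
    "shopping": "Computer",
}

def route(speech_input: str):
    """Assign the input to one service based on recognized keywords and
    prepend that service's signal word; return (None, input) if no match."""
    text = speech_input.lower()
    for keyword, service in KEYWORD_DB.items():
        if keyword in text:
            return service, f"{SIGNAL_WORDS[service]}, {speech_input}"
    return None, speech_input

service, preprocessed = route("I am cold.")
```

Here "I am cold." is routed to the manufacturer's service; only that single preprocessed input is then transmitted over the data network.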

Fig. 8 shows some examples of speech inputs SE of a user and the associated preprocessed speech inputs.

In example a), the speech input SE comprises only a speech command SB, in this case the request "Turn on the heating at home!". From the keyword KW "at home" it can be deduced that the speech input is intended for the smart home solution used by the user. Since this smart home solution uses the signal word SW_2 "Hey Pia", the speech input SE is supplemented with the signal word SW_2 before being passed to the smart home solution. The preprocessed speech input SE_2 is thus "Hey Pia, turn on the heating at home!".

In example b), in addition to the speech command SB "Turn on the heating at home!" known from example a), the speech input SE also contains the signal word SW_1 "Hello Volkswagen", which belongs to a speech processing service that does not match the content of the speech command SB. In the course of preprocessing, the signal word SW_1 is therefore removed and replaced by the matching signal word SW_2 "Hey Pia", so that the preprocessed speech input SE_2 is again "Hey Pia, turn on the heating at home!".

In example c), the speech input SE comprises only the speech command SB "We also need water". From the keywords KW "need" and "water" it can be deduced that the user presumably wants to note something for shopping, for which the user uses an application that reacts to the signal word SW_n "Hello xyz". From the user's feedback so far, the system also knows that by "water" the user means a crate of mineral water. The preprocessed speech input SE_n generated by the system is thus "Hello xyz, we need a crate of mineral water.".
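The signal-word handling in examples a) and b) can be sketched as a small normalization step: any mismatched signal word already present at the start of the input is stripped, and the signal word of the service the content actually matches is prepended. The wake-word list is an illustrative assumption.

```python
# Illustrative list of known signal words; in the device this knowledge
# would come from the database of supported services.
SIGNAL_WORDS = ["Hello Volkswagen", "Hey Pia", "Computer", "Hello xyz"]

def replace_signal_word(speech_input: str, target_word: str) -> str:
    """Strip a leading (possibly mismatched) signal word from the input,
    then prepend the signal word of the assigned service."""
    command = speech_input
    for word in SIGNAL_WORDS:
        if command.lower().startswith(word.lower()):
            # Drop the old signal word plus any separator characters.
            command = command[len(word):].lstrip(" ,!")
            break
    return f"{target_word}, {command}"

# Example b): the mismatched signal word is replaced by the matching one.
result = replace_signal_word("Hello Volkswagen, turn on the heating at home!",
                             "Hey Pia")
# → "Hey Pia, turn on the heating at home!"
```

Example a) is the degenerate case: with no signal word present, nothing is stripped and the target signal word is simply prepended.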


List of reference numerals

10 receiving a speech input

11 preprocessing speech input

12 passing preprocessed speech input

13 receiving responses

14 analyzing the received responses

15 outputting at least one retained response

20 device

21 input terminal

22 preprocessing module

23 response filter

24 database

25 monitoring unit

26 memory

27 interface

28 user interface

30 device

31 memory

32 processor

33 input terminal

34 output terminal

40 motor vehicle

41 operating device

42 microphone

43 automatic climate control device

44 navigation system

45 data transmission unit

46 memory

47 network

48 loudspeaker

50_1, ..., 50_n speech processing services

60 data network

AN_1, ..., AN_n answers

KW keyword

SA Speech output

SB speech command

SE Speech input

SE_1, ..., SE_n preprocessed speech inputs

SW_1, ..., SW_n signal words
