Speech recognition apparatus and speech recognition method

Document No.: 1602698. Publication date: 2020-01-07

Reading note: This technology, "Speech recognition apparatus and speech recognition method," was created by 武井匠 and 竹里尚嘉 on 2017-05-25. Its main content is as follows. The speech recognition apparatus of the present invention includes: a voice recognition unit (101) that performs voice recognition of a speaker voice; a keyword extraction unit (103) that extracts a preset keyword from the voice recognition result; a dialogue determination unit (105) that determines whether or not the speaker voice is a conversation by referring to the keyword extraction result; and an operation command extraction unit (106) that extracts a command for operating a device from the voice recognition result when the speaker voice is determined not to be a conversation, and does not extract a command from the voice recognition result when the speaker voice is determined to be a conversation.

1. A speech recognition apparatus, comprising:

a voice recognition unit that performs voice recognition of a speaker voice;

a keyword extraction unit that extracts a preset keyword from a recognition result of the voice recognition unit;

a dialogue determination unit that determines whether or not the speaker voice is a conversation by referring to the extraction result of the keyword extraction unit; and

an operation command extraction unit that extracts a command for operating a device from the recognition result of the voice recognition unit when the dialogue determination unit determines that the speaker voice is not a conversation, and does not extract the command from the recognition result when the dialogue determination unit determines that the speaker voice is a conversation.

2. The speech recognition apparatus of claim 1, wherein

the preset keyword is a person's name or a word expressing a call to a person.

3. The speech recognition apparatus of claim 1, further comprising:

a face direction information acquisition unit that acquires face direction information of at least one of the speaker and a person other than the speaker; and

a face direction determination unit that, when the dialogue determination unit determines that the speaker voice is not a conversation, further determines whether or not the speaker voice is a conversation based on whether the face direction information acquired by the face direction information acquisition unit satisfies a preset condition, wherein

the operation command extraction unit extracts the command from the recognition result when the face direction determination unit determines that the speaker voice is not a conversation, and does not extract the command from the recognition result when the face direction determination unit determines that the speaker voice is a conversation.

4. The speech recognition apparatus of claim 1, further comprising:

a face direction information acquisition unit that acquires face direction information of a person other than the speaker; and

a reaction detection unit that detects the presence or absence of a reaction of the other person based on at least one of the face direction information of the other person in response to the speaker voice, acquired by the face direction information acquisition unit, and a voice response of the other person to the speaker voice, recognized by the voice recognition unit, and that sets the speaker voice or a part of the speaker voice as the keyword when a reaction of the other person is detected.

5. The speech recognition apparatus of claim 1, wherein

the dialogue determination unit, while the speaker voice is determined to be a conversation, determines whether or not the interval between speech sections in the recognition result of the voice recognition unit is equal to or greater than a preset threshold, and estimates that the conversation has ended when the interval is equal to or greater than the preset threshold.

6. The speech recognition apparatus of claim 1, wherein

the dialogue determination unit, while the speaker voice is determined to be a conversation, determines whether or not a word indicating the end of a conversation is included in the recognition result of the voice recognition unit, and estimates that the conversation has ended when such a word is included.

7. The speech recognition apparatus of claim 1, wherein

the dialogue determination unit performs control to give notification of the determination result when it determines that the speaker voice is a conversation.

8. A speech recognition method, comprising the steps of:

a step in which a voice recognition unit performs voice recognition of a speaker voice;

a step in which a keyword extraction unit extracts a preset keyword from the recognition result of the voice recognition;

a step in which a dialogue determination unit determines whether or not the speaker voice is a conversation by referring to the extraction result of the keyword extraction; and

a step in which an operation command extraction unit extracts a command for operating a device from the recognition result when the speaker voice is determined not to be a conversation, and does not extract the command from the recognition result when the speaker voice is determined to be a conversation.
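As an illustration of the two conversation-end estimates described in claims 5 and 6, the following is a minimal Python sketch, not the claimed implementation. It assumes that the "interval" is the silent gap between the end of one speech section and the start of the next. The names `SpeechSection`, `conversation_ended`, `SILENCE_THRESHOLD_S`, and `END_PHRASES`, as well as the threshold value and the phrases, are hypothetical and do not appear in the source.

```python
from dataclasses import dataclass

# Hypothetical values: the source only says the threshold and words are preset.
SILENCE_THRESHOLD_S = 10.0
END_PHRASES = ("see you", "talk later", "that's all")


@dataclass
class SpeechSection:
    """One recognized speech section with its timing (seconds) and text."""
    start_s: float
    end_s: float
    text: str


def conversation_ended(previous: SpeechSection, current: SpeechSection) -> bool:
    """Estimate whether an ongoing conversation has ended.

    Assumed to be called only while the speaker voice is judged a conversation.
    """
    # Claim 5 (as interpreted here): gap between sections reaches the threshold.
    gap_s = current.start_s - previous.end_s
    if gap_s >= SILENCE_THRESHOLD_S:
        return True
    # Claim 6: the recognition result contains a word that ends a conversation.
    lowered = current.text.lower()
    return any(phrase in lowered for phrase in END_PHRASES)
```

In this sketch either condition alone ends the conversation; the claims present them as separate dependent claims, so combining them in one function is a design choice made here for brevity.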

Technical Field

The present invention relates to a technique of performing speech recognition on a speaker's speech and extracting information for controlling an apparatus.

Background

Conventionally, when the voices of a plurality of speakers are present, techniques have been used to reduce erroneous recognition in determining whether a speaker's voice is an instruction for controlling a device or part of a conversation between speakers.

For example, Patent Document 1 discloses a speech recognition device that, when speech from a plurality of speakers has been detected within a predetermined past period, determines that the speech constitutes a conversation and does not perform a predetermined keyword detection process.

Disclosure of Invention

Technical problem to be solved by the invention

The speech recognition device described in Patent Document 1 uses a plurality of sound collection units to detect the speech of a certain speaker and, after that speech is detected, checks whether the speech of another speaker is acquired within a predetermined time, thereby detecting a conversation between speakers. This approach therefore requires a plurality of sound collection units. In addition, because the device must wait for the predetermined time before it can detect a conversation, the predetermined keyword detection process is delayed, which degrades operability.

The present invention has been made to solve the above-described problems, and an object thereof is to suppress erroneous recognition of a speaker voice without requiring a plurality of sound collection units and to perform extraction of an operation command for operating a device without setting a delay time.

Technical scheme for solving technical problem

The speech recognition device according to the present invention includes: a voice recognition unit that performs voice recognition of a speaker voice; a keyword extraction unit that extracts a preset keyword from the recognition result of the voice recognition unit; a dialogue determination unit that determines whether or not the speaker voice is a conversation by referring to the extraction result of the keyword extraction unit; and an operation command extraction unit that extracts a command for operating a device from the recognition result of the voice recognition unit when the dialogue determination unit determines that the speaker voice is not a conversation, and does not extract the command from the recognition result when the dialogue determination unit determines that the speaker voice is a conversation.
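The flow through these units can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the implementation of the invention: it starts from already-recognized text, assumes that finding any preset keyword means the speech is a conversation, and uses hypothetical keyword and command tables (`CALL_KEYWORDS`, `COMMANDS`) and hypothetical function names.

```python
import re
from typing import List, Optional

# Hypothetical tables; the source only states that keywords are preset.
CALL_KEYWORDS = {"alice", "bob", "hey", "excuse me"}
COMMANDS = {
    "turn on the radio": "RADIO_ON",
    "turn up the volume": "VOLUME_UP",
}


def extract_keywords(recognized_text: str) -> List[str]:
    """Keyword extraction: return preset keywords found as whole words."""
    lowered = recognized_text.lower()
    return [
        kw for kw in sorted(CALL_KEYWORDS)
        if re.search(rf"\b{re.escape(kw)}\b", lowered)
    ]


def is_conversation(keywords: List[str]) -> bool:
    """Dialogue determination (assumed rule): any keyword means conversation."""
    return len(keywords) > 0


def extract_command(recognized_text: str) -> Optional[str]:
    """Operation command extraction: return the first matching command, if any."""
    lowered = recognized_text.lower()
    for phrase, command in COMMANDS.items():
        if phrase in lowered:
            return command
    return None


def process_utterance(recognized_text: str) -> Optional[str]:
    """Run the flow on one recognition result from a single microphone."""
    keywords = extract_keywords(recognized_text)
    if is_conversation(keywords):
        return None  # conversation: no command is extracted
    return extract_command(recognized_text)
```

With these tables, `process_utterance("Turn on the radio")` yields `"RADIO_ON"`, while `process_utterance("Alice, turn on the radio")` yields `None` because the name marks the utterance as addressed to a person. The decision is made per utterance, so no waiting period is needed before command extraction.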

Effects of the invention

According to the present invention, erroneous recognition of a speaker voice can be suppressed based on the speaker voice collected by a single sound collection unit. Further, an operation command for operating the device can be extracted without setting a delay time.

Drawings

Fig. 1 is a block diagram showing the configuration of a speech recognition apparatus according to embodiment 1.

Figs. 2A and 2B are diagrams showing examples of the hardware configuration of the speech recognition apparatus.

Fig. 3 is a flowchart showing an operation of a speech recognition process in the speech recognition apparatus according to embodiment 1.

Fig. 4 is a flowchart showing an operation of a dialogue determination process in the speech recognition apparatus according to embodiment 1.

Fig. 5 is a diagram showing another configuration of the speech recognition apparatus according to embodiment 1.

Fig. 6 is a diagram showing a display example on the screen of a display device connected to the speech recognition apparatus according to embodiment 1.

Fig. 7 is a block diagram showing the configuration of the speech recognition apparatus according to embodiment 2.

Fig. 8 is a flowchart showing an operation of a dialogue determination process in the speech recognition apparatus according to embodiment 2.

Fig. 9 is a block diagram showing the configuration of the speech recognition apparatus according to embodiment 3.

Fig. 10 is a flowchart showing an operation of a keyword registration process in the speech recognition apparatus according to embodiment 3.

Fig. 11 is a block diagram showing an example in which the configuration according to embodiment 1 is shared between the speech recognition apparatus and a server apparatus.

Detailed Description

Hereinafter, embodiments for carrying out the present invention will be described with reference to the drawings in order to explain the present invention in more detail.
