Information processing method and device and electronic equipment

Document No.: 1087409 | Published: 2020-10-20

Abstract: Information processing method, apparatus, and electronic device (信息处理方法、装置及电子设备), designed by 韩伟 on 2019-04-04. Embodiments of the invention provide an information processing method, an apparatus, and an electronic device. The method acquires voice information input into a smart device; performs speech recognition on the voice information to obtain the corresponding text information, which includes at least one word and time information of each word; divides the text information according to the time information of the words to obtain at least one text segment; and obtains the effective semantic information of the voice information from the semantic recognition result of the at least one text segment. In this embodiment the voice information is recognized directly as text information without first being segmented. Natural language understanding is taken into account when the text information is segmented according to the time information of each word, so the segmentation result is more accurate, and determining the effective semantic information from the semantic recognition results of the segmented text can improve the accuracy of semantic recognition.

1. An information processing method characterized by comprising:

acquiring voice information input into a smart device;

performing speech recognition on the voice information to obtain text information corresponding to the voice information, wherein the text information comprises at least one word and time information of each word, and the time information indicates the time at which the smart device captured the voice frame corresponding to the word;

dividing the text information according to the time information of the words to obtain at least one text segment; and

obtaining effective semantic information of the voice information according to a semantic recognition result of the at least one text segment.

2. The method of claim 1, wherein dividing the text information according to the time information of the words to obtain at least one text segment comprises:

obtaining a time interval between two adjacent words in the text information according to the time information of each word;

if the time interval meets a set condition, determining that a segmentation point is set between the two words; and

segmenting the text information at the determined segmentation points to obtain the at least one text segment.

3. The method of claim 2, wherein determining that a segmentation point is set between the two words if the time interval meets a set condition comprises:

if the time interval is greater than or equal to a target threshold, determining that a segmentation point is set between the two words.

4. The method of claim 3, further comprising:

determining a speech rate level of the voice information according to at least one time interval; and

taking, according to a correspondence between speech rate levels and time thresholds, the time threshold corresponding to the speech rate level of the voice information as the target threshold.

5. The method of claim 3, further comprising:

obtaining a first average value of the time intervals in the text information, and determining the first average value as the target threshold; or

sequentially determining a second average value of the time intervals of a first preset number of words, and determining the second average value as the target threshold for a second preset number of words that follow the first preset number of words; or

for any word in the text information, obtaining a third average value of the time intervals of all words located before that word in the text information, and taking the third average value as the target threshold for that word.

6. The method according to any one of claims 1 to 5, wherein the semantic recognition result comprises a semantic integrity probability score and semantic information, and obtaining the effective semantic information of the voice information according to the semantic recognition result of the at least one text segment comprises:

taking the semantic information of a text segment whose semantic integrity probability score meets a preset condition as the effective semantic information of the voice information.

7. The method according to claim 1, further comprising, after obtaining the effective semantic information of the voice information:

obtaining reply information corresponding to the voice information according to the effective semantic information; and

controlling the smart device to output the reply information.

8. An information processing apparatus characterized by comprising:

an acquisition module, configured to acquire voice information input into a smart device;

a first recognition module, configured to perform speech recognition on the voice information to obtain text information corresponding to the voice information, wherein the text information comprises at least one word and time information of each word, and the time information indicates the time at which the smart device captured the voice frame corresponding to the word;

a segmentation module, configured to segment the text information according to the time information of the words to obtain at least one text segment; and

a second recognition module, configured to obtain effective semantic information of the voice information according to a semantic recognition result of the at least one text segment.

9. An electronic device, comprising: at least one processor and memory;

the memory stores computer-executable instructions;

the at least one processor executes the computer-executable instructions stored in the memory, causing the at least one processor to perform the method of any one of claims 1 to 7.

10. A computer-readable storage medium having computer-executable instructions stored thereon which, when executed by a processor, implement the method of any one of claims 1 to 7.

Technical Field

Embodiments of the invention relate to the technical field of artificial intelligence, and in particular to an information processing method, an apparatus, and an electronic device.

Background

With the development of human-computer interaction technology, semantic recognition has become increasingly important. Semantic recognition is the process of extracting feature information from a speech signal uttered by a person and determining its linguistic meaning. It mainly comprises a speech recognition process and a semantic understanding process. Speech recognition converts the speech signal into text using an acoustic model, and semantic understanding recognizes the meaning of that text using a natural language model.

Disclosure of Invention

Embodiments of the invention provide an information processing method, an apparatus, and an electronic device for improving the accuracy of semantic recognition.

In a first aspect, an embodiment of the present invention provides an information processing method, including:

acquiring voice information input into a smart device;

performing speech recognition on the voice information to obtain text information corresponding to the voice information, wherein the text information includes at least one word and time information of each word, and the time information indicates the time at which the smart device captured the voice frame corresponding to the word;

dividing the text information according to the time information of the words to obtain at least one text segment; and

obtaining effective semantic information of the voice information according to a semantic recognition result of the at least one text segment.

Optionally, dividing the text information according to the time information of the words to obtain at least one text segment includes:

obtaining a time interval between two adjacent words in the text information according to the time information of each word;

if the time interval meets a set condition, determining that a segmentation point is set between the two words; and

segmenting the text information at the determined segmentation points to obtain the at least one text segment.

Optionally, determining that a segmentation point is set between the two words if the time interval meets a set condition includes:

if the time interval is greater than or equal to a target threshold, determining that a segmentation point is set between the two words.
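The threshold-based segmentation described above can be sketched roughly as follows. This is a minimal illustration, not the patented implementation. The `Word` class, its `start` and `end` fields (in seconds), and the function name `split_by_gap` are assumptions introduced here. The source only states that each word carries time information indicating when the device captured the corresponding voice frame.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Word:
    """Hypothetical container for one recognized word and its time information."""
    text: str
    start: float  # seconds; assumed capture time of the word's first voice frame
    end: float    # seconds; assumed capture time of the word's last voice frame


def split_by_gap(words: List[Word], target_threshold: float) -> List[List[Word]]:
    """Set a segmentation point between two adjacent words whenever the time
    interval between them is greater than or equal to target_threshold."""
    if not words:
        return []
    segments = [[words[0]]]
    for prev, cur in zip(words, words[1:]):
        gap = cur.start - prev.end  # time interval between two adjacent words
        if gap >= target_threshold:
            segments.append([cur])    # segmentation point: open a new segment
        else:
            segments[-1].append(cur)  # no segmentation point: extend the segment
    return segments


words = [
    Word("play", 0.00, 0.30),
    Word("music", 0.35, 0.80),
    Word("louder", 1.60, 2.00),
]
segments = split_by_gap(words, target_threshold=0.5)
texts = [" ".join(w.text for w in seg) for seg in segments]
print(texts)
```

In this toy input the pause before "louder" is 0.8 s, which exceeds the assumed 0.5 s threshold, so the text is cut there.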

Optionally, the method further includes:

determining a speech rate level of the voice information according to at least one time interval; and

taking, according to a correspondence between speech rate levels and time thresholds, the time threshold corresponding to the speech rate level of the voice information as the target threshold.
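As an illustration of mapping a speech rate level to a threshold, consider the sketch below. The three level names, the cut-off values used to classify the speech rate, and the threshold values in `LEVEL_TO_THRESHOLD` are all hypothetical. The source specifies only that a correspondence between speech rate levels and time thresholds exists.

```python
from statistics import mean
from typing import Dict, List

# Hypothetical correspondence between speech rate level and time threshold (seconds).
LEVEL_TO_THRESHOLD: Dict[str, float] = {"fast": 0.3, "normal": 0.5, "slow": 0.8}


def speech_rate_level(intervals: List[float]) -> str:
    """Classify the speech rate from the mean inter-word interval.
    The cut-offs 0.1 s and 0.25 s are assumptions made for this sketch."""
    if not intervals:
        return "normal"  # assumed default when no interval is available yet
    avg = mean(intervals)
    if avg < 0.1:
        return "fast"
    if avg < 0.25:
        return "normal"
    return "slow"


def target_threshold(intervals: List[float]) -> float:
    """Look up the time threshold that corresponds to the detected level."""
    return LEVEL_TO_THRESHOLD[speech_rate_level(intervals)]


print(target_threshold([0.05, 0.08, 0.06]))  # short pauses, classified as fast
print(target_threshold([0.30, 0.40]))        # long pauses, classified as slow
```

A faster speaker leaves shorter pauses between words, so a smaller threshold is assumed to be appropriate for cutting that speaker's text.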

Optionally, the method further includes:

obtaining a first average value of the time intervals in the text information, and determining the first average value as the target threshold; or

sequentially determining a second average value of the time intervals of a first preset number of words, and determining the second average value as the target threshold for a second preset number of words that follow the first preset number of words; or

for any word in the text information, obtaining a third average value of the time intervals of all words located before that word in the text information, and taking the third average value as the target threshold for that word.
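The three averaging options above might be sketched as follows. The function names are hypothetical, and the windowed variant reflects one possible reading of the second option: the mean of the `n` intervals immediately preceding each block of `m` intervals is used as the threshold for that block. Intervals are given in integer milliseconds here purely to keep the arithmetic exact.

```python
from statistics import mean
from typing import List, Optional


def global_mean_threshold(intervals: List[float]) -> float:
    """Option 1: a single threshold equal to the mean of all intervals."""
    return mean(intervals)


def windowed_thresholds(intervals: List[float], n: int, m: int) -> List[Optional[float]]:
    """Option 2 (one interpretation): the mean of the n intervals just before
    a block becomes the threshold for the m intervals in that block."""
    if n < 1 or m < 1:
        raise ValueError("n and m must be positive")
    thresholds: List[Optional[float]] = [None] * len(intervals)
    start = n
    while start < len(intervals):
        t = mean(intervals[start - n:start])
        for i in range(start, min(start + m, len(intervals))):
            thresholds[i] = t
        start += m
    return thresholds


def running_mean_thresholds(intervals: List[float]) -> List[Optional[float]]:
    """Option 3: the threshold for each interval is the mean of all earlier ones."""
    thresholds: List[Optional[float]] = []
    total = 0.0
    for i, gap in enumerate(intervals):
        thresholds.append(total / i if i > 0 else None)  # None: no history yet
        total += gap
    return thresholds


intervals_ms = [100, 300, 200, 600, 200]
print(global_mean_threshold(intervals_ms))
print(windowed_thresholds(intervals_ms, n=2, m=2))
print(running_mean_thresholds(intervals_ms))
```

Entries of `None` mark positions for which no threshold can be computed yet. How such positions are handled is not stated in the source.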

Optionally, the semantic recognition result includes a semantic integrity probability score and semantic information, and obtaining the effective semantic information of the voice information according to the semantic recognition result of the at least one text segment includes:

taking the semantic information of a text segment whose semantic integrity probability score meets a preset condition as the effective semantic information of the voice information.

Optionally, taking the semantic information of a text segment whose semantic integrity probability score meets the preset condition as the effective semantic information of the voice information includes:

for each text segment in the at least one text segment, if the semantic integrity probability score of the text segment is greater than or equal to a preset threshold, taking the semantic information of the text segment as effective semantic information of the voice information; or

among the at least one text segment, taking the semantic information of the text segment with the highest semantic integrity probability score as the effective semantic information of the voice information.
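The two selection rules above can be illustrated with the short sketch below. The `SemanticResult` class and both function names are assumptions. The semantic information is reduced to a plain string, and the scores are made-up values rather than output of a real natural language model.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class SemanticResult:
    """Hypothetical semantic recognition result for one text segment."""
    text: str
    score: float    # semantic integrity probability score, assumed to lie in [0, 1]
    semantics: str  # semantic information, simplified to a string for this sketch


def select_above_threshold(results: List[SemanticResult],
                           preset_threshold: float) -> List[SemanticResult]:
    """Rule 1: keep every segment whose score reaches the preset threshold."""
    return [r for r in results if r.score >= preset_threshold]


def select_highest(results: List[SemanticResult]) -> Optional[SemanticResult]:
    """Rule 2: keep only the segment with the highest score (None if empty)."""
    return max(results, key=lambda r: r.score, default=None)


results = [
    SemanticResult("play", 0.35, "incomplete"),
    SemanticResult("play some music", 0.92, "intent=play_music"),
    SemanticResult("and then", 0.10, "incomplete"),
]
effective = select_above_threshold(results, preset_threshold=0.8)
best = select_highest(results)
print([r.semantics for r in effective], best.semantics)
```

The first rule can yield several effective results for one utterance, whereas the second always yields at most one.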

Optionally, taking the semantic information of a text segment whose semantic integrity probability score meets the preset condition as the effective semantic information of the voice information includes:

for any text segment in the at least one text segment, obtaining a cached historical text segment, wherein the historical text segment is at least one text segment that precedes the current text segment and whose semantic integrity probability score did not meet the preset condition;

performing semantic recognition on a new text segment obtained by splicing the historical text segment with the current text segment, to obtain a semantic recognition result of the new text segment; and

if the semantic integrity probability score of the new text segment is greater than or equal to a preset threshold, taking the semantic information of the new text segment as effective semantic information of the voice information.

Optionally, the method further includes:

if the semantic integrity probability score of the new text segment is greater than or equal to the preset threshold, deleting the historical text segment from the cache.

Optionally, the method further includes:

if the semantic integrity probability score of the new text segment is less than the preset threshold, storing the new text segment in the cache as a historical text segment.
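The caching and splicing behaviour described above could look roughly like the sketch below. The `FragmentBuffer` class, its `feed` method, and the `toy_recognize` function are hypothetical. In particular, `toy_recognize` stands in for a real semantic recognition model and simply treats any text ending in "today" as semantically complete.

```python
from typing import Callable, List, Optional, Tuple

# A recognizer maps text to (semantic integrity probability score, semantic information).
Recognizer = Callable[[str], Tuple[float, str]]


class FragmentBuffer:
    """Caches incomplete text segments and splices them with later ones."""

    def __init__(self, recognize: Recognizer, preset_threshold: float) -> None:
        self._recognize = recognize
        self._threshold = preset_threshold
        self._history: List[str] = []  # cached historical text segments

    def feed(self, fragment: str) -> Optional[str]:
        """Return effective semantic information, or None if still incomplete."""
        new_fragment = " ".join(self._history + [fragment])  # splice with history
        score, semantics = self._recognize(new_fragment)
        if score >= self._threshold:
            self._history.clear()        # complete: delete history from the cache
            return semantics
        self._history = [new_fragment]   # incomplete: cache the spliced segment
        return None


def toy_recognize(text: str) -> Tuple[float, str]:
    """Stand-in recognizer (assumption): text ending in 'today' counts as complete."""
    complete = text.endswith("today")
    return (0.9 if complete else 0.2, f"intent({text})")


buffer = FragmentBuffer(toy_recognize, preset_threshold=0.8)
first = buffer.feed("what is the weather")
second = buffer.feed("in Beijing today")
print(first, second)
```

Here the first segment alone scores below the threshold and is cached. Once the second segment arrives, the spliced text scores above the threshold, its semantic information is returned, and the cache is emptied.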

Optionally, after obtaining the effective semantic information of the voice information, the method further includes:

obtaining reply information corresponding to the voice information according to the effective semantic information; and

controlling the smart device to output the reply information.

In a second aspect, an embodiment of the present invention provides an information processing apparatus, including:

an acquisition module, configured to acquire voice information input into a smart device;

a first recognition module, configured to perform speech recognition on the voice information to obtain text information corresponding to the voice information, wherein the text information includes at least one word and time information of each word, and the time information indicates the time at which the smart device captured the voice frame corresponding to the word;

a segmentation module, configured to segment the text information according to the time information of the words to obtain at least one text segment; and

a second recognition module, configured to obtain effective semantic information of the voice information according to a semantic recognition result of the at least one text segment.

Optionally, the segmentation module is specifically configured to:

obtain a time interval between two adjacent words in the text information according to the time information of each word;

if the time interval meets a set condition, determine that a segmentation point is set between the two words; and

segment the text information at the determined segmentation points to obtain the at least one text segment.

Optionally, the segmentation module is specifically configured to:

if the time interval is greater than or equal to a target threshold, determine that a segmentation point is set between the two words.

Optionally, the segmentation module is further configured to:

determine a speech rate level of the voice information according to at least one time interval; and

take, according to a correspondence between speech rate levels and time thresholds, the time threshold corresponding to the speech rate level of the voice information as the target threshold.

Optionally, the segmentation module is further configured to:

obtain a first average value of the time intervals in the text information, and determine the first average value as the target threshold; or

sequentially determine a second average value of the time intervals of a first preset number of words, and determine the second average value as the target threshold for a second preset number of words that follow the first preset number of words; or

for any word in the text information, obtain a third average value of the time intervals of all words located before that word in the text information, and take the third average value as the target threshold for that word.

Optionally, the semantic recognition result includes a semantic integrity probability score and semantic information, and the second recognition module is specifically configured to:

take the semantic information of a text segment whose semantic integrity probability score meets a preset condition as the effective semantic information of the voice information.

Optionally, the second recognition module is specifically configured to:

for each text segment in the at least one text segment, if the semantic integrity probability score of the text segment is greater than or equal to a preset threshold, take the semantic information of the text segment as effective semantic information of the voice information; or

among the at least one text segment, take the semantic information of the text segment with the highest semantic integrity probability score as the effective semantic information of the voice information.

Optionally, the second recognition module is specifically configured to:

for any text segment in the at least one text segment, obtain a cached historical text segment, wherein the historical text segment is at least one text segment that precedes the current text segment and whose semantic integrity probability score did not meet the preset condition;

perform semantic recognition on a new text segment obtained by splicing the historical text segment with the current text segment, to obtain a semantic recognition result of the new text segment; and

if the semantic integrity probability score of the new text segment is greater than or equal to a preset threshold, take the semantic information of the new text segment as effective semantic information of the voice information.

Optionally, the second recognition module is further configured to:

if the semantic integrity probability score of the new text segment is greater than or equal to the preset threshold, delete the historical text segment from the cache.

Optionally, the second recognition module is further configured to:

if the semantic integrity probability score of the new text segment is less than the preset threshold, store the new text segment in the cache as a historical text segment.

Optionally, the second recognition module is further configured to:

obtain reply information corresponding to the voice information according to the effective semantic information; and

control the smart device to output the reply information.

In a third aspect, an embodiment of the present invention provides an electronic device, including: at least one processor and memory;

the memory stores computer-executable instructions;

the at least one processor executes the computer-executable instructions stored in the memory, causing the at least one processor to perform the method of any implementation of the first aspect.

In a fourth aspect, an embodiment of the present invention provides a computer-readable storage medium storing computer-executable instructions which, when executed by a processor, implement the method of any implementation of the first aspect.

In a fifth aspect, an embodiment of the present invention provides a computer program product comprising computer program code which, when run on a computer, causes the computer to perform the method of any implementation of the first aspect.

In a sixth aspect, an embodiment of the present invention provides a chip comprising a memory and a processor, where the memory stores a computer program and the processor calls and runs the computer program from the memory, so that an electronic device in which the chip is installed performs the method of any implementation of the first aspect.

The technical solution provided by the embodiments of the invention is as follows: acquiring voice information input into a smart device; performing speech recognition on the voice information to obtain text information corresponding to the voice information, wherein the text information includes at least one word and time information of each word, and the time information indicates the time at which the smart device captured the voice frame corresponding to the word; dividing the text information according to the time information of the words to obtain at least one text segment; and obtaining effective semantic information of the voice information according to the semantic recognition result of the at least one text segment. In these embodiments the voice information is thus recognized directly as text information without first being segmented, the text information is then segmented according to the time information of each word, and semantic recognition is performed on the resulting text segments to obtain the effective semantic information of the voice information. Because natural language understanding is taken into account when the text information is segmented according to the time information of each word, the segmentation result is more accurate, and determining the effective semantic information from the semantic recognition results of the segmented text can improve the accuracy of semantic recognition.

Drawings

To illustrate the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings used in describing them are briefly introduced below. The drawings described below show only some embodiments of the present invention, and a person skilled in the art can derive other drawings from them without creative effort.

FIG. 1 is a schematic diagram of a semantic recognition process in the prior art;

FIG. 2 is a first schematic diagram of a semantic recognition process provided by an embodiment of the present invention;

FIG. 3 is a first schematic flowchart of an information processing method according to an embodiment of the present invention;

FIG. 4 is a second schematic flowchart of an information processing method according to an embodiment of the present invention;

FIG. 5 is a second schematic diagram of a semantic recognition process provided by an embodiment of the present invention;

FIG. 6 is a third schematic flowchart of an information processing method according to an embodiment of the present invention;

FIG. 7 is a schematic structural diagram of an information processing apparatus according to an embodiment of the present invention;

FIG. 8 is a schematic diagram of a hardware structure of an electronic device according to an embodiment of the present invention.

Detailed Description

The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
