Method for automatically constructing a new-word library based on multi-source information fusion

Document No.: 1544889 Publication date: 2020-01-17

Reading note: This technology, a method for automatically constructing a new-word library based on multi-source information fusion (一种多源信息融合的生词库自动构建方法), was designed by 李吉平古万荣朱凯 on 2019-08-19. Abstract: The invention discloses a method for automatically constructing a new-word library based on multi-source information fusion, comprising two processes: automatic new-word identification and automatic new-word library updating. New words are divided into reading new words and speech new words, and the new-word library is divided into two multi-level libraries, one for short-term memory and one for long-term memory. By fusing user operation, mouth shape, speech, and memory information, the invention identifies new words automatically and updates the new-word library dynamically, so that construction of the library is fully automated; compared with the prior approach, in which the user first judges for themselves and then confirms manually, this can improve learning efficiency. Dividing new words into reading new words and speech new words, rather than only memorizing unknown or unfamiliar words as before, can promote both reading ability and listening and speaking skills. Dividing the library into multi-level short-term and long-term memory libraries provides a means for studying personalized memory patterns.

1. A method for automatically constructing a new-word library based on multi-source information fusion, characterized in that new words comprise two types, reading new words and speech new words, and the new-word library is divided into two multi-level new-word libraries, one for short-term memory and one for long-term memory, the method comprising the following steps:

S1, automatically identifying new words;

S11, automatically identifying reading new words according to user operation information;

the user operation information comprises the operation response time and whether the operation is correct; if the user performs no operation, or performs an incorrect operation, within the specified time, the word is identified as a reading new word;

S12, automatically identifying speech new words according to the user's mouth shape and speech information;

S2, automatically updating the new-word library;

S21, automatically storing the identified reading new words and speech new words in a reading new-word library and a speech new-word library, respectively;

S22, automatically and dynamically updating the multi-level new-word libraries.

2. The method for automatically constructing a new-word library based on multi-source information fusion according to claim 1, wherein S12 further comprises the following steps:

S121, if the speech recognition device receives no user speech or receives incorrect user speech within the specified time, but the mouth-shape recognition device recognizes that the user's pronunciation mouth shape is correct, prompting the user to read the word again;

S122, if the speech recognition device receives no user speech or receives incorrect user speech within the specified time, and the mouth-shape recognition device recognizes that the user's pronunciation mouth shape is also incorrect, identifying the word as a speech new word.

3. The method for automatically constructing a new-word library based on multi-source information fusion according to claim 1, wherein the multi-level new-word libraries in S22 are divided according to the general law of memory forgetting and can serve as a basis for research on personalized memory patterns, and S22 further comprises the following steps:

S221, automatically storing a new word identified for the first time in the new-word library with the shortest review interval;

S222, if the new word is correctly recalled within its memory period, automatically moving it to the next-level new-word library with a longer memory period;

S223, if the new word is not correctly recalled within its memory period, automatically moving it to the previous-level new-word library with a shorter memory period;

S224, deleting a new word in the new-word library with the longest memory period from the new-word library once it has been correctly recalled within that memory period.

Technical Field

The invention relates to the technical field of computer applications, and in particular to a method for automatically constructing a new-word library based on multi-source information fusion.

Background

Internationalization is an important feature of social development. In daily life and work, people increasingly need to communicate in foreign languages. Effective learning methods are a means of improving foreign-language ability, and many software tools that assist memorization in foreign-language learning are available on the market. Their main shortcomings are as follows:

(1) the user must first judge whether a word is a new word and then update the new-word library by manual confirmation, which hinders improvement of learning efficiency;

(2) new words are understood only as unknown or unfamiliar words; memorizing them improves reading ability but does not directly promote listening and speaking ability;

(3) new words are reviewed according to the general, population-level law of the Ebbinghaus forgetting curve, ignoring individual differences in forgetting speed.

Disclosure of Invention

The invention aims to overcome the shortcomings of the prior art by providing a method for automatically constructing a new-word library based on multi-source information fusion, which automatically identifies reading new words from user operation information, automatically identifies speech new words from information such as the user's mouth shape and speech, and automatically and dynamically updates the multi-level new-word libraries according to the law of memory forgetting.

The purpose of the invention is realized by the following technical scheme:

A method for automatically constructing a new-word library based on multi-source information fusion, wherein new words comprise two types, reading new words and speech new words, and the new-word library is divided into two multi-level new-word libraries, one for short-term memory and one for long-term memory, the method comprising the following steps:

S1, automatically identifying new words;

S11, automatically identifying reading new words according to user operation information;

S12, automatically identifying speech new words according to the user's mouth shape and speech information;

S2, automatically updating the new-word library;

S21, automatically storing the identified reading new words and speech new words in a reading new-word library and a speech new-word library, respectively;

S22, automatically and dynamically updating the multi-level new-word libraries.

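Preferably, S12 further comprises the following steps:

S121, if the speech recognition device receives no user speech or receives incorrect user speech within the specified time, but the mouth-shape recognition device recognizes that the user's pronunciation mouth shape is correct, prompting the user to read the word again;

S122, if the speech recognition device receives no user speech or receives incorrect user speech within the specified time, and the mouth-shape recognition device recognizes that the user's pronunciation mouth shape is also incorrect, identifying the word as a speech new word.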

Preferably, the multi-level new-word libraries in S22 are divided according to the general law of memory forgetting and can serve as a basis for research on personalized memory patterns, and S22 further comprises the following steps:

S221, automatically storing a new word identified for the first time in the new-word library with the shortest review interval;

S222, if the new word is correctly recalled within its memory period, automatically moving it to the next-level new-word library with a longer memory period;

S223, if the new word is not correctly recalled within its memory period, automatically moving it to the previous-level new-word library with a shorter memory period;

S224, deleting a new word in the new-word library with the longest memory period from the new-word library once it has been correctly recalled within that memory period.

Compared with the prior art, the invention has the following beneficial effects:

(1) the invention fuses user operation, mouth shape, speech, and memory information to identify new words automatically and update the new-word library dynamically, so that construction of the library is fully automated; compared with the prior approach, in which the user first judges for themselves and then confirms manually, this can improve learning efficiency;

(2) the invention divides new words into reading new words and speech new words; compared with the prior approach of only memorizing unknown or unfamiliar words, this can promote both reading ability and listening and speaking skills;

(3) the invention divides the new-word library into two multi-level new-word libraries, one for short-term memory and one for long-term memory, which provides a means for studying personalized memory patterns.

Drawings

FIG. 1 is a schematic flow diagram of the present invention;

FIG. 2 is a schematic diagram of a multi-source information acquisition device according to the present invention;

FIG. 3 is a schematic diagram of the automatic new-word identification process according to the present invention;

FIG. 4 is a schematic diagram of the interface for automatic identification of reading new words according to the present invention;

FIG. 5 is a schematic diagram of the interface for automatic identification of speech new words according to the present invention;

FIG. 6 is a schematic diagram of the structure of the new-word library according to the present invention;

FIG. 7 is a schematic diagram of the automatic new-word library updating process according to the present invention;

FIG. 8 is a flow chart of the automatic dynamic updating of the multi-level new-word libraries according to the present invention.

Detailed Description

The present invention will be described in further detail with reference to examples and drawings, but the present invention is not limited thereto.

The invention provides a method for automatically constructing a new-word library based on multi-source information fusion, comprising two processes: automatic new-word identification and automatic new-word library updating. Oriented to the two purposes of document reading and spoken communication, it divides new words into reading new words and speech new words. According to the law of memory forgetting, it divides the new-word library into two multi-level libraries, one for short-term memory and one for long-term memory. The method automatically identifies reading new words from user operation information, automatically identifies speech new words from information such as the user's mouth shape and speech, and automatically and dynamically updates the multi-level new-word libraries according to the law of memory forgetting.

Specifically, as shown in FIGS. 1 to 8, in a method for automatically constructing a new-word library based on multi-source information fusion, new words comprise reading new words and speech new words, and the new-word library is divided into two multi-level new-word libraries, one for short-term memory and one for long-term memory. The method comprises the following steps:

Step one, automatically identifying new words.

(1) Reading new words are automatically identified according to user operation information.

The user operation information comprises the operation response time and whether the operation is correct. If the user performs no operation, or performs an incorrect operation, within the specified time, the word is identified as a reading new word.

(2) Speech new words are automatically identified according to the user's mouth shape and speech information.

If the speech recognition device receives no user speech or receives incorrect user speech within the specified time, but the mouth-shape recognition device recognizes that the user's pronunciation mouth shape is correct, the user is prompted to read the word again. If the speech recognition device receives no user speech or receives incorrect user speech within the specified time, and the mouth-shape recognition device recognizes that the user's pronunciation mouth shape is also incorrect, the word is identified as a speech new word.

Step two, automatically updating the new-word library.

(1) The identified reading new words and speech new words are automatically stored in the reading new-word library and the speech new-word library, respectively.

(2) The multi-level new-word libraries are automatically and dynamically updated.

The multi-level new-word libraries are divided according to the general law of memory forgetting and can serve as a basis for research on personalized memory patterns, wherein:

a new word identified for the first time is automatically stored in the new-word library with the shortest review interval; a new word that is correctly recalled within its memory period is automatically moved to the next-level new-word library with a longer memory period; a new word that is not correctly recalled within its memory period is automatically moved to the previous-level new-word library with a shorter memory period; and a new word in the library with the longest memory period is deleted from the new-word library once it has been correctly recalled within that memory period.

As shown in FIG. 1, this embodiment provides a method 100 for automatically constructing a new-word library based on multi-source information fusion, which comprises two processes: automatic new-word identification 300 and automatic new-word library updating 700.

The multi-source information comprises user operation, mouth shape, speech, memory, and similar information. As shown in FIG. 2, the multi-source information acquisition apparatus 200 includes, but is not limited to, a camera 201, a touch screen 202, a speaker 203, a microphone 204, and the software and hardware systems used for computation and data storage. The camera 201 acquires mouth-shape information, the touch screen 202 acquires user operation information, and the microphone 204 acquires speech information. Memorized information is gradually forgotten over time, following the general law that forgetting is fast at first and slower later. Multi-level new-word libraries with different memory periods can record a user's personalized memory information, which provides an effective way to analyze personalized memory characteristics and derive personalized memory patterns.

Oriented to the two purposes of document reading and spoken communication, the invention divides new words into reading new words and speech new words. As shown in FIG. 3, automatic new-word identification 300 comprises: automatically identifying reading new words 301 according to user operation information; and automatically identifying speech new words 302 according to information such as the user's mouth shape and speech.

As shown in FIG. 4, the interface 400 for acquiring the information used to identify reading new words includes a word 401 and four options: option one 403, option two 404, option three 405, and option four 406. The relationship between word 401 and options 403, 404, 405, 406 resembles a single-choice question, with one and only one option being correct. When word 401 is in Chinese, the four options are in the foreign language; when word 401 is in the foreign language, the four options are in Chinese. Word 401 is randomly drawn from the word library and displayed on the screen, with its four options appearing at positions 403, 404, 405, 406, respectively. The interface 400 may further include a prompt 402 for displaying user operation information such as whether the selection is correct and the time taken; the prompt information may also be played through the speaker 203. When word 401 appears on the screen, if the user selects the correct option among 403, 404, 405, 406 within the specified time, prompt 402 shows that the selection is correct together with the time taken; if the user makes no selection or an incorrect selection within the specified time, prompt 402 shows a selection error or timeout, and word 401 is identified as a reading new word.
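The single-choice item described above could be assembled roughly as follows. This is only a minimal sketch under stated assumptions: the names `ChoiceItem` and `build_choice_item`, and the layout of `vocabulary` as a word-to-translation mapping, are hypothetical and do not come from the patent, which does not specify how the distractor options are chosen.

```python
import random
from dataclasses import dataclass


@dataclass
class ChoiceItem:
    word: str           # word 401 shown to the user
    options: list       # the four options 403-406, in display order
    answer_index: int   # position of the single correct option


def build_choice_item(word, vocabulary, rng):
    """Build a four-option single-choice item for `word`.

    `vocabulary` maps each word to its translation (hypothetical layout).
    Distractors are drawn at random from the other translations, which is
    an assumption; the source only requires exactly one correct option.
    """
    correct = vocabulary[word]
    # Sort the candidate pool so that a seeded rng gives repeatable items.
    pool = sorted({t for w, t in vocabulary.items() if w != word and t != correct})
    options = rng.sample(pool, 3) + [correct]
    rng.shuffle(options)
    return ChoiceItem(word, options, options.index(correct))


vocabulary = {"apple": "苹果", "book": "书", "river": "河", "cloud": "云", "bridge": "桥"}
item = build_choice_item("apple", vocabulary, random.Random(0))
```

Swapping the keys and values of `vocabulary` would give the opposite direction, where word 401 is Chinese and the options are in the foreign language.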

The invention does not limit the specified time for the user's selection; it may be a fixed duration within 10 seconds or a duration calculated from the user's personalized memory characteristics.
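The reading new-word rule can be sketched as a small predicate. The function name, its parameters, and the 10-second default are illustrative assumptions; the patent leaves the time limit open, so the default is only one permitted choice.

```python
from typing import Optional


def is_reading_new_word(selected_index: Optional[int],
                        answer_index: int,
                        response_seconds: float,
                        time_limit_seconds: float = 10.0) -> bool:
    """Return True when the word should be identified as a reading new word.

    `selected_index` is None when the user made no selection at all.
    The 10-second default is an assumed value, not fixed by the source.
    """
    timed_out = selected_index is None or response_seconds > time_limit_seconds
    wrong = selected_index is not None and selected_index != answer_index
    return timed_out or wrong
```

For example, `is_reading_new_word(1, 2, 3.0)` is True (wrong option), while `is_reading_new_word(2, 2, 3.0)` is False (correct option chosen in time).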

The interface 400 may also include a game character 407, which uses the entertainment of a game to offset the tedium of memorizing words. When the user makes 3 consecutive correct selections, the game character 407 starts to dance; otherwise, the game character 407 stops dancing.

As shown in FIG. 5, the interface 500 for acquiring the information used to identify speech new words includes a word 401. Word 401 is randomly drawn from the word library and displayed on the screen. The interface 500 may further include a prompt 402 for displaying information such as whether the pronunciation is correct, whether the word needs to be read again, and the time taken; the prompt information may also be played through the speaker 203. If the microphone 204 receives the user's correct reading of the word within the specified time, prompt 402 shows that the pronunciation is correct together with the time taken. If the microphone 204 receives no user speech or receives incorrect user speech within the specified time, but the camera 201 recognizes that the user's pronunciation mouth shape is correct, prompt 402 asks the user to read the word again and timing restarts. If the microphone 204 receives no user speech or receives incorrect user speech within the specified time, and the camera 201 recognizes that the user's pronunciation mouth shape is also incorrect, prompt 402 shows a pronunciation error or timeout, and word 401 is identified as a speech new word.

The invention does not limit the specified time within which the microphone 204 must receive the user's correct reading of the word; it may be a fixed duration within 10 seconds or a duration calculated from the user's personalized memory characteristics.
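The three outcomes of a speech attempt can be sketched as follows. The enum and function names are hypothetical, and the two boolean inputs stand in for the results of the speech and mouth-shape recognition devices, whose methods the patent does not restrict.

```python
from enum import Enum


class SpeechOutcome(Enum):
    CORRECT = "correct"                   # correct speech received in time
    READ_AGAIN = "read_again"             # prompt the user and restart timing
    SPEECH_NEW_WORD = "speech_new_word"   # word goes to the speech new-word library


def classify_speech_attempt(speech_correct: bool,
                            mouth_shape_correct: bool) -> SpeechOutcome:
    """Fuse the microphone and camera results for one attempt.

    `speech_correct` is False both when no speech was received and when
    incorrect speech was received within the specified time.
    """
    if speech_correct:
        return SpeechOutcome.CORRECT
    if mouth_shape_correct:
        return SpeechOutcome.READ_AGAIN
    return SpeechOutcome.SPEECH_NEW_WORD
```

The mouth-shape result matters only when the speech result is negative: a correct mouth shape suggests the audio capture failed rather than the user, so the word is not yet counted as a new word.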

The interface 500 may also include a game character 407, which uses the entertainment of a game to offset the tedium of memorizing words. When the user makes 3 consecutive correct pronunciations, the game character 407 starts to dance; otherwise, the game character 407 stops dancing.
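The dancing rule for game character 407, which applies to both interface 400 and interface 500, can be sketched as a streak counter. The class name `DanceStreak` is hypothetical, and reading "otherwise" as "any incorrect answer resets the streak and stops the dance" is an assumption.

```python
class DanceStreak:
    """Track consecutive correct answers for game character 407."""

    def __init__(self, threshold: int = 3):
        self.threshold = threshold  # 3 consecutive correct answers in the source
        self.streak = 0

    def record(self, correct: bool) -> bool:
        """Record one answer and return True while the character should dance."""
        self.streak = self.streak + 1 if correct else 0
        return self.streak >= self.threshold


tracker = DanceStreak()
dancing = [tracker.record(answer) for answer in (True, True, True, True, False, True)]
```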

The invention does not limit the speech and mouth-shape recognition methods; for example, artificial intelligence methods may be used to compare the audio and the mouth shape in order to judge whether the pronunciation of the word and the pronunciation mouth shape are correct.

As shown in FIG. 6, the new-word library 600 includes a reading new-word library 601 and a speech new-word library 602. Memory is divided into short-term memory and long-term memory according to how long it is retained. Input information becomes short-term memory once a person attends to and learns it; if it is not reviewed in time it is forgotten, whereas with timely review the short-term memory becomes long-term memory and is retained in the brain for a long time. The invention therefore subdivides each of libraries 601 and 602 into multi-level short-term memory new-word libraries 603 and multi-level long-term memory new-word libraries 604.

The invention does not limit the number of levels of the multi-level new-word libraries. According to the general law of memory forgetting, the short-term memory multi-level new-word libraries may include a 5-minute library, a 30-minute library, and a 12-hour library; the long-term memory multi-level new-word libraries may include a 1-day library, a 2-day library, a 4-day library, a 7-day library, and a 15-day library.
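One possible in-memory layout for the new-word library 600 with the example levels (5 minutes, 30 minutes, 12 hours, then 1, 2, 4, 7, and 15 days) is sketched below. The constant and function names are hypothetical, and storing each level as a set of words is an assumption; the patent does not prescribe a data structure.

```python
from datetime import timedelta

# Example memory periods from the description; other level counts are allowed.
SHORT_TERM_LEVELS = [timedelta(minutes=5), timedelta(minutes=30), timedelta(hours=12)]
LONG_TERM_LEVELS = [timedelta(days=d) for d in (1, 2, 4, 7, 15)]
ALL_LEVELS = SHORT_TERM_LEVELS + LONG_TERM_LEVELS


def empty_library():
    """Return one library (reading or speech) as a list of per-level word sets."""
    return [set() for _ in ALL_LEVELS]


# Library 600 holds the reading library 601 and the speech library 602,
# each subdivided into the same short-term and long-term levels.
new_word_library = {"reading": empty_library(), "speech": empty_library()}
new_word_library["reading"][0].add("apple")  # a newly identified word enters level 0
```

Keeping the levels in one ordered list means that moving a word to a longer or shorter memory period is just a change of index.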

As shown in FIG. 7, automatic new-word library updating 700 comprises: automatically storing the identified reading new words and speech new words in the reading new-word library and the speech new-word library, respectively (701); and automatically and dynamically updating the multi-level new-word libraries (800).

As shown in FIG. 8, the automatic dynamic updating 800 of the multi-level new-word libraries comprises: a new word identified for the first time is automatically stored in the new-word library with the shortest review interval (801); a new word that is correctly recalled within its memory period is automatically moved to the next-level new-word library with a longer memory period (802); a new word that is not correctly recalled within its memory period is automatically moved to the previous-level new-word library with a shorter memory period (803); and a new word in the library with the longest memory period is deleted from the new-word library once it has been correctly recalled within that memory period (804). For example, a new word identified for the first time is automatically stored in the 5-minute new-word library at time t. If the user recalls it correctly within t + 5 minutes, the word is moved from the 5-minute library to the 30-minute library. If the user then recalls it correctly within t + 30 minutes, the word is moved from the 30-minute library to the 12-hour library; otherwise, it is moved back from the 30-minute library to the 5-minute library. Generally, once a new word has passed through the 3 short-term memory levels and the 5 long-term memory levels, having been correctly recalled 8 times in total within 15 days, it is no longer a new word for the user and is finally deleted from the 15-day new-word library.
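The level transitions 802, 803, and 804 can be sketched as a single function over level indices. The names are hypothetical, the level count of 8 reflects only the example configuration, and keeping a word at the lowest level after a failed review there is an assumption made for this sketch.

```python
from typing import Optional

NUM_LEVELS = 8  # 3 short-term + 5 long-term levels in the example configuration


def next_level(level: int, recalled_correctly: bool) -> Optional[int]:
    """Return the level index after one review, or None when the word is deleted.

    Level 0 is the library with the shortest memory period (5 minutes in the
    example) and level NUM_LEVELS - 1 the longest (15 days in the example).
    """
    if recalled_correctly:
        if level == NUM_LEVELS - 1:
            return None          # step 804: no longer a new word
        return level + 1         # step 802: longer memory period
    return max(level - 1, 0)     # step 803: shorter memory period (floor at 0)


# A word recalled correctly at every review passes through all levels.
level, reviews = 0, 0
while level is not None:
    level = next_level(level, True)
    reviews += 1
```

With the example configuration, the loop ends after 8 correct reviews, matching the 8 recalls mentioned in the description.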

The above further explains the automatic new-word identification and automatic new-word library updating processes of the method; it does not cover a method for obtaining a user's personalized memory pattern. However, dividing the new-word library into multi-level short-term and long-term memory libraries can support research on users' personalized memory patterns. For example, a new word identified for the first time is automatically stored in the 5-minute new-word library at time t. If user A does not review the word within t + 5 minutes, it stays in the 5-minute library; if user A reviews it within [t + 5, t + 30] minutes but recalls it incorrectly, it still stays in the 5-minute library. If user B does not review the word within t + 5 minutes, it stays in the 5-minute library; if user B recalls it correctly within [t + 5, t + 30] minutes, it is moved directly from the 5-minute library to the 12-hour library. If this situation occurs with sufficiently high probability, it can be concluded that user B has better short-term memory than user A. Then, when user A recalls a new word correctly within t + 5 minutes, the word is moved from the 5-minute library to the 30-minute library, whereas when user B recalls a new word correctly within t + 5 minutes, the word can be moved directly from the 5-minute library to the 12-hour library. In this way, differences in personalized memory patterns are ultimately reflected in the dynamic updating of the multi-level new-word libraries, which can further improve learning efficiency.
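The comparison between user A and user B could be supported by a simple statistic over the review log. This is only an illustrative sketch: the patent explicitly does not define how a personalized memory pattern is obtained, so the function name, the log format of `(minutes_since_stored, recalled_correctly)` pairs, the 30-minute window, and the sample data are all assumptions.

```python
def short_term_recall_rate(attempts, window_minutes=30.0):
    """Fraction of attempts recalled correctly within the short-term window.

    `attempts` is a list of (minutes_since_stored, recalled_correctly) pairs,
    a hypothetical log format used only for this sketch.
    """
    if not attempts:
        return 0.0
    hits = sum(1 for minutes, ok in attempts if ok and minutes <= window_minutes)
    return hits / len(attempts)


# Made-up logs for two users, purely for illustration.
user_a = [(12, False), (20, False), (8, True), (25, False)]
user_b = [(10, True), (15, True), (28, True), (40, True)]
rate_a = short_term_recall_rate(user_a)
rate_b = short_term_recall_rate(user_b)
```

A consistently higher rate for one user, measured over enough words, is the kind of evidence that could justify letting that user's words skip a level.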

The invention fuses user operation, mouth shape, speech, and memory information to identify new words automatically and update the new-word library dynamically, so that construction of the library is fully automated; compared with the prior approach, in which the user first judges for themselves and then confirms manually, this can improve learning efficiency. Dividing new words into reading new words and speech new words, rather than only memorizing unknown or unfamiliar words as before, can promote both reading ability and listening and speaking skills. Dividing the new-word library into two multi-level libraries, one for short-term memory and one for long-term memory, provides a means for studying personalized memory patterns.

The present invention is not limited to the above embodiments, and any other changes, modifications, substitutions, combinations, and simplifications which do not depart from the spirit and principle of the present invention should be construed as equivalents and are included in the scope of the present invention.
