Method, apparatus, and device for identifying wrongly written words, and computer-readable storage medium

Document No.: 1889900 · Publication date: 2021-11-26

Reading note: This technique, 错别字词的识别方法、装置、设备及计算机可读存储介质 (Method for identifying wrongly written words, apparatus, device, and computer-readable storage medium), was designed and created by 杨韬 on 2021-04-06. Its main content is as follows: The application provides a method, apparatus, and device for identifying wrongly written words, a computer-readable storage medium, and a model training method. The method for identifying wrongly written words includes: acquiring a sentence to be recognized, and coding features of words to be recognized in the sentence; performing feature extraction on the words to be recognized in the sentence by using a trained language model, to obtain context features of the words to be recognized in the sentence; and recognizing the coding features and the context features of the words to be recognized in the sentence by using a trained deep neural network model, to obtain a recognition result representing whether the words to be recognized are wrong. In this application, when wrongly written words are recognized, the strong generalization of the language model and the high recognition accuracy of the deep neural network model are both exploited, so that the wrongly written words in a sentence can be identified better, and a large number of model-fusion rules do not need to be set and maintained manually, which reduces labor cost.

1. A method for identifying a wrongly written word, comprising:

acquiring a sentence to be recognized, and coding features of words to be recognized in the sentence;

performing feature extraction on the words to be recognized in the sentence by using a trained language model, to obtain context features of the words to be recognized in the sentence;

and recognizing the coding features and the context features of the words to be recognized in the sentence by using a trained deep neural network model, to obtain a recognition result representing whether the words to be recognized are wrong.

2. The method of claim 1, wherein the language model comprises a bidirectional language model, the context feature comprises an implicit feature, and the performing feature extraction on the word to be recognized in the sentence by using the trained language model to obtain the context feature of the word to be recognized in the sentence comprises:

determining, by using the trained bidirectional language model, a first vector characterizing the N words preceding the word to be recognized in the case of forward prediction, and a second vector characterizing the M words following the word to be recognized in the case of reverse prediction, where N and M are integers greater than or equal to 0 and less than the length of the sentence;

and obtaining the implicit features of the word to be recognized in the sentence based on the first vector and the second vector.

3. The method according to claim 1 or 2, wherein the context features include explicit features, and the performing feature extraction on the word to be recognized in the sentence by using the trained language model to obtain the context features of the word to be recognized in the sentence comprises:

predicting the words to be recognized in the sentences by using the trained language model to obtain probability parameters representing whether the words to be recognized are wrong or not;

and determining the explicit features of the word to be recognized in the sentence based on the probability parameters.

4. The method of claim 3, wherein the probability parameters comprise a current word probability and a maximum candidate word probability, and wherein determining the explicit features of the word to be recognized in the sentence based on the probability parameters comprises:

discretizing the difference between the current word probability and the maximum candidate word probability to obtain a discretized difference;

determining, from a preset first value-interval list, a first target value interval to which the discretized difference belongs;

and determining the explicit features of the word to be recognized in the sentence based on the first target value interval.

5. The method according to claim 3, wherein the probability parameters include at least one perplexity drop ratio of the word to be recognized, and the predicting the word to be recognized in the sentence by using the trained language model to obtain the probability parameters representing whether the word to be recognized is wrong comprises:

determining, by using the trained language model, a current perplexity of the sentence when the position of the word to be recognized holds the current word, and a post-replacement perplexity of the sentence when the word at that position is replaced with each candidate in a wrongly written word candidate set of the word to be recognized;

and determining the at least one perplexity drop ratio of the word to be recognized based on the current perplexity and the at least one post-replacement perplexity of the word to be recognized.

6. The method of claim 5, wherein the determining the explicit features of the word to be recognized in the sentence based on the probability parameters comprises:

discretizing the at least one perplexity drop ratio of the word to be recognized to obtain a discretized perplexity drop ratio;

determining, from a preset second value-interval list, a second target value interval to which the discretized perplexity drop ratio belongs;

and determining the explicit features of the word to be recognized in the sentence based on the second target value interval.

7. A method of model training, the method comprising:

acquiring each sample sentence in a labeled sample set, and coding features of words to be recognized in each sample sentence, wherein each sample sentence in the labeled sample set carries a label sequence for labeling whether the words to be recognized in the sample sentence are wrongly written words;

for each sample sentence, performing feature extraction on the words to be recognized in the sample sentence by using the trained language model to obtain context features of the words to be recognized in the sample sentence, and recognizing the coding features and the context features of the words to be recognized in the sample sentence by using the deep neural network model to obtain a recognition result representing whether the words to be recognized in the sample sentence are wrong or not;

determining a loss value by using a loss function, based on the recognition result corresponding to each sample sentence and the label sequence of each sample sentence;

and updating the parameters of the deep neural network model based on a parameter optimization algorithm when it is determined, according to the loss value, that the loss function has not converged.

8. An apparatus for identifying a wrongly written word, comprising:

the first acquisition module is used for acquiring a sentence to be recognized and coding features of words to be recognized in the sentence;

the first extraction module is used for extracting the features of the words to be recognized in the sentences by utilizing the trained language model to obtain the context features of the words to be recognized in the sentences;

and the first recognition module is used for recognizing the coding features and the context features of the words to be recognized in the sentences by using the trained deep neural network model to obtain a recognition result representing whether the words to be recognized are wrong or not.

9. A device for recognizing wrongly written words, comprising:

a memory for storing executable instructions;

a processor for implementing the method of any one of claims 1 to 7 when executing executable instructions stored in the memory.

10. A computer-readable storage medium having stored thereon executable instructions for, when executed by a processor, implementing the method of any one of claims 1 to 7.

Technical Field

The present application relates to, but is not limited to, the field of artificial intelligence, and in particular to a method, an apparatus, a device, a computer-readable storage medium, and a model training method for identifying wrongly written words.

Background

In today's self-media age, creators on various content distribution platforms produce millions of articles each day for recommendation to hundreds of millions of users. These articles often contain wrongly written words, which greatly degrade the reading experience. Compared with English, Chinese expressions are richer and more diverse, and Chinese grammar and syntax are more flexible and changeable, so the causes of errors are more varied. There are shallow spelling errors such as homophone, near-homophone, and similar-shape errors; there are also deeper grammatical mistakes, such as preposition misuse and improper collocation; and there are logical errors that are harder to capture, such as mixing up "will" and "be" type function words. All of this greatly increases the difficulty of Chinese error correction and makes the problem more challenging. Therefore, quickly and accurately identifying wrongly written words in an article is an important but challenging task.

Methods for recognizing wrongly written characters in the related art fall mainly into two groups: recognition based on a language model and recognition based on a deep neural network model. The language-model-based method generalizes better but has lower recognition accuracy; the deep-neural-network-based method, by contrast, has higher recognition accuracy but relatively poorer generalization. In the related art, the two models are usually applied to a text separately, and their recognition results are then fused according to a series of rules to obtain the final result. However, this fusion is a manual, shallow fusion: a large number of thresholds must be set and a large number of rules must be tuned, which makes the system difficult to maintain and further optimize.

Disclosure of Invention

Embodiments of the present application provide a method, an apparatus, a device, a computer-readable storage medium, and a model training method for identifying wrongly written words. When identifying wrongly written words, they exploit the strong generalization of a language model together with the high recognition accuracy of a deep neural network model to better identify the wrongly written words in a sentence. Because the features of the language model are deeply fused on top of the deep neural network model, wrongly written words can be identified end to end without manually setting and maintaining a large number of model-fusion rules, which greatly reduces labor cost.

The technical scheme of the embodiment of the application is realized as follows:

The embodiment of the application provides a method for identifying wrongly written words, which comprises the following steps:

acquiring a sentence to be recognized, and coding features of words to be recognized in the sentence;

performing feature extraction on the words to be recognized in the sentence by using a trained language model, to obtain context features of the words to be recognized in the sentence;

and recognizing the coding features and the context features of the words to be recognized in the sentence by using a trained deep neural network model, to obtain a recognition result representing whether the words to be recognized are wrong.

In some embodiments, the language model includes a bidirectional language model, the context features include implicit features, and performing feature extraction on the word to be recognized in the sentence by using the trained language model to obtain the context features of the word to be recognized in the sentence includes: determining, by using the trained bidirectional language model, a first vector characterizing the N words preceding the word to be recognized in the case of forward prediction, and a second vector characterizing the M words following the word to be recognized in the case of reverse prediction, where N and M are integers greater than or equal to 0 and less than the length of the sentence; and obtaining the implicit features of the word to be recognized in the sentence based on the first vector and the second vector.

In some embodiments, the context features include explicit features, and the performing feature extraction on the word to be recognized in the sentence by using the trained language model to obtain the context features of the word to be recognized in the sentence includes: predicting the word to be recognized in the sentence by using the trained language model, to obtain probability parameters representing whether the word to be recognized is wrong; and determining the explicit features of the word to be recognized in the sentence based on the probability parameters.

In some embodiments, the probability parameters include a current word probability and a maximum candidate word probability, and the determining the explicit features of the word to be recognized in the sentence based on the probability parameters includes: discretizing the difference between the current word probability and the maximum candidate word probability to obtain a discretized difference; determining, from a preset first value-interval list, a first target value interval to which the discretized difference belongs; and determining the explicit features of the word to be recognized in the sentence based on the first target value interval.
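As an illustration of the discretization just described, the following minimal Python sketch buckets the probability difference into a preset value-interval list; the interval boundaries and the one-hot encoding are assumptions for illustration, not values taken from this application.

    import bisect

    # Preset first value-interval list: boundaries splitting the real line into
    # buckets (illustrative values, not specified by this application).
    FIRST_INTERVAL_BOUNDS = [-0.5, -0.1, 0.0, 0.1, 0.5]

    def discretize(value, bounds=FIRST_INTERVAL_BOUNDS):
        """Return the index of the target value interval that `value` falls into."""
        return bisect.bisect_right(bounds, value)

    def explicit_feature(current_word_prob, max_candidate_prob):
        """Discretize the probability difference and encode the bucket one-hot."""
        diff = current_word_prob - max_candidate_prob
        bucket = discretize(diff)
        feature = [0.0] * (len(FIRST_INTERVAL_BOUNDS) + 1)
        feature[bucket] = 1.0
        return feature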

In some embodiments, the predicting the word to be recognized in the sentence by using the trained language model to obtain the probability parameters representing whether the word to be recognized is wrong includes: determining, by using the trained language model, a current perplexity of the sentence when the position of the word to be recognized holds the current word, and a post-replacement perplexity of the sentence when the word at that position is replaced with each candidate in a wrongly written word candidate set of the word to be recognized; and determining at least one perplexity drop ratio of the word to be recognized based on the current perplexity and the at least one post-replacement perplexity of the word to be recognized.
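The perplexity drop ratio above admits a direct computation. The following hedged sketch assumes a scoring function sentence_perplexity supplied by whatever trained language model is used; that function, like the other names here, is an assumption for illustration.

    def perplexity_drop_ratios(tokens, position, candidates, sentence_perplexity):
        """For each candidate at `position`, compute (PPL_current - PPL_replaced) / PPL_current."""
        current_ppl = sentence_perplexity(tokens)
        ratios = []
        for cand in candidates:  # wrongly written word candidate set for this position
            replaced = tokens[:position] + [cand] + tokens[position + 1:]
            ratios.append((current_ppl - sentence_perplexity(replaced)) / current_ppl)
        return ratios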

In some embodiments, the determining, based on the probability parameters, the explicit features of the word to be recognized in the sentence includes: discretizing the at least one perplexity drop ratio of the word to be recognized to obtain a discretized perplexity drop ratio; determining, from a preset second value-interval list, a second target value interval to which the discretized perplexity drop ratio belongs; and determining the explicit features of the word to be recognized in the sentence based on the second target value interval.

In some embodiments, the recognizing, by using the trained deep neural network model, the coding features and the context features of the word to be recognized in the sentence to obtain a recognition result representing whether the word to be recognized is wrong includes: combining the coding features and the context features of the word to be recognized to obtain a fusion vector of the word to be recognized; and recognizing the fusion vector of the word to be recognized in the sentence by using the trained deep neural network model, to obtain a recognition result representing whether the word to be recognized is wrong.
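A minimal PyTorch sketch of this fusion-and-recognition step is given below; the layer sizes and the two-class output head are illustrative assumptions rather than an architecture specified by this application.

    import torch
    import torch.nn as nn

    class TypoClassifier(nn.Module):
        def __init__(self, enc_dim, ctx_dim, hidden=256):
            super().__init__()
            self.mlp = nn.Sequential(
                nn.Linear(enc_dim + ctx_dim, hidden),
                nn.ReLU(),
                nn.Linear(hidden, 2),  # two classes: wrong / correct
            )

        def forward(self, enc_feat, ctx_feat):
            # Concatenation yields the fusion vector of each word to be recognized.
            fused = torch.cat([enc_feat, ctx_feat], dim=-1)
            return self.mlp(fused)  # logits; a softmax gives P(wrong) vs. P(correct)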

In some embodiments, before the recognizing, by using the trained deep neural network model, the coding features and the context features of the word to be recognized in the sentence to obtain a recognition result representing whether the word to be recognized is wrong, the method further includes: acquiring each sample sentence in a labeled sample set, and coding features of words to be recognized in each sample sentence, where each sample sentence in the labeled sample set carries a label sequence for labeling whether the words to be recognized in the sample sentence are wrongly written words; for each sample sentence, performing feature extraction on the words to be recognized in the sample sentence by using the trained language model to obtain context features of the words to be recognized in the sample sentence, and recognizing the coding features and the context features of the words to be recognized in the sample sentence by using the deep neural network model to obtain a recognition result representing whether the words to be recognized in the sample sentence are wrong; determining a loss value by using a loss function, based on the recognition result corresponding to each sample sentence and the label sequence of each sample sentence; and updating the parameters of the deep neural network model based on a parameter optimization algorithm when it is determined, according to the loss value, that the loss function has not converged.

The embodiment of the application provides a model training method, which comprises the following steps:

acquiring each sample sentence in a labeled sample set, and coding features of words to be recognized in each sample sentence, wherein each sample sentence in the labeled sample set carries a label sequence for labeling whether the words to be recognized in the sample sentence are wrongly written words;

for each sample sentence, performing feature extraction on the words to be recognized in the sample sentence by using the trained language model to obtain context features of the words to be recognized in the sample sentence, and recognizing the coding features and the context features of the words to be recognized in the sample sentence by using the deep neural network model to obtain a recognition result representing whether the words to be recognized in the sample sentence are wrong or not;

determining a loss value by using a loss function, based on the recognition result corresponding to each sample sentence and the label sequence of each sample sentence;

and updating the parameters of the deep neural network model based on a parameter optimization algorithm when it is determined, according to the loss value, that the loss function has not converged.
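A hedged sketch of this training procedure follows; the cross-entropy loss, the Adam optimizer, and the simple loss-change convergence test are assumptions chosen for illustration, and the model is expected to return per-word logits as in the earlier fusion sketch.

    import torch
    import torch.nn as nn

    def train(model, batches, epochs=10, tol=1e-4, lr=1e-3):
        loss_fn = nn.CrossEntropyLoss()
        optimizer = torch.optim.Adam(model.parameters(), lr=lr)
        prev_loss = float("inf")
        for _ in range(epochs):
            total = 0.0
            for enc_feat, ctx_feat, labels in batches:  # labels: per-word tag sequence
                logits = model(enc_feat, ctx_feat)
                loss = loss_fn(logits.view(-1, 2), labels.view(-1))
                optimizer.zero_grad()
                loss.backward()
                optimizer.step()
                total += loss.item()
            if abs(prev_loss - total) < tol:  # treat a negligible change as convergence
                break
            prev_loss = total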

The embodiment of the present application provides an apparatus for identifying wrongly written words, including:

the first acquisition module is used for acquiring a sentence to be recognized and coding features of words to be recognized in the sentence;

the first extraction module is used for extracting the features of the words to be recognized in the sentences by utilizing the trained language model to obtain the context features of the words to be recognized in the sentences;

and the first recognition module is used for recognizing the coding features and the context features of the words to be recognized in the sentences by using the trained deep neural network model to obtain a recognition result representing whether the words to be recognized are wrong or not.

In some embodiments, the language model comprises a bidirectional language model, the context features comprise implicit features, and the first extraction module is further configured to: determine, by using the trained bidirectional language model, a first vector characterizing the N words preceding the word to be recognized in the case of forward prediction, and a second vector characterizing the M words following the word to be recognized in the case of reverse prediction, where N and M are integers greater than or equal to 0 and less than the length of the sentence; and obtain the implicit features of the word to be recognized in the sentence based on the first vector and the second vector.

In some embodiments, the context features include explicit features, and the first extraction module is further configured to: predict the word to be recognized in the sentence by using the trained language model, to obtain probability parameters representing whether the word to be recognized is wrong; and determine the explicit features of the word to be recognized in the sentence based on the probability parameters.

In some embodiments, the probability parameters include a current word probability and a maximum candidate word probability, and the first extraction module is further configured to: discretize the difference between the current word probability and the maximum candidate word probability to obtain a discretized difference; determine, from a preset first value-interval list, a first target value interval to which the discretized difference belongs; and determine the explicit features of the word to be recognized in the sentence based on the first target value interval.

In some embodiments, the probability parameters comprise at least one perplexity drop ratio of the word to be recognized, and the first extraction module is further configured to: determine, by using the trained language model, a current perplexity of the sentence when the position of the word to be recognized holds the current word, and a post-replacement perplexity of the sentence when the word at that position is replaced with each candidate in a wrongly written word candidate set of the word to be recognized; and determine at least one perplexity drop ratio of the word to be recognized based on the current perplexity and the at least one post-replacement perplexity of the word to be recognized.

In some embodiments, the first extraction module is further configured to: discretize the at least one perplexity drop ratio of the word to be recognized to obtain a discretized perplexity drop ratio; determine, from a preset second value-interval list, a second target value interval to which the discretized perplexity drop ratio belongs; and determine the explicit features of the word to be recognized in the sentence based on the second target value interval.

In some embodiments, the first recognition module is further configured to: combine the coding features and the context features of the word to be recognized to obtain a fusion vector of the word to be recognized; and recognize the fusion vector of the word to be recognized in the sentence by using the trained deep neural network model, to obtain a recognition result representing whether the word to be recognized is wrong.

In some embodiments, the apparatus further comprises: a second acquisition module, used for acquiring each sample sentence in a labeled sample set and coding features of words to be recognized in each sample sentence, where each sample sentence in the labeled sample set carries a label sequence for labeling whether the words to be recognized in the sample sentence are wrongly written words; a second recognition module, used for, for each sample sentence, performing feature extraction on the words to be recognized in the sample sentence by using the trained language model to obtain context features of the words to be recognized in the sample sentence, and recognizing the coding features and the context features of the words to be recognized in the sample sentence by using the deep neural network model to obtain a recognition result representing whether the words to be recognized in the sample sentence are wrong; a first determining module, used for determining a loss value by using a loss function, based on the recognition result corresponding to each sample sentence and the label sequence of each sample sentence; and a first updating module, used for updating the parameters of the deep neural network model based on a parameter optimization algorithm when it is determined, according to the loss value, that the loss function has not converged.

The embodiment of the application provides a model training apparatus, including:

the third acquisition module is used for acquiring each sample sentence in the labeled sample set and coding features of words to be recognized in each sample sentence, where each sample sentence in the labeled sample set carries a label sequence for labeling whether the words to be recognized in the sample sentence are wrongly written words;

the third recognition module is used for, for each sample sentence, performing feature extraction on the words to be recognized in the sample sentence by using the trained language model, to obtain context features of the words to be recognized in the sample sentence, and recognizing the coding features and the context features of the words to be recognized in the sample sentence by using the deep neural network model, to obtain a recognition result representing whether the words to be recognized in the sample sentence are wrong;

the second determining module is used for determining a loss value by using a loss function, based on the recognition result corresponding to each sample sentence and the label sequence of each sample sentence;

and the second updating module is used for updating the parameters of the deep neural network model based on a parameter optimization algorithm when it is determined, according to the loss value, that the loss function has not converged.

The embodiment of the present application provides a device for identifying wrongly written words, including: a memory for storing executable instructions; and a processor, configured to implement the method for identifying wrongly written words provided by the embodiments of the present application when executing the executable instructions stored in the memory.

The embodiment of the application provides a model training device, including: a memory for storing executable instructions; and a processor, configured to implement the model training method provided by the embodiments of the present application when executing the executable instructions stored in the memory.

Embodiments of the present application provide a computer-readable storage medium, which stores executable instructions for causing a processor to implement the method provided by the embodiments of the present application when the processor executes the executable instructions.

The embodiment of the application has the following beneficial effects:

firstly, a sentence to be recognized and coding features of words to be recognized in the sentence are obtained; secondly, feature extraction is performed on the words to be recognized in the sentence by using the trained language model, to obtain context features of the words to be recognized in the sentence; and finally, the coding features and the context features of the words to be recognized in the sentence are recognized by using the trained deep neural network model, to obtain a recognition result representing whether the words to be recognized are wrong. In this way, when wrongly written words are recognized, both the strong generalization of the language model and the high recognition accuracy of the deep neural network model can be exploited, so the wrongly written words in the sentence can be identified better. Moreover, because the features of the language model are deeply fused on top of the deep neural network model, wrongly written words are identified end to end, and a large number of model-fusion rules do not need to be set and maintained manually, which greatly reduces labor cost.

Drawings

Fig. 1 is a schematic diagram of an alternative architecture of a system for recognizing wrongly written words according to an embodiment of the present application;

Fig. 2A is a schematic diagram of an alternative structure of a device for identifying wrongly written words according to an embodiment of the present application;

Fig. 2B is a schematic diagram of an alternative structure of a model training apparatus according to an embodiment of the present application;

Fig. 3 is a schematic flowchart of an alternative method for identifying wrongly written words according to an embodiment of the present application;

Fig. 4 is a schematic flowchart of an alternative method for identifying wrongly written words according to an embodiment of the present application;

Fig. 5 is a schematic flowchart of an alternative method for identifying wrongly written words according to an embodiment of the present application;

Fig. 6 is a schematic flowchart of an alternative method for identifying wrongly written words according to an embodiment of the present application;

Fig. 7 is a schematic flowchart of an alternative method for identifying wrongly written words according to an embodiment of the present application;

Fig. 8 is a schematic flowchart of an alternative model training method according to an embodiment of the present application;

Fig. 9A is an overall architecture diagram of a deep-neural-network-based wrongly written character recognition model fusing a bidirectional neural language model according to an embodiment of the present application;

Fig. 9B is a schematic diagram of the composition architecture of a bidirectional neural language model according to an embodiment of the present application.

Detailed Description

In order to make the objectives, technical solutions, and advantages of the present application clearer, the present application is described in further detail below with reference to the accompanying drawings. The described embodiments should not be considered as limiting the present application, and all other embodiments obtained by a person of ordinary skill in the art without creative effort shall fall within the protection scope of the present application.

In the following description, reference is made to "some embodiments" which describe a subset of all possible embodiments, but it is understood that "some embodiments" may be the same subset or different subsets of all possible embodiments, and may be combined with each other without conflict.

Where the terms "first", "second", and "third" appear below, they are used merely to distinguish similar items and do not imply a particular ordering of the items. It should be understood that "first", "second", and "third" may be interchanged in a particular sequence or order where permitted, so that the embodiments of the application described herein can be performed in an order other than that illustrated or described herein.

Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs. The terminology used herein is for the purpose of describing embodiments of the present application only and is not intended to be limiting of the application.

In order to better understand the method for identifying wrongly written words provided in the embodiments of the present application, wrongly written word recognition schemes in the related art are first described below.

In the related art, there are two schemes for identifying wrongly written words:

1) A wrongly written character recognition method based on a language model. This method relies on a language model trained on a large-scale corpus; two kinds of language model are commonly used: statistical language models based on N-grams, and neural network language models based on deep learning. After the language model is trained, whether a sentence is a reasonable natural-language expression can be judged by computing a perplexity (PPL) value that characterizes the fluency of the sentence. Specifically, given an input sentence, a number of candidate words (including homophone candidates, near-homophone candidates, similar-shape candidates, and the like) are substituted word by word, and the ratio by which the PPL score drops after substitution is calculated; if this drop ratio exceeds a certain threshold, the word is judged to be a wrongly written word.
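To make the related-art pipeline above concrete, here is a minimal Python sketch; the threshold value and the candidates_for generator (producing homophone, near-homophone, and similar-shape candidates) are assumptions for illustration, and sentence_perplexity stands in for the trained language model's scorer.

    def find_typos(tokens, candidates_for, sentence_perplexity, threshold=0.3):
        """Flag positions where some candidate lowers sentence PPL by more than `threshold`."""
        base_ppl = sentence_perplexity(tokens)
        flagged = []
        for i, tok in enumerate(tokens):
            for cand in candidates_for(tok):
                trial = tokens[:i] + [cand] + tokens[i + 1:]
                drop = (base_ppl - sentence_perplexity(trial)) / base_ppl
                if drop > threshold:
                    flagged.append((i, tok, cand))  # likely wrongly written word
                    break
        return flagged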

2) A wrongly written character recognition method based on a deep neural network model. This method mainly adopts a sequence labeling model to predict whether each word in an input sentence is a wrongly written word. Commonly used deep-neural-network-based sequence labeling models include the Bi-directional Long Short-Term Memory (Bi-LSTM) model, the Long Short-Term Memory (LSTM) model combined with a Conditional Random Field (CRF), and the Bidirectional Encoder Representations from Transformers (BERT) model. Since BERT-based deep neural network models have comprehensively refreshed the best results on many natural language processing tasks, BERT-based sequence labeling models are now more often used for the wrongly written character recognition task.

Among the related-art methods, the language-model-based method is fast and simple to implement; because it is trained on a large amount of unsupervised text it generalizes well, but its recognition accuracy for wrongly written characters is low. The deep-neural-network-based method trains the model with targeted labeled training data, so its recognition accuracy is higher than that of the language model; however, because it is trained on a small amount of labeled data, it generalizes worse than a language model trained on a large amount of text.

Embodiments of the present application provide a method, an apparatus, a device, and a computer-readable storage medium for identifying wrongly written words. When identifying wrongly written words, they exploit the strong generalization of a neural language model together with the high accuracy of a deep neural network recognition model to better identify the wrongly written words in a sentence; because the features of the neural language model are deeply fused on top of the deep neural network model, wrongly written words are recognized end to end without manually setting and maintaining a large number of model-fusion rules, which greatly reduces labor cost. An exemplary application of the device for recognizing wrongly written words provided in the embodiments of the present application is described below. Both the recognition device and the model training device provided in the embodiments of the present application may be implemented as various types of user terminals, such as a notebook computer, a tablet computer, a desktop computer, a car navigation device, a set-top box, or a mobile device (e.g., a mobile phone, a portable music player, a personal digital assistant, a dedicated messaging device, or a portable game device), and may also be implemented as a server. In the following, an exemplary application is described in which the recognition device is implemented as a server.

Referring to fig. 1, fig. 1 is an alternative architecture diagram of a system 100 for recognizing wrongly written or mispronounced words provided in this embodiment of the present application, which may implement recognition of wrongly written or mispronounced words of a sentence to be recognized, where terminals (a terminal 400-1 and a terminal 400-2 are exemplarily shown) are connected to a server 200 through a network 300, and the network 300 may be a wide area network or a local area network, or a combination of both.

The terminal is used for: displaying, on a graphical interface (a graphical interface 410-1 and a graphical interface 410-2 are shown as examples), an interactive interface through which a user performs wrongly written word recognition on sentences; receiving the user's recognition operation for a sentence to be recognized; and sending the sentence to be recognized to the server 200.

The server 200 is configured to: acquiring a sentence to be recognized and coding characteristics of words to be recognized in the sentence; performing feature extraction on the words to be recognized in the sentences by using the trained language model to obtain the context features of the words to be recognized in the sentences; and identifying the coding features and the context features of the words to be identified in the sentences by using the trained deep neural network model to obtain an identification result representing whether the words to be identified are wrong or not.

In addition, the system for recognizing wrongly written words according to the embodiment of the present application may also be a distributed system applied to a blockchain system. The distributed system may be formed by a plurality of nodes and clients, where the nodes may be any type of computing device in the access network, such as a server or a user terminal; a Peer-To-Peer (P2P) network is formed between the nodes, and the device for recognizing wrongly written words implemented as a server may be a node on the blockchain.

In some embodiments, the server 200 may be an independent physical server, a server cluster or distributed system formed by a plurality of physical servers, or a cloud server providing basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, CDN, and big data and artificial intelligence platforms. The terminal 400 may be, but is not limited to, an automatic map data collection vehicle, a smart phone, a tablet computer, a laptop computer, a desktop computer, a smart speaker, a smart watch, and the like. The terminal and the server may be directly or indirectly connected through wired or wireless communication, which is not limited in the embodiments of the present application.

Referring to fig. 2A, fig. 2A is a schematic structural diagram of an apparatus 200 for identifying wrongly written words according to an embodiment of the present application. The apparatus 200 shown in fig. 2A includes: at least one processor 210, a memory 250, at least one network interface 220, and a user interface 230. The various components of the apparatus 200 are coupled together by a bus system 240. It is understood that the bus system 240 is used to enable communications among these components. In addition to a data bus, the bus system 240 includes a power bus, a control bus, and a status signal bus. For clarity of illustration, however, the various buses are all labeled as bus system 240 in fig. 2A.

The Processor 210 may be an integrated circuit chip having Signal processing capabilities, such as a general purpose Processor, a Digital Signal Processor (DSP), or other programmable logic device, discrete gate or transistor logic device, discrete hardware components, or the like, wherein the general purpose Processor may be a microprocessor or any conventional Processor, or the like.

The user interface 230 includes one or more output devices 231, including one or more speakers and/or one or more visual display screens, that enable the presentation of media content. The user interface 230 also includes one or more input devices 232, including user interface components that facilitate user input, such as a keyboard, mouse, microphone, touch screen display, camera, other input buttons and controls.

The memory 250 may be removable, non-removable, or a combination thereof. Exemplary hardware devices include solid state memory, hard disk drives, optical disk drives, and the like. Memory 250 optionally includes one or more storage devices physically located remotely from processor 210.

The memory 250 includes volatile memory or nonvolatile memory, and may include both volatile and nonvolatile memory. The non-volatile memory may be a Read Only Memory (ROM) and the volatile memory may be a Random Access Memory (RAM). The memory 250 described in embodiments herein is intended to comprise any suitable type of memory.

In some embodiments, memory 250 is capable of storing data, examples of which include programs, modules, and data structures, or a subset or superset thereof, to support various operations, as exemplified below.

An operating system 251 including system programs for processing various basic system services and performing hardware-related tasks, such as a framework layer, a core library layer, a driver layer, etc., for implementing various basic services and processing hardware-based tasks;

a network communication module 252 for communicating with other computing devices via one or more (wired or wireless) network interfaces 220, exemplary network interfaces 220 including: Bluetooth, Wi-Fi, Universal Serial Bus (USB), and the like;

a presentation module 253 to enable presentation of information (e.g., a user interface for operating peripherals and displaying content and information) via one or more output devices 231 (e.g., a display screen, speakers, etc.) associated with the user interface 230;

an input processing module 254 for detecting one or more user inputs or interactions from one of the one or more input devices 232 and translating the detected inputs or interactions.

In some embodiments, the apparatus for identifying wrongly written words provided by the embodiment of the present application may be implemented in software. Fig. 2A illustrates an apparatus 255 for identifying wrongly written words stored in the memory 250, which may be software in the form of programs and plug-ins, and includes the following software modules: a first obtaining module 2551, a first extracting module 2552, and a first identifying module 2553. These modules are logical, and thus may be arbitrarily combined or further split according to the functions implemented.

The functions of the respective modules will be explained below.

In other embodiments, the device for identifying wrongly written words provided in the embodiments of the present application may be implemented in hardware. As an example, it may be a processor in the form of a hardware decoding processor that is programmed to execute the method for identifying wrongly written words provided in the embodiments of the present application; for example, the processor in the form of a hardware decoding processor may be one or more Application Specific Integrated Circuits (ASICs), DSPs, Programmable Logic Devices (PLDs), Complex Programmable Logic Devices (CPLDs), Field Programmable Gate Arrays (FPGAs), or other electronic components.

Referring to fig. 2B, fig. 2B is a schematic structural diagram of a model training apparatus 300 according to an embodiment of the present application. The model training apparatus 300 shown in fig. 2B includes: at least one processor 310, a memory 350, at least one network interface 320, and a user interface 330. The various components in the model training apparatus 300 are coupled together by a bus system 340. It will be appreciated that the bus system 340 is used to enable communications among the connected components. In addition to a data bus, the bus system 340 includes a power bus, a control bus, and a status signal bus. For clarity of illustration, however, the various buses are all labeled as bus system 340 in fig. 2B.

The Processor 310 may be an integrated circuit chip having Signal processing capabilities, such as a general purpose Processor, a Digital Signal Processor (DSP), or other programmable logic device, discrete gate or transistor logic device, discrete hardware components, or the like, wherein the general purpose Processor may be a microprocessor or any conventional Processor, or the like.

The user interface 330 includes one or more output devices 331, including one or more speakers and/or one or more visual display screens, that enable presentation of media content. The user interface 330 also includes one or more input devices 332, including user interface components to facilitate user input, such as a keyboard, mouse, microphone, touch screen display, camera, other input buttons and controls.

The memory 350 may be removable, non-removable, or a combination thereof. Exemplary hardware devices include solid state memory, hard disk drives, optical disk drives, and the like. Memory 350 optionally includes one or more storage devices physically located remote from processor 310.

The memory 350 may include either volatile memory or nonvolatile memory, and may also include both volatile and nonvolatile memory. The nonvolatile Memory may be a Read Only Memory (ROM), and the volatile Memory may be a Random Access Memory (RAM). The memory 350 described in embodiments herein is intended to comprise any suitable type of memory.

In some embodiments, memory 350 is capable of storing data, examples of which include programs, modules, and data structures, or subsets or supersets thereof, as exemplified below, to support various operations.

An operating system 351 including system programs for processing various basic system services and performing hardware-related tasks, such as a framework layer, a core library layer, a driver layer, etc., for implementing various basic services and processing hardware-based tasks;

a network communication module 352 for communicating to other computing devices via one or more (wired or wireless) network interfaces 320, exemplary network interfaces 320 including: bluetooth, WiFi, USB, and the like;

a presentation module 353 for enabling presentation of information (e.g., a user interface for operating peripherals and displaying content and information) via one or more output devices 331 (e.g., a display screen, speakers, etc.) associated with the user interface 330;

an input processing module 354 for detecting one or more user inputs or interactions from one of the one or more input devices 332 and translating the detected inputs or interactions.

In some embodiments, the model training apparatus provided in the embodiments of the present application may be implemented in software, and fig. 2B illustrates a model training apparatus 355 stored in a memory 350, which may be software in the form of programs and plug-ins, and includes the following software modules: the third obtaining module 3551, the third identifying module 3552, the second determining module 3553, and the second updating module 3554 are logical and thus may be arbitrarily combined or further separated depending on the functionality implemented.

The functions of the respective modules will be explained below.

In other embodiments, the model training apparatus provided in the embodiments of the present application may be implemented in hardware. For example, it may be a processor in the form of a hardware decoding processor that is programmed to perform the model training method provided in the embodiments of the present application; for example, the processor in the form of a hardware decoding processor may employ one or more ASICs, DSPs, PLDs, CPLDs, FPGAs, or other electronic components.

The method for identifying wrongly written words provided by the embodiments of the present application will be described below with reference to an exemplary application and implementation of the terminal or the server provided by the embodiments of the present application.

Referring to fig. 3, fig. 3 is an alternative flowchart of the method for identifying wrongly written words provided in the embodiment of the present application, which will be described below with reference to the steps shown in fig. 3; the execution subject of the following steps may be the foregoing terminal or server.

In step S101, a sentence to be recognized and a coding feature of a word to be recognized in the sentence are obtained.

Here, the sentence to be recognized is a sentence on which wrongly written word recognition needs to be performed, and may include, but is not limited to, one or more of Chinese sentences, English sentences, and the like. A word to be recognized is a word in the sentence that needs to be checked for being wrongly written; it may be a single unit in the sentence, such as a single Chinese character or a single word, or a phrase in the sentence, such as a Chinese word or a word phrase. The sentence to be recognized may include at least one word to be recognized. In implementation, the sentence to be recognized may be a sentence input by a user or by another system, or may be obtained from a preset text, which is not limited here. The words to be recognized may be all the words in the sentence, or a subset of words screened from the sentence based on a preset word screening strategy. For example, words may be screened by part of speech: since prepositions or discourse words in a sentence are unlikely to be wrongly written, recognition may be performed only on words other than prepositions and discourse words. Words may also be screened by position in the sentence: for example, if wrongly written words rarely appear at the beginning and end of a sentence, recognition may be performed only on words in the middle of the sentence. A possible screening step is sketched below.
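The screening step mentioned above might look like the following sketch, which uses jieba's part-of-speech tagger as one possible (assumed, not mandated) implementation; the set of skipped POS flags is likewise an illustrative assumption.

    import jieba.posseg as pseg

    # Flags assumed low-risk here: "p" (prepositions) and "u" (auxiliary words).
    SKIP_FLAGS = {"p", "u"}

    def words_to_check(sentence):
        """Keep only words whose part of speech suggests they may contain a typo."""
        return [pair.word for pair in pseg.cut(sentence) if pair.flag not in SKIP_FLAGS]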

The coding features of a word to be recognized are coding information corresponding to the word that can uniquely represent it, and may be expressed in any suitable form, such as a vector, a matrix, or a numerical identifier. In implementation, the coding features of the words to be recognized in the sentence may be obtained by querying a preset coding feature table, such as a word vector table, or by encoding or vectorizing the words to be recognized with a preset encoding rule, a mapping algorithm, or the like, which is not limited here. In some embodiments, the word to be recognized may also be processed by a preset natural language processing model to obtain its coding features; for example, a word vector of the word to be recognized may be generated based on the BERT model and used as its coding feature, as sketched below.
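One possible way to obtain such BERT-based coding features, sketched here with the Hugging Face transformers library (the checkpoint name is an assumption for illustration):

    import torch
    from transformers import BertTokenizer, BertModel

    tokenizer = BertTokenizer.from_pretrained("bert-base-chinese")
    model = BertModel.from_pretrained("bert-base-chinese")

    def coding_features(sentence):
        """Return one embedding vector per token of the input sentence."""
        inputs = tokenizer(sentence, return_tensors="pt")
        with torch.no_grad():
            outputs = model(**inputs)
        return outputs.last_hidden_state[0]  # shape: (num_tokens, hidden_size)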

In step S102, feature extraction is performed on the words to be recognized in the sentence by using the trained language model, so as to obtain context features of the words to be recognized in the sentence.

Here, the language model may be any suitable model capable of representing the probability distribution of natural-language text in a corpus, used for judging whether a sentence is a normal expression; the language model may be trained in advance in a suitable manner. In practice, the language model may be a statistical language model, such as an N-gram-based unigram (Uni-gram), bigram (Bi-gram), or trigram (Tri-gram) model, or a neural language model, such as a probabilistic feedforward neural network language model, a recurrent neural network language model, a language model based on a Transformer structure, a language model based on an LSTM structure, or a language model based on a convolutional neural network. The language model may be unidirectional or bidirectional. A person skilled in the art can select a suitable language model according to the actual situation, which is not limited here.

In some embodiments, the trained language model may be trained on an unlabeled corpus. A language model trained with a large amount of text from an unlabeled corpus has good generalization.

The context features of the word to be recognized in the sentence may include any suitable features related to the context of the word in the sentence, and may include, but are not limited to, one or more of: a characterization vector of the word to be recognized in the context of the sentence, the probability of occurrence of the word to be recognized in the context of the sentence, context information corresponding to the sentence to be recognized, and a part-of-speech feature of the word to be recognized in the context of the sentence. The language model can extract the context features of the word to be recognized by processing the characterization vectors of the word's context in the sentence, or by predicting the probability distribution of the current word to be recognized or the perplexity of the whole sentence. In implementation, a person skilled in the art may determine appropriate context features and an appropriate extraction manner according to the actual situation, which is not limited here.

In step S103, using the trained deep neural network model to identify the coding features and the context features of the words to be identified in the sentence, so as to obtain an identification result indicating whether the words to be identified are incorrect.

Here, the deep neural network model may be any suitable classification model, and may be trained in advance in a suitable manner. In implementation, the deep neural network model may be a sequence labeling model based on a deep neural network, and may include, but is not limited to, one or more of a Bi-LSTM model, an LSTM-CRF model, a BERT model, and the like; it may also be one or more other neural network models, such as a Convolutional Neural Network (CNN) model or a Recurrent Neural Network (RNN) model. A person skilled in the art can select a suitable model according to the actual situation, which is not limited here.

In some embodiments, the trained deep neural network model may be trained on a preset labeled sample set. The labeled sample set may include a plurality of sample sentences, and each sample sentence carries a label sequence for labeling whether the words to be recognized in the sample sentence are wrongly written words. A deep neural network model trained with the labeled sample set achieves higher recognition accuracy for wrongly written words; the sketch below shows one possible sample format.
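For concreteness, a hypothetical labeled sample in this format might look as follows; the sentence and its tags are invented for illustration (门 is a homophone typo for 们):

    # Hypothetical labeled sample: 1 marks a wrongly written character, 0 a correct one.
    sample_sentence = "我门今天很高兴"        # "门" should be "们" (我们, "we")
    label_sequence = [0, 1, 0, 0, 0, 0, 0]  # one tag per character
    assert len(sample_sentence) == len(label_sequence)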

The deep neural network model can identify whether the words to be identified are wrong or not by classifying the coding features and the context features of the words to be identified in the sentences. The recognition result can comprise the probability that the word to be recognized is wrong and the probability that the word to be recognized is correct, and whether the word to be recognized is wrong or not can be determined based on the probability that the word to be recognized is wrong and the probability that the word to be recognized is correct. The recognition result may also include a result in which the word to be recognized is wrong or a result in which the word to be recognized is correct.

In the embodiment of the application, firstly, a sentence to be recognized and coding characteristics of words to be recognized in the sentence are obtained; secondly, performing feature extraction on the words to be recognized in the sentences by using the trained language model to obtain the context features of the words to be recognized in the sentences; and finally, identifying the coding features and the context features of the words to be identified in the sentences by using the trained deep neural network model to obtain an identification result representing whether the words to be identified are wrong or not. Therefore, when the wrongly-written or mispronounced words are recognized, the characteristic of strong generalization of the language model can be utilized, and the characteristic of high recognition accuracy of the deep neural network model can be utilized, so that the wrongly-written or mispronounced words in the sentence can be better recognized. And on the basis of the deep neural network model, the characteristics of the language model are deeply fused, end-to-end identification of wrongly-distinguished words is carried out, and a large number of model fusion rules do not need to be set and maintained manually, so that the labor cost can be greatly reduced.

In some embodiments, referring to fig. 4, fig. 4 is an optional flowchart of the method for identifying a misrecognized word provided in this embodiment of the present application, based on fig. 3, where the language model includes a bidirectional language model, and the context feature includes an implicit feature, step S102 shown in fig. 3 may be implemented by steps S401 to S402, which will be described below with reference to the steps, and an execution subject of the steps may be the foregoing terminal or server.

In step S401, using the trained bi-directional language model, a first vector characterizing the first N words of the word to be recognized in the case of forward prediction and a second vector characterizing the last M words of the word to be recognized in the case of reverse prediction are determined, where N and M are both integers greater than or equal to 0 and smaller than the length of the sentence.

Here, the bidirectional language model may be any suitable model that can perform probability prediction on the words in a sentence both from front to back (i.e., forward prediction) and from back to front (i.e., reverse prediction). When performing forward prediction on the word to be recognized, the bidirectional language model may determine a characterization vector of the N words before it in the sentence to be recognized, that is, the first vector characterizing the first N words; when performing reverse prediction, it may determine a characterization vector of the M words after it, that is, the second vector characterizing the last M words.

In some embodiments, the bi-directional language model may be determined from two unidirectional language models, i.e., may include one forward predicted language model and one reverse predicted language model. When the forward predicted language model predicts the word to be recognized, a first vector representing the first N words of the word to be recognized can be determined, and when the reverse predicted language model predicts the word to be recognized, a second vector representing the last M words of the word to be recognized can be determined.
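
As a concrete illustration of building a bidirectional model from two unidirectional ones, the following is a minimal sketch in Python; the use of LSTMs, the class name, the vocabulary size, and the dimensions are all illustrative assumptions, not part of the original disclosure:

import torch
import torch.nn as nn

class BiContextExtractor(nn.Module):
    # Two unidirectional models: a forward LSTM reads the words before
    # position i (forward prediction), a backward LSTM reads the words
    # after it in reverse order (reverse prediction).
    def __init__(self, vocab_size=5000, dim=128):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, dim)
        self.fwd = nn.LSTM(dim, dim, batch_first=True)  # front-to-back LM
        self.bwd = nn.LSTM(dim, dim, batch_first=True)  # back-to-front LM

    def forward(self, token_ids, i):
        x = self.emb(token_ids)                         # (1, L, dim)
        _, (h_fwd, _) = self.fwd(x[:, :i])              # encodes the first N = i words
        _, (h_bwd, _) = self.bwd(x[:, i + 1:].flip(1))  # encodes the last M words
        # (positions with N = 0 or M = 0 would need special-casing, omitted here)
        return h_fwd[-1], h_bwd[-1]                     # first vector, second vector

extractor = BiContextExtractor()
ids = torch.randint(0, 5000, (1, 9))             # a 9-word sentence
first_vec, second_vec = extractor(ids, i=4)      # context vectors around word 5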

In step S402, based on the first vector and the second vector, obtaining an implicit feature of the word to be recognized in the sentence.

Here, the implicit characteristic of the word to be recognized in the sentence may include a characterization vector of the word to be recognized in the context of the sentence. In implementation, the first vector and the second vector may be merged to obtain the implicit characteristic of the word to be recognized in the sentence. In practice, the manner of combining the first vector and the second vector may include, but is not limited to, one or more of adding, splicing, and the like.
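
A minimal sketch of the merging step, assuming the two context vectors are already available as tensors; both combination manners mentioned above (adding and splicing) are shown:

import torch

def merge_context_vectors(first_vec, second_vec, mode="concat"):
    # Obtain the implicit feature of the word from its two context vectors.
    if mode == "add":
        return first_vec + second_vec                      # requires equal dimensions
    if mode == "concat":
        return torch.cat([first_vec, second_vec], dim=-1)  # splicing
    raise ValueError(f"unknown merge mode: {mode}")

implicit = merge_context_vectors(torch.randn(128), torch.randn(128))
# implicit.shape == torch.Size([256])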

In the embodiment of the application, a trained bidirectional language model is used to determine a first vector characterizing the first N words of the word to be recognized under forward prediction and a second vector characterizing the last M words under reverse prediction, where N and M are integers greater than or equal to 0 and smaller than the length of the sentence, and the implicit feature of the word to be recognized in the sentence is obtained based on the first vector and the second vector. In this way, the implicit feature simultaneously reflects the preceding and following context of the word in the sentence, and based on it the accuracy of wrongly written word recognition can be further improved.

In some embodiments, referring to fig. 5, fig. 5 is an optional flowchart of the method for identifying a misrecognized word provided in this embodiment of the present application, based on fig. 3, where the context feature includes an explicit feature, step S102 shown in fig. 3 may be implemented by steps S501 to S502, which will be described below with reference to the steps, and an execution subject of the following steps may be the foregoing terminal or server.

In step S501, the trained language model is used to predict the word to be recognized in the sentence, so as to obtain a probability parameter representing whether the word to be recognized is incorrect.

Here, the language model may predict the probability distribution of the word to be recognized in the sentence, and obtain a probability parameter representing whether the word is wrong. The probability parameter may include, but is not limited to, one or more of: the probability that the position of the word to be recognized in the sentence holds that word, the possible words at that position and the probability of each of them, a parameter indicating the fluency of the sentence when that position holds the word to be recognized, and the like. Those skilled in the art can determine suitable probability parameters according to the actual situation, which is not limited herein.

In step S502, based on the probability parameter, an explicit feature of the word to be recognized in the sentence is determined.

Here, the explicit features of the word to be recognized in the sentence may include any suitable features related to its predicted probability distribution. For example, they may include the probability that the position of the word in the sentence holds that word, the difference or ratio between the probability of the most probable word at that position and the probability of the word itself, the PPL value of the sentence when that position holds the word, and the PPL reduction ratio of the sentence when the position is replaced with another word; they may also include a preset feature vector corresponding to a probability parameter representing whether the word is wrong, and the like.

In some embodiments, the probability parameter may be determined directly as an explicit feature of the word to be recognized in the sentence. In some embodiments, the probability parameter may also be mapped to a preset feature vector, which is determined as an explicit feature of the word to be recognized in the sentence.

In some embodiments, the probability parameters include a current word probability and a maximum candidate word probability, and the step S502 may be implemented as the following steps S511 to S513:

step S511, discretizing the difference between the current word probability and the maximum candidate word probability to obtain a discretization difference;

Here, the current word probability is the probability, predicted by the language model, that the position of the current word in the sentence holds the current word. The maximum candidate word probability is the probability of the most probable word among the possible words at that position, as predicted by the language model.

In implementation, any suitable discretization method may be adopted to discretize the difference between the current word probability and the maximum candidate word probability; the difference may be discretized uniformly or non-uniformly, which is not limited herein. The discretization interval may be determined according to the actual situation, for example discretization over the [0, 1] interval or over the [0, 100] interval.

Step S512, determining a first target value interval to which the discretization difference value belongs from a preset first value interval list;

Here, the first value interval list may include a plurality of preset value intervals, and may be determined according to the discretization manner of the difference between the current word probability and the maximum candidate word probability. The discretized difference belongs to exactly one value interval, and that value interval is the first target value interval.

Step S513, determining the explicit feature of the word to be recognized in the sentence based on the first target value-taking interval.

Each value interval in the first value interval list may correspond to a preset feature; after the first target value interval is determined, the feature corresponding to it may be determined as the explicit feature of the word to be recognized in the sentence.
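
A minimal sketch of steps S511 to S513, assuming equal-width buckets over [0, 1] and a trainable embedding as the preset feature of each interval; the bucket count and feature dimension are illustrative assumptions:

import torch
import torch.nn as nn

NUM_BUCKETS = 20                                 # assumed equal-width buckets on [0, 1]
bucket_features = nn.Embedding(NUM_BUCKETS, 32)  # one preset feature per value interval

def explicit_feature_from_diff(current_word_prob, max_candidate_prob):
    diff = max_candidate_prob - current_word_prob            # step S511: the difference
    bucket = min(int(diff * NUM_BUCKETS), NUM_BUCKETS - 1)   # step S512: target interval
    return bucket_features(torch.tensor([bucket]))           # step S513: its preset feature

feat = explicit_feature_from_diff(current_word_prob=0.02, max_candidate_prob=0.85)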

In some embodiments, the probability parameter includes at least one perplexity reduction ratio of the word to be recognized, and the step S501 may be implemented by the following steps S521 to S522:

Step S521, using the trained language model, determining the current perplexity of the sentence when the position of the word to be recognized holds the current word, and the post-replacement perplexity of the sentence when that position is replaced, in turn, by each candidate wrongly written word in the wrongly written word candidate set of the word to be recognized;

Here, the wrongly written word candidate set of the word to be recognized includes words that are easily confused with it or easily written in its place by mistake, for example characters similar in shape to the word to be recognized, homophones, and the like. In implementation, the candidate set may be determined based on an open-source dictionary, or summarized in advance from human experience, which is not limited herein.

The wrongly written word candidate set of the word to be recognized may comprise at least one candidate wrongly written word. For each candidate, the word at the position of the word to be recognized in the sentence can be replaced by that candidate, and the perplexity of the resulting sentence is calculated to obtain the post-replacement perplexity.

Step S522, determining at least one perplexity reduction ratio of the word to be recognized based on the current perplexity and the at least one post-replacement perplexity of the word to be recognized.

Here, the perplexity reduction ratio may be the difference between the current perplexity and the post-replacement perplexity, or the ratio between them, which is not limited herein. The perplexity reduction ratio reflects how likely the word to be recognized in the sentence is to be wrong: the larger the reduction ratio, the more likely the word is wrong.
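
A minimal sketch of steps S521 to S522, assuming a scorer sentence_ppl backed by the trained language model (a hypothetical helper, not named in the original); the relative-decrease variant of the reduction ratio is used here:

def ppl_drop_ratios(sentence, position, candidates, sentence_ppl):
    # sentence: list of words; candidates: the wrongly written word candidate set.
    current_ppl = sentence_ppl(sentence)             # current perplexity
    ratios = []
    for cand in candidates:
        replaced = sentence[:position] + [cand] + sentence[position + 1:]
        after_ppl = sentence_ppl(replaced)           # post-replacement perplexity
        ratios.append((current_ppl - after_ppl) / current_ppl)
    return ratios                                    # one reduction ratio per candidate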

In some embodiments, the step S502 can be implemented by the following steps S531 to S533: step S531, discretizing at least one perplexity reduction ratio of the word to be recognized to obtain a discretized perplexity reduction ratio; step S532, determining, from a preset second value interval list, the second target value interval to which the discretized perplexity reduction ratio belongs; step S533, determining the explicit feature of the word to be recognized in the sentence based on the second target value interval. Here, the embodiments of steps S531 to S533 are similar to those of steps S511 to S513, and reference may be made to the specific embodiments of steps S511 to S513.

In some embodiments, the context feature includes an explicit feature and an implicit feature, and the step S102 may be implemented by the following steps S541 to S544: step S541, determining, by using the trained bidirectional language model, a first vector for characterizing the first N words of the word to be recognized in the case of forward prediction, and a second vector for characterizing the last M words of the word to be recognized in the case of reverse prediction, where N and M are integers greater than or equal to 0 and smaller than the length of the sentence; step S542, obtaining implicit characteristics of the words to be recognized in the sentence based on the first vector and the second vector; step S543, predicting the words to be recognized in the sentence by using the trained language model to obtain probability parameters representing whether the words to be recognized are wrong; step S544, based on the probability parameter, determining the explicit characteristics of the word to be recognized in the sentence.

In some embodiments, the probability parameters include a current word probability, a maximum candidate word probability, and at least one perplexity reduction ratio of the word to be recognized, the explicit features include a first explicit feature and a second explicit feature, and the step S102 can be implemented by the following steps S551 to S556: step S551, discretizing the difference between the current word probability and the maximum candidate word probability to obtain a discretized difference; step S552, determining, from a preset first value interval list, the first target value interval to which the discretized difference belongs; step S553, determining the first explicit feature of the word to be recognized in the sentence based on the first target value interval; step S554, discretizing at least one perplexity reduction ratio of the word to be recognized to obtain a discretized perplexity reduction ratio; step S555, determining, from a preset second value interval list, the second target value interval to which the discretized perplexity reduction ratio belongs; step S556, determining the second explicit feature of the word to be recognized in the sentence based on the second target value interval.

In the embodiment of the application, the trained language model is used to predict the word to be recognized in the sentence to obtain a probability parameter representing whether the word is wrong, and the explicit feature of the word in the sentence is determined based on that probability parameter. In this way, the explicit feature can be determined from the language model's prediction for the word to be recognized in the sentence. The obtained explicit feature better reflects the context of the word to be recognized, so the recognition accuracy of wrongly written words can be further improved, and the generalization of wrongly written word recognition can be effectively improved. Further, the context feature can comprise both an explicit feature and an implicit feature, which further improves how well the context feature expresses the context of the word to be recognized in the sentence.

In some embodiments, referring to fig. 6, fig. 6 is an optional flowchart of the method for identifying a wrongly-recognized word provided in this embodiment of the present application, based on fig. 3, the step S103 may be implemented by the following steps S601 to S602, which will be described below with reference to the steps, and an execution subject of the following steps may be the foregoing terminal or server.

In step S601, combining the coding features and the context features of the words to be recognized to obtain a fusion vector of the words to be recognized;

here, the coding feature and the context feature of the word to be recognized may be combined by any suitable method, which is not limited herein. For example, the coding features and the context features may be added to obtain a fusion vector of the word to be recognized; the coding features and the context features can also be spliced to obtain a fusion vector of the words to be recognized.

In step S602, using the trained deep neural network model to identify the fusion vector of the word to be identified in the sentence, so as to obtain an identification result indicating whether the word to be identified is incorrect.

Here, the deep neural network model can identify whether the word to be recognized is wrong by classifying the fusion vector of the word to be recognized in the sentence.
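
A minimal sketch of steps S601 to S602, where a linear classification head stands in for the deep neural network model; the splicing variant of fusion is used, and all dimensions are illustrative assumptions:

import torch
import torch.nn as nn

class FusionClassifier(nn.Module):
    def __init__(self, enc_dim=768, ctx_dim=768, num_labels=2):
        super().__init__()
        self.head = nn.Linear(enc_dim + ctx_dim, num_labels)

    def forward(self, enc_feat, ctx_feat):
        fused = torch.cat([enc_feat, ctx_feat], dim=-1)   # step S601: fusion vector
        return torch.softmax(self.head(fused), dim=-1)    # step S602: P(correct), P(wrong)

clf = FusionClassifier()
probs = clf(torch.randn(1, 768), torch.randn(1, 768))     # recognition result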

In the embodiment of the application, the coding features and the context features of the words to be recognized are combined to obtain the fusion vector of the words to be recognized, and the fusion vector of the words to be recognized in the sentence is recognized by using the trained deep neural network model to obtain the recognition result representing whether the words to be recognized are wrong or not. Therefore, the coding characteristics and the context characteristics of the words to be recognized can be fused to recognize the wrongly recognized words, and the recognition accuracy of the wrongly recognized words can be further improved.

In some embodiments, referring to fig. 7, fig. 7 is an optional flowchart of the method for identifying a misrecognized word provided in this embodiment of the present application, based on fig. 3, the following steps S701 to S704 may be further performed before the step S103, which will be described below with reference to the steps, and a main body of the following steps may be the foregoing terminal or server.

In step S701, obtaining each sample sentence in the labeled sample set and the coding characteristics of the word to be recognized in each sample sentence; and each sample statement in the labeled sample set is provided with a label sequence used for labeling whether the word to be identified in the sample statement is a wrongly written word or not.

Here, the labeled sample set may be labeled manually or automatically in advance, and may include at least one sample sentence. A sample sentence may include at least one word to be recognized, and the tag sequence of the sample sentence may include, for each word to be recognized, a label marking whether that word is a wrongly written word.

In step S702, for each sample sentence, the trained language model is used to perform feature extraction on the word to be recognized in the sample sentence, so as to obtain the context feature of the word to be recognized in the sample sentence, and the deep neural network model is used to recognize the coding feature and the context feature of the word to be recognized in the sample sentence, so as to obtain a recognition result representing whether the word to be recognized in the sample sentence is incorrect.

Here, step S702 corresponds to step S102 and step S103, and when implemented, reference may be made to specific embodiments of step S102 and step S103.

Step S703 is to determine a loss value by using a loss function based on the recognition result corresponding to each sample statement and the tag sequence of each sample statement.

Here, any suitable loss function may be employed to determine the loss value, such as an absolute value loss function, a squared loss function, a cross entropy loss function, an exponential loss function, and so forth.

Step S704, under the condition that the loss function is determined not to be converged according to the loss value, updating the parameters of the deep neural network model based on a parameter optimization algorithm.

Here, the parameter optimization algorithm may be any suitable algorithm, such as a gradient descent method, a conjugate gradient method, a newton algorithm, and the like.
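
A minimal sketch of one optimization step (steps S703 to S704), using cross entropy as the loss function and plain gradient descent (SGD) as the parameter optimization algorithm; the linear recognizer and all dimensions are stand-in assumptions:

import torch
import torch.nn as nn

model = nn.Linear(1536, 2)                     # stands in for the deep neural network model
criterion = nn.CrossEntropyLoss()              # the loss function of step S703
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)  # gradient descent method

fused = torch.randn(8, 1536)                   # fused features of 8 words in a sample sentence
labels = torch.randint(0, 2, (8,))             # tag sequence: 1 = wrongly written word

loss = criterion(model(fused), labels)         # loss value: recognition results vs. tag sequence
optimizer.zero_grad()
loss.backward()
optimizer.step()                               # step S704: update the model parameters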

It should be noted that the steps S701 to S704 are not limited to the execution sequence shown in fig. 7, and for example, the steps S701 to S704 may be executed before the step S101.

In some embodiments, after step S704, the method may further include: step S705, for each sample sentence, performing feature extraction on the word to be recognized in the sample sentence by using the trained language model to obtain the context feature of the word to be recognized in the sample sentence, and recognizing the coding feature and the context feature of the word to be recognized in the sample sentence by using the deep neural network model after parameter updating to obtain an updated recognition result representing whether the word to be recognized in the sample sentence is wrong; step S706, determining an updated loss value by using a loss function based on the updated identification result corresponding to each sample statement and the label sequence of each sample statement; and step S707, under the condition that the loss function is determined not to be converged according to the updated loss value, updating the parameters of the deep neural network model based on a parameter optimization algorithm.

In some embodiments, after step S704, the method may further include: step S708, determining the current deep neural network model as the trained deep neural network model under the condition that the loss function is determined to be converged according to the loss value or under the condition that the number of times of updating the parameter reaches a preset number threshold.

In the embodiment of the application, a preset labeled sample set is adopted, and the features of the trained language model are fused to train the deep neural network model, realizing end-to-end learning. In this way, the trained deep neural network model can exploit both the strong generalization of the language model and the high recognition accuracy of the deep neural network model, so that wrongly written words in the sentence can be better recognized. Moreover, since the features of the language model are deeply fused on the basis of the deep neural network model for end-to-end learning, a large number of model fusion rules do not need to be set and maintained manually, so that labor cost can be greatly reduced.

The model training method provided by the embodiment of the present application will be described below with reference to an exemplary application and implementation of a terminal or a server provided by the embodiment of the present application.

Referring to fig. 8, fig. 8 is an alternative flowchart of a model training method provided in an embodiment of the present application, which will be described below with reference to the steps shown in fig. 8, where the execution subject of the following steps may be the foregoing terminal or server.

In step S801, obtaining each sample sentence in the labeled sample set and the coding feature of the word to be recognized in each sample sentence; each sample statement in the labeled sample set is provided with a label sequence used for labeling whether a word to be identified in the sample statement is a wrongly written word or not;

in step S802, for each sample sentence, performing feature extraction on the word to be recognized in the sample sentence by using the trained language model to obtain a context feature of the word to be recognized in the sample sentence, and recognizing the coding feature and the context feature of the word to be recognized in the sample sentence by using the deep neural network model to obtain a recognition result representing whether the word to be recognized in the sample sentence is wrong;

in step S803, a loss value is determined by using a loss function based on the recognition result corresponding to each sample sentence and the tag sequence of each sample sentence;

in step S804, in the case that it is determined that the loss function does not converge according to the loss value, the parameters of the deep neural network model are updated based on a parameter optimization algorithm.

In some embodiments, after step S804, the method may further include: step S805, for each sample sentence, performing feature extraction on the word to be recognized in the sample sentence by using the trained language model to obtain a context feature of the word to be recognized in the sample sentence, and recognizing the coding feature and the context feature of the word to be recognized in the sample sentence by using the deep neural network model after parameter update to obtain an updated recognition result representing whether the word to be recognized in the sample sentence is wrong; step S806, determining an updated loss value by using a loss function based on the updated recognition result corresponding to each sample statement and the tag sequence of each sample statement; step S807, updating the parameters of the deep neural network model based on a parameter optimization algorithm under the condition that the loss function is determined not to be converged according to the updated loss value.

In some embodiments, after step S804, the method may further include: and step S808, determining the current deep neural network model as the trained deep neural network model under the condition that the loss function is determined to be converged according to the loss value or under the condition that the updating times of the parameters reach a preset time threshold.

It should be noted that steps S801 to S808 correspond to steps S701 to S708, respectively, and in the implementation, reference may be made to specific embodiments of steps S701 to S708.

In the embodiment of the application, a preset labeled sample set is adopted, and the characteristics of the trained language model are fused to train the deep neural network model, so that end-to-end learning is realized. Therefore, the deep neural network model obtained through training can utilize the characteristic of strong generalization of the language model and the characteristic of high recognition accuracy of the deep neural network model, so that the wrongly-recognized words in the sentence can be better recognized. And on the basis of the deep neural network model, the characteristics of the language model are deeply fused, end-to-end learning is carried out, and a large number of model fusion rules do not need to be set and maintained manually, so that the labor cost can be greatly reduced.

Next, an exemplary application of the embodiment of the present application in a practical application scenario will be described. The method for identifying wrongly written words provided by the embodiment of the application can be applied to scenarios of identifying wrongly written words in a single sentence or an article. For example, on information-flow products such as Tiantian Kuaibao and QQ Kandian, wrongly written characters in articles can be automatically identified and flagged, assisting manual review, and low-quality articles with too many wrongly written characters can be intercepted before entering the recommendation pool; a publishing assistant for self-media authors can help a user check suspected wrongly written characters in an article before publication and prompt correction information, helping the user reduce or avoid wrongly written characters in the article.

In practical application scenarios, there are many types of wrongly written words. Taking Chinese wrongly written character recognition as an example, see Table 1 below, which lists common types of Chinese wrongly written characters, including homophone errors, near-homophone errors, similar-form errors, and other errors, and exemplarily shows wrongly written characters of each error type; the correct character is given in parentheses after the corresponding wrongly written character position:

TABLE 1 common error types of wrongly written Chinese characters

In view of this, the embodiment of the present application provides a deep neural network-based wrongly written word recognition model that fuses a bidirectional neural language model: on the basis of a conventional deep neural network-based recognition model, the features of a neural language model are deeply fused for end-to-end learning. Because it deeply exploits a neural language model trained on a large amount of text, this model can better recognize the wrongly written characters in an article. Through this deep fusion, the two models are merged into one, combining the strong generalization of a neural language model trained on massive text with the high accuracy of the deep neural network.

Referring to fig. 9A, fig. 9A is a schematic overall architecture diagram of the deep neural network-based wrongly written word recognition model fusing a bidirectional neural language model according to an embodiment of the present application. As shown in fig. 9A, the model includes a Transformer-based bidirectional language model 910 and a BERT sequence labeling model 920; the input of the model is a sentence, and the output is the sequence labeling result of the BERT sequence labeling model 920, which contains a label for each word in the input sentence indicating whether it is a wrongly written word.

The process of performing wrongly written character recognition on an input sentence using the Transformer-based bidirectional language model 910 and the BERT sequence labeling model 920 includes the following steps S901 to S903:

step S901, inputting an input sentence into a bidirectional Language Model 910 based on a converter, and obtaining two types of feature vectors for each word in the sentence, where one type is an intermediate representation vector obtained in a process of predicting by a Language Model (LM), and is called an implicit feature vector, and the other type is a feature vector determined based on a prediction result of the LM, and is called an explicit feature vector;

here, the input sentence can be spliced into a character sequence according to the following rules, and then the two-way language model is input:

[CLS]Char1 Char2...CharL[SEP];

where [CLS] and [SEP] are predefined marker symbols, Char1, Char2, ..., CharL are the words of the input sentence in order, and L is the total number of words in the sentence.
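
A minimal sketch of this splicing rule; the function name and the example string are illustrative assumptions:

def to_char_sequence(sentence):
    # Splice a sentence into the [CLS] Char1 ... CharL [SEP] format.
    return ["[CLS]"] + list(sentence) + ["[SEP]"]

to_char_sequence("幸福感")
# ['[CLS]', '幸', '福', '感', '[SEP]']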

Since a unidirectional neural language model can only use one-sided context, the embodiment of the present application provides a bidirectional neural language model. Referring to fig. 9B, fig. 9B is a schematic diagram of the composition architecture of the bidirectional neural language model provided in an embodiment of the present application. The model is composed of two unidirectional LM models: an LM model 911 that predicts from front to back and an LM model 912 that predicts from back to front. When predicting the word at the current position of an input sentence, the LM model 911 determines a feature vector of the words before the current word while predicting from front to back, and the LM model 912 determines a feature vector of the words after the current word while predicting from back to front; these two feature vectors are spliced to obtain the implicit feature vector of the current word, based on which the word at the current position can be predicted. For example, for an input sentence meaning "let citizens gain a sense of happiness", a character inside that phrase is predicted by splicing the characterization vector of its preceding context with that of its following context.

The bidirectional neural language model provided by the embodiment of the application can be trained on a preset unlabeled corpus. When training the bidirectional neural language model, the front-to-back unidirectional model and the back-to-front unidirectional model are not trained separately; instead, the two models are trained together. In this way, the bidirectional neural language model can utilize the complete context in both directions during training, and the implicit feature vector of every word in a sentence can be obtained with a single model prediction over the input sentence.

The explicit feature vector of each word in the input sentence may include the probability of the top-1 word predicted by the bidirectional language model at the current position, the probability of the current word, and a feature vector corresponding to the discretized difference between the two. Here, the top-1 word probability is the probability of the most probable word at the current position as predicted by the bidirectional language model, and the current word probability is the predicted probability that the current position holds the current word. The feature vector corresponding to the discretized difference can be obtained as follows: compute the difference between the top-1 word probability and the current word probability, then uniformly discretize it into a number of preset buckets (for example, 20 buckets), each of which corresponds to one feature vector; the feature vector of the bucket into which the difference falls is used as the feature vector corresponding to the discretized difference.

The explicit feature vector of each word in the input sentence may also include feature vectors corresponding to the PPL drop ratios predicted at the current position. Here, for each word in the input sentence, the position of the word can be replaced in turn by each candidate wrongly written character in the word's candidate set, and the PPL drop ratio of the sentence before and after each replacement is calculated. Each PPL drop ratio may be uniformly discretized into a number of preset buckets (e.g., 20 buckets), each corresponding to one feature vector, and the feature vector of the bucket into which the ratio falls is used as the feature vector for that PPL drop ratio. In practice, the PPL of a sentence may be calculated using the following equation 1-1:

$$\mathrm{PPL}(s) = P(w_1 w_2 \cdots w_n)^{-\frac{1}{n}} = \sqrt[n]{\prod_{i=1}^{n} \frac{1}{p(w_i \mid w_1 w_2 \cdots w_{i-1})}} \qquad (1\text{-}1)$$

where s represents the sentence to be calculated, n is the length of the sentence, w_1 w_2 ... w_n denote the words at each position in the sentence s, P(w_1 w_2 ... w_n) represents the probability that the sentence s consists of the words w_1 w_2 ... w_n, and p(w_i | w_1 w_2 ... w_{i-1}) is the probability that position i holds w_i given that positions 1 through i-1 hold w_1 w_2 ... w_{i-1}. The larger PPL(s) is, the greater the probability that the sentence s contains wrongly written words.
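
As code, equation 1-1 reduces to a geometric mean of inverse conditional probabilities; the following sketch assumes the per-position conditional probabilities have already been produced by the language model:

import math

def sentence_ppl(cond_probs):
    # cond_probs[i] = p(w_i | w_1 ... w_{i-1}) for each position in the sentence.
    n = len(cond_probs)
    # Computed in log space for numerical stability.
    return math.exp(-sum(math.log(p) for p in cond_probs) / n)

sentence_ppl([0.2, 0.05, 0.1])   # higher PPL -> sentence more likely to contain errors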

Step S902, for each word in the sentence, adding together the two types of vectors obtained from the LM and the original BERT word vector corresponding to the word, to obtain a summed vector;

here, the original BERT word vector corresponding to each word may be obtained by encoding each word by any suitable vectorization encoding method, or by querying a specific word vector table.

Step S903, inputting the summed vector corresponding to each word into the BERT sequence labeling model 920 to obtain a sequence labeling result containing, for each word in the input sentence, a label indicating whether it is a wrongly written word.
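
A minimal end-to-end sketch of steps S901 to S903, with random tensors standing in for the real model outputs; the sentence length and dimensions are illustrative assumptions:

import torch

L, d = 10, 768
implicit = torch.randn(L, d)        # S901: implicit feature vectors from the LM
explicit = torch.randn(L, d)        # S901: explicit feature vectors from the LM
bert_vecs = torch.randn(L, d)       # original BERT word vectors of the L words

summed = implicit + explicit + bert_vecs   # S902: add the three vectors per word
# S903: `summed` would then be fed into the BERT sequence labeling model,
# which outputs a wrong/correct label for each of the L words.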

The Transformer-based bidirectional language model can be trained in advance on a large amount of unlabeled corpus. The BERT sequence labeling model 920 can be trained on a pre-labeled training sample set, using the features of the trained Transformer-based bidirectional language model. When training the BERT sequence labeling model, the sum of the cross entropies at every position of the input sentence can be used as the loss value; for example, the loss value Loss can be calculated using the following formula 1-2:

$$\mathrm{Loss} = -\sum_{k=1}^{n} \log P_k(y_k) \qquad (1\text{-}2)$$

where n is the length of the input sentence, P_k is the predicted probability distribution over whether the word at position k of the input sentence is a wrongly written word, and y_k is the ground-truth label at position k, so each term is the cross entropy at that position.
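
The same loss in code, using PyTorch's summed cross entropy; the logits and labels are illustrative stand-ins:

import torch
import torch.nn.functional as F

logits = torch.randn(10, 2)           # unnormalized P_k for n = 10 positions
labels = torch.randint(0, 2, (10,))   # y_k: 1 = wrongly written word at position k
loss = F.cross_entropy(logits, labels, reduction="sum")   # formula 1-2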

In some embodiments, the bidirectional language model may be replaced by a BERT model, and the BERT sequence labeling model may be replaced by other Neural Network models, such as Convolutional Neural Networks (CNNs), Recurrent Neural Networks (RNNs), and the like.

Continuing with the exemplary structure of the wrongly written word identification device 255 provided in the embodiment of the present application implemented as software modules, in some embodiments, as shown in fig. 2A, the software modules of the wrongly written word identification device 255 stored in the memory 250 may include:

a first obtaining module 2551, configured to obtain a sentence to be recognized and a coding feature of a word to be recognized in the sentence;

a first extraction module 2552, configured to perform feature extraction on the words to be recognized in the sentence by using the trained language model, so as to obtain context features of the words to be recognized in the sentence;

the first identification module 2553 is configured to identify, by using the trained deep neural network model, the coding features and the context features of the words to be identified in the sentence, so as to obtain an identification result indicating whether the words to be identified are incorrect.

In some embodiments, the language model comprises a bi-directional language model, the contextual features comprise implicit features, the first extraction module is further to: determining, using the trained bi-directional language model, a first vector that characterizes the first N words of the word to be recognized in the case of forward prediction and a second vector that characterizes the last M words of the word to be recognized in the case of reverse prediction, where N and M are integers greater than or equal to 0 and less than the length of the sentence; and obtaining the implicit characteristics of the words to be recognized in the sentence based on the first vector and the second vector.

In some embodiments, the contextual features include explicit features, the first extraction module further to: predicting the words to be recognized in the sentences by using the trained language model to obtain probability parameters representing whether the words to be recognized are wrong or not; and determining the explicit characteristics of the word to be recognized in the sentence based on the probability parameters.

In some embodiments, the probability parameters include a current word probability and a maximum candidate word probability, the first extraction module is further configured to: discretizing the difference between the current word probability and the maximum candidate word probability to obtain a discretization difference; determining a first target value interval to which the discretization difference value belongs from a preset first value interval list; and determining the explicit characteristics of the words to be recognized in the sentence based on the first target value-taking interval.

In some embodiments, the probability parameter comprises at least one perplexity reduction ratio of the word to be recognized, and the first extraction module is further configured to: using the trained language model, determine the current perplexity of the sentence when the position of the word to be recognized holds the current word, and the post-replacement perplexity of the sentence when that position is replaced by each candidate wrongly written word in the wrongly written word candidate set of the word to be recognized; and determine at least one perplexity reduction ratio of the word to be recognized based on the current perplexity and the at least one post-replacement perplexity of the word to be recognized.

In some embodiments, the first extraction module is further configured to: discretize at least one perplexity reduction ratio of the word to be recognized to obtain a discretized perplexity reduction ratio; determine, from a preset second value interval list, the second target value interval to which the discretized perplexity reduction ratio belongs; and determine the explicit feature of the word to be recognized in the sentence based on the second target value interval.

In some embodiments, the first identification module is further configured to: combining the coding features and the context features of the words to be recognized to obtain a fusion vector of the words to be recognized; and identifying the fusion vector of the words to be identified in the sentence by using the trained deep neural network model to obtain an identification result representing whether the words to be identified are wrong or not.

In some embodiments, the apparatus further comprises: the second obtaining module is used for obtaining each sample statement in the labeled sample set and the coding characteristics of the words to be recognized in each sample statement; each sample statement in the labeled sample set is provided with a label sequence used for labeling whether a word to be identified in the sample statement is a wrongly written word or not; the second recognition module is used for performing feature extraction on the words to be recognized in the sample sentences by using the trained language model aiming at each sample sentence to obtain the context features of the words to be recognized in the sample sentences, and recognizing the coding features and the context features of the words to be recognized in the sample sentences by using the deep neural network model to obtain a recognition result representing whether the words to be recognized in the sample sentences are wrong or not; the first determining module is used for determining a loss value by using a loss function based on the identification result corresponding to each sample statement and the label sequence of each sample statement; and the first updating module is used for updating the parameters of the deep neural network model based on a parameter optimization algorithm under the condition that the loss function is determined not to be converged according to the loss value.

Continuing with the exemplary structure of the model training device 355 provided by the embodiments of the present application implemented as software modules, in some embodiments, as shown in fig. 2B, the software modules of the model training device 355 stored in the memory 350 may include:

a third obtaining module 3551, configured to obtain each sample sentence in the labeled sample set and a coding feature of a word to be recognized in each sample sentence; each sample statement in the labeled sample set is provided with a label sequence used for labeling whether a word to be identified in the sample statement is a wrongly written word or not;

a third identifying module 3552, configured to, for each sample sentence, perform feature extraction on the word to be identified in the sample sentence by using the trained language model to obtain a context feature of the word to be identified in the sample sentence, and identify, by using the deep neural network model, the coding feature and the context feature of the word to be identified in the sample sentence to obtain an identifying result indicating whether the word to be identified in the sample sentence is incorrect;

a second determining module 3553, configured to determine a loss value by using a loss function based on the recognition result corresponding to each sample statement and the tag sequence of each sample statement;

a second updating module 3554 configured to update the parameters of the deep neural network model based on a parameter optimization algorithm if it is determined from the loss values that the loss function does not converge.

Embodiments of the present application provide a computer program product or computer program comprising computer instructions stored in a computer readable storage medium. The processor of the computer device reads the computer instructions from the computer-readable storage medium, and executes the computer instructions, so that the computer device executes the method for recognizing the misdistinguished words or the method for training the model according to the embodiment of the present application.

Embodiments of the present application provide a computer-readable storage medium storing executable instructions, which when executed by a processor, will cause the processor to perform the recognition method or model training method for misdistinguished words provided by embodiments of the present application, for example, the method shown in fig. 3.

In some embodiments, the computer-readable storage medium may be memory such as FRAM, ROM, PROM, EPROM, EEPROM, flash, magnetic surface memory, optical disk, or CD-ROM; or may be various devices including one or any combination of the above memories.

In some embodiments, executable instructions may be written in any form of programming language (including compiled or interpreted languages), in the form of programs, software modules, scripts or code, and may be deployed in any form, including as a stand-alone program or as a module, component, subroutine, or other unit suitable for use in a computing environment.

By way of example, executable instructions may correspond, but do not necessarily have to correspond, to files in a file system, and may be stored in a portion of a file that holds other programs or data, such as in one or more scripts in a Hypertext Markup Language (HTML) document, in a single file dedicated to the program in question, or in multiple coordinated files (e.g., files that store one or more modules, sub-programs, or portions of code).

By way of example, executable instructions may be deployed to be executed on one computing device or on multiple computing devices at one site or distributed across multiple sites and interconnected by a communication network.

In summary, by the embodiment of the application, when the wrongly-recognized words are recognized, the characteristics of strong generalization of the language model and high recognition accuracy of the deep neural network model can be utilized, so that the wrongly-recognized words in the sentence can be better recognized. And on the basis of the deep neural network model, the characteristics of the language model are deeply fused, end-to-end identification of wrongly-distinguished words is carried out, and a large number of model fusion rules do not need to be set and maintained manually, so that the labor cost can be greatly reduced.

The above description is only an example of the present application, and is not intended to limit the scope of the present application. Any modification, equivalent replacement, and improvement made within the spirit and scope of the present application are included in the protection scope of the present application.
