Wrong-word recognition method and device

Document No.: 1938121 Publication date: 2021-12-07

Reading note: this technology, "Wrong-word recognition method and device", was designed by 李萌, 张少华, 李勇乐, and 李�昊 on 2021-09-06. Main content: Embodiments of the present application provide a wrong-word recognition method and device. Target text input by a user may first be received and displayed in a first display area of a page. Wrong-word recognition may then be performed on the target text, and a wrong-word pair list corresponding to the target text may be displayed in a second display area of the page. The wrong-word pair list may include one or more wrong-word pairs, and each wrong-word pair may include a wrong word in the target text and a correct word for correcting it. The wrong words may include entity words and/or non-entity words, which are identified from the target text in different manners. In this way, wrong words can be accurately located in the target text and shown to the user in the second display area, so that the user can see the wrong words in the target text and handle them conveniently and quickly.

1. A wrong-word recognition method, the method comprising:

receiving and displaying a target text input by a user in a first display area of a page;

displaying a wrong-word pair list corresponding to the target text in a second display area of the page; wherein the wrong-word pair list comprises one or more wrong-word pairs, each wrong-word pair comprises a wrong word in the target text and a correct word for correcting the wrong word, the wrong words comprise entity words and/or non-entity words, and the entity words and the non-entity words are identified from the target text in different manners.

2. The method of claim 1, further comprising:

in response to a wrong-word modification instruction triggered by the user, determining a target wrong word corresponding to the wrong-word modification instruction;

and modifying the target wrong word in the target text into the corresponding target correct word.

3. The method of claim 2, wherein before modifying the target wrong word in the target text into the corresponding target correct word, the method further comprises:

in response to a wrong-word modification instruction triggered by the user in the second display area, jumping to the display position of the target wrong word in the target text and highlighting the target wrong word; or

in response to a jump instruction triggered by the user clicking a wrong word, jumping to the display position of the target wrong word in the target text and highlighting the target wrong word.

4. The method of claim 2, further comprising:

in response to a one-click wrong-word modification instruction triggered by the user, modifying the one or more wrong words included in the wrong-word pair list into the corresponding correct words.

5. The method of claim 1, wherein the target text comprises N paragraphs, and correspondingly, the second display region comprises N display sub-regions, where N is an integer greater than 1;

the displaying a wrong-word pair list corresponding to the target text in a second display area of the page comprises:

displaying, in each display sub-region of the second display area, the wrong-word pair list of the paragraph corresponding to that display sub-region.

6. The method of claim 1, wherein the second display area comprises a first sub-display area and a second sub-display area, the first sub-display area is used for displaying wrong-word pairs that comprise entity words, and the second sub-display area is used for displaying wrong-word pairs that comprise non-entity words.

7. The method of claim 1, wherein the wrong words comprise a first wrong word, the method further comprising:

highlighting the first wrong word in the first display area;

in response to a display instruction triggered by the user for the first wrong word, displaying a first correct word corresponding to the first wrong word;

replacing the first wrong word in the target text with the first correct word in response to a modification operation triggered by the user;

deleting the first wrong word and the first correct word from the wrong-word pair list.

8. The method of claim 1, further comprising:

displaying, in the second display area, the frequency with which the wrong word occurs in the target text.

9. A wrong-word recognition method, the method comprising:

acquiring a target text;

inputting the target text into a wrong-word correction model to obtain a corrected text, wherein the corrected text is the text obtained after the wrong words in the target text are corrected, the wrong-word correction model is trained on sentence pairs, each sentence pair comprises an erroneous sentence and a correct sentence, the erroneous sentence is a sentence that contains a wrong word, and the correct sentence is a sentence that contains no wrong word;

comparing the target text with the corrected text to obtain a first wrong-word pair list, wherein the first wrong-word pair list comprises a first wrong word in the target text and a first correct word in the corrected text corresponding to the first wrong word, and the first wrong word is a non-entity word;

identifying a plurality of entity words from the target text, and determining a second wrong-word pair list according to the similarity between any two of the plurality of entity words, wherein the second wrong-word pair list comprises a second wrong word and a second correct word, and the second wrong word and the second correct word are both entity words;

and obtaining a wrong-word pair list of the target text from the first wrong-word pair list and the second wrong-word pair list.

10. The method of claim 9, wherein the comparing the target text with the corrected text to obtain a first wrong-word pair list comprises:

comparing the target text with the corrected text to obtain a first wrong-word pair, wherein the first wrong-word pair comprises a first wrong word and a first correct word;

and in response to the first wrong word being a non-entity word, adding the first wrong-word pair to the first wrong-word pair list.

11. The method of claim 9, further comprising:

processing the erroneous sentence and the correct sentence into corresponding vectors respectively;

and inputting the vector corresponding to the erroneous sentence and the vector corresponding to the correct sentence into the wrong-word correction model for training, wherein the wrong-word correction model is a Bidirectional Encoder Representations from Transformers (BERT) model or a BERT-based Soft-Masked BERT (Softmask-BERT) model.

12. The method of claim 9, wherein the target text comprises a first entity word and a second entity word, and the determining a second wrong-word pair list according to the similarity between any two of the plurality of entity words comprises:

determining the similarity between the first entity word and the second entity word;

and in response to the similarity being greater than or equal to a second threshold and a first count being greater than a second count, determining the first entity word and the second entity word as a second wrong-word pair and adding the second wrong-word pair to the second wrong-word pair list, wherein the first count is the number of times the first entity word appears in the target text, the second count is the number of times the second entity word appears in the target text, the first entity word is the correct word of the second wrong-word pair, and the second entity word is the wrong word of the second wrong-word pair.

13. The method of claim 12, wherein the similarity is expressed as the number of letters in the identical string shared by the pinyin string of the first entity word and the pinyin string of the second entity word.

14. The method of claim 12, wherein the similarity is expressed as the edit distance between the pinyin string of the first entity word and the pinyin string of the second entity word.

15. A wrong-word recognition device, the device comprising:

an acquisition module, configured to receive and display a target text input by a user in a first display area of a page;

a display module, configured to display a wrong-word pair list corresponding to the target text in a second display area of the page; wherein the wrong-word pair list comprises one or more wrong-word pairs, each wrong-word pair comprises a wrong word in the target text and a correct word for correcting the wrong word, the wrong words comprise entity words and/or non-entity words, and the entity words and the non-entity words are identified from the target text in different manners.

16. A wrong-word recognition device, the device comprising:

an acquisition module, configured to acquire a target text;

a correction module, configured to input the target text into a wrong-word correction model to obtain a corrected text, wherein the corrected text is the text obtained after the wrong words in the target text are corrected, the wrong-word correction model is trained on sentence pairs, each sentence pair comprises an erroneous sentence and a correct sentence, the erroneous sentence is a sentence that contains a wrong word, and the correct sentence is a sentence that contains no wrong word;

a first comparison module, configured to compare the target text with the corrected text to obtain a first wrong-word pair list, wherein the first wrong-word pair list comprises a first wrong word in the target text and a first correct word in the corrected text corresponding to the first wrong word, and the first wrong word is a non-entity word;

a second comparison module, configured to identify a plurality of entity words from the target text and determine a second wrong-word pair list according to the similarity between any two of the plurality of entity words, wherein the second wrong-word pair list comprises a second wrong word and a second correct word, and the second wrong word and the second correct word are both entity words;

and a determining module, configured to obtain a wrong-word pair list of the target text from the first wrong-word pair list and the second wrong-word pair list.

17. An electronic device, wherein the electronic device comprises:

one or more processors;

a memory for storing one or more programs; wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the wrong-word recognition method of any one of claims 1 to 14.

18. A computer-readable storage medium on which a computer program is stored, wherein the computer program, when executed by a processor, implements the wrong-word recognition method of any one of claims 1 to 14.

Technical Field

The present application relates to the field of computers, and in particular to a wrong-word recognition method and device.

Background

With the development of computer technology, more and more people choose to process text on a computer instead of by hand. Because text on a computer is easy to modify, it can be written, edited, and revised efficiently, which greatly improves office productivity. However, the text is entered manually, and users inevitably make careless mistakes while entering or editing it, so the text ends up containing wrong words. A method for recognizing wrong words in text is therefore needed.

Disclosure of Invention

To solve the problems in the prior art, embodiments of the present application provide a wrong-word recognition method and device.

In a first aspect, an embodiment of the present application provides a wrong-word recognition method, where the method includes:

receiving and displaying a target text input by a user in a first display area of a page;

displaying a wrong-word pair list corresponding to the target text in a second display area of the page; wherein the wrong-word pair list comprises one or more wrong-word pairs, each wrong-word pair comprises a wrong word in the target text and a correct word for correcting the wrong word, the wrong words comprise entity words and/or non-entity words, and the entity words and the non-entity words are identified from the target text in different manners.

In a second aspect, an embodiment of the present application provides a wrong-word recognition method, where the method includes:

acquiring a target text;

inputting the target text into a wrong-word correction model to obtain a corrected text, wherein the corrected text is the text obtained after the wrong words in the target text are corrected, the wrong-word correction model is trained on sentence pairs, each sentence pair comprises an erroneous sentence and a correct sentence, the erroneous sentence is a sentence that contains a wrong word, and the correct sentence is a sentence that contains no wrong word;

comparing the target text with the corrected text to obtain a first wrong-word pair list, wherein the first wrong-word pair list comprises a first wrong word in the target text and a first correct word in the corrected text corresponding to the first wrong word, and the first wrong word is a non-entity word;

identifying a plurality of entity words from the target text, and determining a second wrong-word pair list according to the similarity between any two of the plurality of entity words, wherein the second wrong-word pair list comprises a second wrong word and a second correct word, and the second wrong word and the second correct word are both entity words;

and obtaining a wrong-word pair list of the target text from the first wrong-word pair list and the second wrong-word pair list.
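The steps of the second aspect can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of the embodiments: the text is assumed to be already segmented into tokens, `correct_fn` is a hypothetical callable standing in for the trained correction model, `entity_mentions` is a hypothetical list of entity words already recognized in the text, and the similarity test compares raw strings by edit distance (a distance of at most `max_distance`) where the claims describe a similarity over pinyin strings compared against a threshold. The rule that the more frequent of two similar entity words is taken as the correct one follows the claims.

```python
from collections import Counter
from difflib import SequenceMatcher
from itertools import combinations


def diff_pairs(target_tokens, corrected_tokens):
    """Return (wrong, correct) pairs where the corrected tokens replace target tokens."""
    matcher = SequenceMatcher(a=target_tokens, b=corrected_tokens, autojunk=False)
    return [
        ("".join(target_tokens[i1:i2]), "".join(corrected_tokens[j1:j2]))
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag == "replace"
    ]


def edit_distance(a, b):
    """Levenshtein distance between two strings, computed row by row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = curr
    return prev[-1]


def entity_pairs(entity_mentions, max_distance=1):
    """Pair similar entity words; the less frequent one is treated as the wrong word."""
    counts = Counter(entity_mentions)
    pairs = []
    for a, b in combinations(sorted(counts), 2):
        if edit_distance(a, b) <= max_distance and counts[a] != counts[b]:
            correct, wrong = (a, b) if counts[a] > counts[b] else (b, a)
            pairs.append((wrong, correct))
    return pairs


def recognize(target_tokens, correct_fn, entity_mentions):
    """Combine the non-entity list (from the diff) with the entity list."""
    entities = set(entity_mentions)
    first = [
        (wrong, correct)
        for wrong, correct in diff_pairs(target_tokens, correct_fn(target_tokens))
        if wrong not in entities  # entity words are handled by the second list
    ]
    second = entity_pairs(entity_mentions)
    return first + second
```

For Chinese text, the strings passed to `edit_distance` would be the pinyin strings of the entity words; producing them requires a pinyin conversion step that this sketch does not assume.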

In a third aspect, an embodiment of the present application provides a wrong-word recognition device, where the device includes:

an acquisition module, configured to receive and display a target text input by a user in a first display area of a page;

a display module, configured to display a wrong-word pair list corresponding to the target text in a second display area of the page; wherein the wrong-word pair list comprises one or more wrong-word pairs, each wrong-word pair comprises a wrong word in the target text and a correct word for correcting the wrong word, the wrong words comprise entity words and/or non-entity words, and the entity words and the non-entity words are identified from the target text in different manners.

In a fourth aspect, an embodiment of the present application provides a wrong-word recognition device, where the device includes:

an acquisition module, configured to acquire a target text;

a correction module, configured to input the target text into a wrong-word correction model to obtain a corrected text, wherein the corrected text is the text obtained after the wrong words in the target text are corrected, the wrong-word correction model is trained on sentence pairs, each sentence pair comprises an erroneous sentence and a correct sentence, the erroneous sentence is a sentence that contains a wrong word, and the correct sentence is a sentence that contains no wrong word;

a first comparison module, configured to compare the target text with the corrected text to obtain a first wrong-word pair list, wherein the first wrong-word pair list comprises a first wrong word in the target text and a first correct word in the corrected text corresponding to the first wrong word, and the first wrong word is a non-entity word;

a second comparison module, configured to identify a plurality of entity words from the target text and determine a second wrong-word pair list according to the similarity between any two of the plurality of entity words, wherein the second wrong-word pair list comprises a second wrong word and a second correct word, and the second wrong word and the second correct word are both entity words;

and a determining module, configured to obtain a wrong-word pair list of the target text from the first wrong-word pair list and the second wrong-word pair list.

In a fifth aspect, an embodiment of the present application provides an electronic device, where the electronic device includes:

one or more processors; and a memory for storing one or more programs;

wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the wrong-word recognition method of any of the embodiments of the present application.

In a sixth aspect, an embodiment of the present application provides a computer-readable storage medium on which a computer program is stored, where the computer program, when executed by a processor, implements the wrong-word recognition method of any of the embodiments of the present application.

Drawings

To illustrate the embodiments of the present application or the technical solutions in the prior art more clearly, the drawings used in the description of the embodiments or the prior art are briefly introduced below. The drawings described below show only some embodiments of the present application, and those skilled in the art can obtain other drawings from them without creative effort.

Fig. 1 is a schematic flowchart of a wrong-word recognition method according to an embodiment of the present application;

Fig. 2 is a schematic diagram of a display interface of a client according to an embodiment of the present application;

Fig. 3-A is another schematic diagram of a display interface of a client according to an embodiment of the present application;

Fig. 3-B is a further schematic diagram of a display interface of a client according to an embodiment of the present application;

Fig. 4 is another schematic diagram of a display interface of a client according to an embodiment of the present application;

Fig. 5 is another schematic diagram of a display interface of a client according to an embodiment of the present application;

Fig. 6 is another schematic diagram of a display interface of a client according to an embodiment of the present application;

Fig. 7 is a schematic flowchart of a wrong-word recognition method according to an embodiment of the present application;

Fig. 8 is a schematic structural diagram of a wrong-word recognition device according to an embodiment of the present application;

Fig. 9 is a schematic structural diagram of a wrong-word recognition device according to an embodiment of the present application;

Fig. 10 is a schematic structural diagram of an electronic device according to an embodiment of the present application.

Detailed Description

Embodiments of the present application will be described in more detail below with reference to the accompanying drawings. While certain embodiments of the present application are shown in the drawings, it should be understood that the present application may be embodied in various forms and should not be construed as limited to the embodiments set forth herein, but rather are provided for a more thorough and complete understanding of the present application. It should be understood that the drawings and embodiments of the present application are for illustration purposes only and are not intended to limit the scope of the present application.

It should be understood that the various steps recited in the method embodiments of the present application may be performed in a different order and/or in parallel. Moreover, method embodiments may include additional steps and/or omit performing the illustrated steps. The scope of the present application is not limited in this respect.

The term "include" and variations thereof as used herein are open-ended, i.e., "including but not limited to". The term "based on" is "based, at least in part, on". The term "one embodiment" means "at least one embodiment"; the term "another embodiment" means "at least one additional embodiment"; the term "some embodiments" means "at least some embodiments". Relevant definitions for other terms will be given in the following description.

It should be noted that the terms "first", "second", and the like in the present application are only used for distinguishing different devices, modules or units, and are not used for limiting the order or interdependence relationship of the functions performed by the devices, modules or units.

It should be noted that the modifiers "a", "an", and "the" in this application are illustrative rather than limiting, and those skilled in the art will understand that they should be read as "one or more" unless the context clearly dictates otherwise.

Because electronic text on a computer is easy to modify, easy to reformat, and easy to transmit, people who work with text now use computers to write, edit, and revise it in order to improve efficiency. However, most electronic text is entered manually through an input method, and errors and omissions may occur during input, so the electronic text ends up containing wrong words.

When the electronic text is long, it may be difficult for the user to find the wrong words accurately by eye alone, and errors remain in the text. For example, for a text with a large number of characters, such as a novel, checking for wrong words in person costs the user a great deal of effort, and the result may still be poor. A method capable of automatically recognizing wrong words is therefore needed.

To solve the problems in the prior art, embodiments of the present application provide a wrong-word recognition method, which is described in detail below with reference to the accompanying drawings.

Fig. 1 is a schematic flowchart of a wrong-word recognition method according to an embodiment of the present application. The embodiment applies to scenarios in which a wrong-word recognition device performs wrong-word recognition on a target text. The method can be executed by the wrong-word recognition device, which has data processing capability, can be implemented in software and/or hardware, and is integrated in a client of the user. The client may be integrated in a personal computer (PC) terminal or a mobile terminal. The wrong-word recognition method provided in the embodiments of the present application may also be executed by a computer device such as a server. The method is described below taking execution by the client as an example. As shown in Fig. 1, the method includes the following steps:

S101: Target text input by a user is received and displayed in a first display area of a page.

When the user wants wrong-word recognition performed on a target text, the user may input the target text. For example, an input box may be displayed on a display device of the client so that the user can enter the target text in it. Optionally, the target text may also be uploaded by the user. Specifically, an upload control may be displayed on the display device of the client, and the user may upload a target file by triggering the upload control. The target file contains the target text on which the user wants wrong-word recognition performed.

After the target text input by the user is received, it may be displayed in a first display area of the page. Optionally, the display area in which the input box for the target text is located may be the first display area. That is, an input box may be displayed in the first display area of the page, and the target text input by the user is received and then displayed in that input box.

Specifically, as shown in Fig. 2, the display area of the display device may include a title input area 210 and a text input area 220. The text input area 220 corresponds to the input box in the first display area. The user may enter the title of an article in the title input area 210 and the body of the article in the text input area 220. When the target text input by the user is acquired, the text entered in the text input area 220 may be used as the target text. Optionally, in some possible implementations, the display area may not include the title input area 210, or may include other input areas for entering text.

S102: A wrong-word pair list corresponding to the target text is displayed in a second display area of the page.

After the target text is entered, a wrong-word pair list corresponding to the target text may be displayed in a second display area of the page. The second display area is a display area of the page different from the first display area. The wrong-word pair list includes one or more wrong-word pairs, and each wrong-word pair includes a wrong word and the correct word corresponding to it. The wrong word comes from the target text, and the correct word is used to correct the wrong word. The wrong words include entity words and/or non-entity words, which can be identified from the target text in different manners. Entity words, non-entity words, and the method of recognizing wrong words are described later and are not detailed here.
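One way to picture the wrong-word pair list described above is as a list of small records. The sketch below is illustrative only: the class name `WrongWordPair`, its field names, and the helper `split_by_kind` are assumptions introduced here, not names from the embodiments.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WrongWordPair:
    wrong: str       # the wrong word as it appears in the target text
    correct: str     # the correct word used to fix it
    is_entity: bool  # True if the wrong word is an entity word


def split_by_kind(pairs):
    """Split a pair list into (entity pairs, non-entity pairs), preserving order."""
    entity = [p for p in pairs if p.is_entity]
    non_entity = [p for p in pairs if not p.is_entity]
    return entity, non_entity
```

Keeping the entity flag on each pair makes it straightforward to show entity and non-entity pairs in separate sub-areas of the second display area, as one of the claims describes.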

Optionally, the target text may be a Chinese text. In that case, a wrong word may include one or more characters. When a wrong word consists of a single character, it may be called a wrong character; when a wrong word includes multiple characters, those characters may include one or more wrong characters. That is, a multi-character wrong word may contain both wrong characters and correct characters.

In the embodiments of the present application, the wrong-word pair list may be displayed in response to a wrong-word display instruction. That is, if it is detected that the user has triggered a wrong-word display instruction, wrong-word recognition may be performed on the target text, the wrong words in the target text and their corresponding correct words are determined, and they are displayed in the wrong-word pair list.

In the embodiments of the present application, the wrong-word display instruction may be triggered by a user operation or triggered automatically. The two cases are described below.

In a first possible implementation, the wrong-word display instruction is triggered by a user operation. The user operation may include a click on a wrong-word display control. Specifically, the wrong-word display control may be displayed in a display area of the client. When the user wants wrong-word recognition performed on the target text, the user can click the wrong-word display control. If the control is clicked, the user is considered to have triggered the wrong-word display instruction, and the subsequent operations are executed.

Take Fig. 2 as an example. As shown in Fig. 2, the display area of the display device may further include a wrong-word display control 230. The wrong-word display control 230 may be located beside the text input area 220 or elsewhere in the display area. To have wrong-word recognition performed on the target text, the user can click the wrong-word display control 230.

In the embodiments of the present application, besides a click on the wrong-word display control, the user operation may also be a gesture operation or a voice instruction sent to the client.

In a second possible implementation, the wrong-word display instruction is triggered automatically.

Optionally, a timer may be set in the client. After the user inputs the target text, the timer can trigger the wrong-word display instruction so that wrong words in the target text are recognized periodically. Alternatively, the timer may record the time at which the user enters target text. If no input from the user is detected for a period of time, the wrong-word display instruction can be triggered automatically to recognize the wrong words in the target text.
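The idle-based variant of the automatic trigger can be sketched as a small state holder. This is a hedged illustration: the class `IdleTrigger`, its method names, and the choice to fire only once per pause in typing are assumptions made here; the embodiments only state that the instruction is triggered when no input is detected for a period of time.

```python
class IdleTrigger:
    """Signals once when no input has arrived for `idle_seconds`."""

    def __init__(self, idle_seconds):
        self.idle_seconds = idle_seconds
        self.last_input = None  # timestamp of the most recent input, if any
        self.fired = False      # whether the current pause was already reported

    def on_input(self, now):
        """Record an input event and re-arm the trigger."""
        self.last_input = now
        self.fired = False

    def should_trigger(self, now):
        """Return True the first time the idle period has elapsed since the last input."""
        if self.last_input is None or self.fired:
            return False
        if now - self.last_input >= self.idle_seconds:
            self.fired = True
            return True
        return False
```

A client could call `on_input(time.monotonic())` on every keystroke and poll `should_trigger(time.monotonic())` from its timer callback, starting wrong-word recognition whenever it returns `True`.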

After the mispronounced word recognition instruction is triggered, the mispronounced word recognition can be carried out on the target text, one or more mispronounced word pairs are obtained from the target text, and the recognized one or more mispronounced word pairs are displayed through a mispronounced word pair list. Wherein, one wrongly-distinguished word pair comprises one wrongly-distinguished word and a correct word corresponding to the wrongly-distinguished word. For a specific method for identifying the misregistered word, reference may be made to the description of the subsequent embodiments, which are not described herein again.

The following describes how the wrong-word pair list is displayed.

In the embodiment of the present application, the wrong-word pair list may be displayed in a display area separate from the target text, for example side by side with the target text. Specifically, as shown in fig. 3-A, the display area may include a wrong-word pair list display area 310 and a target text display area 320 in addition to the elements shown in fig. 2. The wrong-word pair list display area 310 is used for displaying the wrong-word pair list, and the target text display area 320 is used for displaying the target text. In the embodiment shown in fig. 3-A, a first wrong-word pair 311, a second wrong-word pair 312, a third wrong-word pair 313, a fourth wrong-word pair 314, and a fifth wrong-word pair 315 are displayed in the wrong-word pair list display area 310, showing five wrong words and the correct word corresponding to each of them.

Optionally, the target text displayed in the target text display area 320 may be a marked target text. Marking means that the wrong words in the target text are visually distinguished from the correct words, for example by adding a red wavy line below each wrong word. Optionally, when the user moves a control such as a cursor onto a wrong word, the correct word corresponding to that wrong word may be displayed in a floating window or a pop-up window.

In some possible implementations, when the target text contains a large number of wrong-word pairs, the wrong-word pair list display area 310 may be unable to display all of them at once. In that case, the wrong-word pair list display area 310 may display a portion of the wrong-word pairs at a time together with a page turning control (not shown in fig. 3-A), so that the user can view the other wrong-word pairs through the page turning control.

In the embodiment of the application, when the wrong-word pair list is displayed, the number of wrong-word pairs can also be displayed in the display area, so that the user knows how many wrong words exist in the target text. Specifically, as shown in fig. 3-A, the display area may further include a wrong-word pair number display area 330. The wrong-word pair number display area 330 is used to display the number of wrong-word pairs in the target text. In the embodiment shown in fig. 3-A, the wrong-word pair number display area 330 displays the value 5, indicating that there are 5 wrong words in the target text. Optionally, the wrong-word pair number display area 330 may also be part of the wrong-word pair list display area 310.
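The paging and counting behaviour described above can be illustrated with a short sketch. The function names and the page size of 5 are hypothetical; the source does not specify how many pairs fit on one page.

```python
from typing import List, Tuple

WrongWordPair = Tuple[str, str]  # (wrong word, correct word)


def pair_count(pairs: List[WrongWordPair]) -> int:
    """Value shown in the wrong-word pair number display area."""
    return len(pairs)


def total_pages(pairs: List[WrongWordPair], page_size: int = 5) -> int:
    """Number of pages the page turning control has to cover."""
    if page_size <= 0:
        raise ValueError("page_size must be positive")
    return (len(pairs) + page_size - 1) // page_size


def page_of_pairs(
    pairs: List[WrongWordPair], page: int, page_size: int = 5
) -> List[WrongWordPair]:
    """Pairs visible on the given zero-based page."""
    if page < 0 or page_size <= 0:
        raise ValueError("page must be >= 0 and page_size must be > 0")
    start = page * page_size
    return pairs[start:start + page_size]
```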

In some possible implementations, the number of wrong-word pairs may also be displayed on the wrong-word pair display control, for example as a superscript or subscript next to the control. Specifically, as shown in fig. 3-A, the display area may further include a wrong-word pair display control 340. The number of wrong-word pairs of the target text is displayed in the upper right corner of the wrong-word pair display control 340, indicating that there are 5 wrong words in the target text.

The following describes how each wrong-word pair in the wrong-word pair list is displayed.

In the embodiment of the present application, a wrong-word pair may include a wrong word and the correct word corresponding to the wrong word. To help the user tell the wrong word from the correct word within a pair, a mark can be added to the wrong word when the pair is displayed. Optionally, a modification mark may be added between the wrong word and the correct word to indicate which is the wrong word and which is the correct word. The two words may also be displayed in different colors, for example the wrong word in red and the correct word in black.

Specifically, as shown in fig. 3-B, the first wrong-word pair 311 in fig. 3-A may include a wrong-word display area 311-1, a correct word display area 311-2, and an error flag 311-3. In the embodiment shown in fig. 3-B, the wrong-word display area 311-1 is used for displaying the wrong word "clothing copy", the correct word display area 311-2 is used for displaying the correct word "clothing" corresponding to the wrong word "clothing copy", and the error flag 311-3 is used for indicating that "clothing copy" is the wrong word and "clothing" is the correct word.

In some possible implementations, the target text may include multiple paragraphs. To help the user quickly find the paragraph in which a wrong word is located, the wrong-word pairs in the list may be displayed separately according to the paragraph in which each wrong word is located.

Specifically, the second display area may be divided into N display sub-areas (N is an integer greater than or equal to 1), and each display sub-area may correspond to one paragraph in the target text. When a wrong-word pair in the wrong-word pair list is displayed, it may be displayed in the display sub-area corresponding to the paragraph in which its wrong word is located. That is, each display sub-area corresponds to a paragraph in the target text and displays the wrong-word pairs whose wrong words appear in that paragraph.
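As an illustration of the per-paragraph grouping, the sketch below assigns each wrong-word pair to a paragraph index. It assumes, purely for the example, that paragraphs are separated by newline characters and that each pair carries the character offset of its wrong word; neither detail is stated in the source.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

LocatedPair = Tuple[int, str, str]  # (offset of wrong word, wrong word, correct word)


def group_pairs_by_paragraph(
    target_text: str, located_pairs: List[LocatedPair]
) -> Dict[int, List[Tuple[str, str]]]:
    """Map paragraph index -> wrong-word pairs shown in that display sub-area."""
    groups: Dict[int, List[Tuple[str, str]]] = defaultdict(list)
    for offset, wrong, correct in located_pairs:
        # The paragraph index equals the number of newlines before the offset.
        paragraph_index = target_text.count("\n", 0, offset)
        groups[paragraph_index].append((wrong, correct))
    return dict(groups)
```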

As can be seen from the foregoing description, in the embodiments of the present application, wrong words may be divided into entity words and non-entity words, and the entity words and the non-entity words are detected from the target text in different ways. Accordingly, the second display area may be divided into a first sub display area and a second sub display area. The first sub display area can be used for displaying the wrong-word pairs whose wrong words are entity words, and the second sub display area can be used for displaying the wrong-word pairs whose wrong words are non-entity words.

In some possible implementations, the user may click a wrong-word pair displayed in the wrong-word pair list. When it is detected that the user has clicked a wrong-word pair, the display can jump to the position of the wrong word in the target text, so that the user can conveniently view it.

After jumping to the position of the wrong word in the target text, the user may be prompted about the specific position of the wrong word, for example by highlighting and flashing it. This part is described further below and is not repeated here.

In the embodiment of the application, the user can also trigger a wrong-word modification instruction to modify a wrong word displayed in the target text into the corresponding correct word. Optionally, the wrong-word modification instruction may be used to modify a single wrong word into its correct word, or to modify multiple wrong words in the target text into their correct words. These possible implementations are described separately below.

In a first possible implementation, the wrong-word modification instruction may be used to modify the wrong word in a single wrong-word pair into the corresponding correct word. That is, the wrong-word modification instruction is used to modify one wrong-word pair.

In this embodiment of the present application, the wrong-word modification instruction may be triggered by the user operating a wrong-word modification control. For example, a wrong-word modification control may be displayed in the wrong-word pair list. When it is detected that the user clicks the wrong-word modification control, it can be determined that the wrong-word modification instruction is triggered, and the wrong word is modified. Optionally, the wrong-word modification control may be displayed in the second display area.

Optionally, since the wrong-word modification instruction is used to modify a single wrong word, the wrong-word modification control may be displayed in association with a wrong-word pair in the wrong-word pair list. Specifically, as shown in fig. 4, the display area may include a wrong-word pair list display area 410 and a target text display area 420. The wrong-word pair list display area 410 includes a first wrong-word pair display area 411 and a first wrong-word modification control 412.

The first wrong-word pair display area 411 is used for displaying a wrong word in the target text and the correct word "clothing" corresponding to that wrong word. The first wrong-word modification control 412 is used to trigger a wrong-word modification instruction. After the user clicks the first wrong-word modification control 412, a wrong-word modification instruction may be triggered to modify that wrong word in the target text to "clothing".

As can be seen from the foregoing description, after the wrong words included in the target text are determined, they may be marked in the target text displayed in the first display area. When the user moves a control such as a cursor onto a wrong word, the correct word corresponding to the wrong word can be displayed in a floating window or a pop-up window. In this application scenario, the user may trigger a wrong-word modification instruction on the wrong word or the correct word displayed in the first display area, so as to modify the wrong word into the correct word. Optionally, a wrong-word modification instruction triggered in this way may be used to modify that single wrong word, or to modify a plurality of similar wrong words in the target text.

In a second possible implementation, the wrong-word modification instruction may be used to modify the wrong words in a plurality of wrong-word pairs into their corresponding correct words. That is, the wrong-word modification instruction is used to modify a plurality of wrong-word pairs. The plurality of wrong words may include all the wrong words in the wrong-word pair list, that is, all the wrong words of the target text. When the wrong-word modification instruction is used for modifying all the wrong words in the target text, it can also be called a one-key modification instruction.

Similar to the first possible implementation, the wrong-word modification instruction for modifying multiple wrong words into their corresponding correct words may also be triggered by the user operating a wrong-word modification control. Optionally, the wrong-word modification control that triggers the one-key modification instruction may be referred to as a one-key modification control.

Specifically, as shown in fig. 5, the display area may include a wrong-word pair list display area 510 and a target text display area 520. The wrong-word pair list display area 510 may include a first wrong-word pair display area 511, a first wrong-word modification control 512, a second wrong-word pair display area 513, a second wrong-word modification control 514, and a one-key modification control 515.

The first wrong-word pair display area 511 is used to display a wrong word in the target text and the correct word "clothing" corresponding to that wrong word. The first wrong-word modification control 512 is used to trigger a wrong-word modification instruction that modifies this wrong word in the target text to "clothing". The second wrong-word pair display area 513 is used to display the wrong word "tiger" in the target text and the correct word "user" corresponding to it. The second wrong-word modification control 514 is used to trigger a wrong-word modification instruction that modifies "tiger" in the target text to "user". The one-key modification control 515 is used to trigger a one-key modification instruction that modifies all the wrong words present in the target text. In the embodiment shown in fig. 5, after the one-key modification control 515 is triggered, the first wrong word in the target text may be modified to "clothing" and "tiger" may be modified to "user".
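The difference between modifying a single wrong word and the one-key modification can be sketched with one helper that applies any subset of wrong-word pairs. The function name `apply_pairs` and the assumption that each pair records the character offset of a non-overlapping wrong word are illustrative choices, not details from the source; the English sample text is likewise hypothetical.

```python
from typing import List, Tuple

LocatedPair = Tuple[int, str, str]  # (offset of wrong word, wrong word, correct word)


def apply_pairs(target_text: str, pairs: List[LocatedPair]) -> str:
    """Replace each located wrong word with its correct word."""
    # Work from the end of the text so that earlier offsets stay valid
    # even when a correct word differs in length from the wrong word.
    for offset, wrong, correct in sorted(pairs, reverse=True):
        end = offset + len(wrong)
        if target_text[offset:end] != wrong:
            raise ValueError(f"{wrong!r} not found at offset {offset}")
        target_text = target_text[:offset] + correct + target_text[end:]
    return target_text


if __name__ == "__main__":
    text = "teh cat adn dog"
    all_pairs = [(0, "teh", "the"), (8, "adn", "and")]
    print(apply_pairs(text, all_pairs[:1]))  # single wrong-word modification
    print(apply_pairs(text, all_pairs))      # one-key modification
```

Passing one pair corresponds to a per-pair modification control, while passing the whole list corresponds to the one-key modification control.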

It should be noted that, as shown in fig. 5, the wrong-word pair list (i.e., the second display area) may include a wrong-word modification control for modifying a single wrong word, and may also include a wrong-word modification control for modifying a plurality of wrong words.

In some possible implementations, after the user triggers the wrong-word modification instruction, the display can jump to the position of the wrong word in the target text. For example, in the embodiment shown in fig. 4, if the user triggers a wrong-word modification instruction through the first wrong-word modification control 412, the target text displayed in the target text display area 420 may jump to the position of the corresponding wrong word. For a wrong-word modification instruction that modifies a plurality of wrong words, the display may jump to the position of a target wrong word in the target text, where the target wrong word is the one among the modified wrong words that appears last in the target text.

In some possible implementations, the user may also jump to the position of a wrong word in the target text by clicking the corresponding wrong-word pair in the second display area. That is, the user may trigger a jump instruction by clicking the wrong-word pair. After the jump instruction is received, the display can jump to the position of the target wrong word in the target text, and the target wrong word is highlighted.

Optionally, the jump to the display position of the wrong word in the target text may take place either before or after the wrong word is modified.

In some possible implementations, after jumping to the display position corresponding to the wrong word, the user is also prompted that the wrong word has been modified or is about to be modified.

Specifically, after jumping to the display position of the wrong word in the target text and before modifying it, the wrong word displayed in the target text may be flashed with highlighting to show the user its specific position. Highlighting means setting the background color of the wrong word to differ from the background color of the other words in the target text, for example setting it to yellow. When flashing, the wrong word may first be highlighted, then its background color is set to that of the other words in the target text, and then the wrong word is highlighted again. Alternatively, a pop-up window or another form of prompt may inform the user that the wrong word is about to be modified.

In some possible implementations, the wrong words identified by the wrong-word identification method may be inaccurate, so that some correct words are identified as wrong words and displayed in the wrong-word pair list. The user may retain such misidentified correct words by triggering a wrong-word ignoring instruction. After the wrong-word ignoring instruction is triggered, the wrong word it refers to can be marked as a correct word, and the wrong-word pair corresponding to that word can be deleted from the wrong-word pair list.

The wrong-word ignoring instruction can be triggered by the user operating a wrong-word ignoring control. Optionally, the wrong-word ignoring instruction may be used to ignore a single wrong word or multiple wrong words; for example, all the wrong words in the target text may be ignored.

Specifically, as shown in fig. 4, the wrong-word pair list display area 410 may further include a wrong-word ignoring control 413. After the user clicks the wrong-word ignoring control 413, it may be determined that a wrong-word ignoring instruction is triggered, and the wrong word shown in the first wrong-word pair display area 411 is ignored. The corresponding wrong-word pair is deleted from the wrong-word pair list display area 410.
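The effect of the wrong-word ignoring instruction on the list can be sketched as below. The class `WrongWordPairList` and its `accepted_words` set are hypothetical names used only to illustrate "mark as correct, then delete the pair from the list".

```python
from dataclasses import dataclass, field
from typing import List, Set, Tuple


@dataclass
class WrongWordPairList:
    pairs: List[Tuple[str, str]]                          # (wrong word, correct word)
    accepted_words: Set[str] = field(default_factory=set)  # words the user chose to keep

    def ignore(self, wrong_word: str) -> None:
        """Ignore a single wrong word: mark it correct and drop its pair."""
        self.accepted_words.add(wrong_word)
        self.pairs = [p for p in self.pairs if p[0] != wrong_word]

    def ignore_all(self) -> None:
        """Ignore every wrong word currently in the list."""
        for wrong_word, _ in list(self.pairs):
            self.ignore(wrong_word)
```

Keeping the ignored words in a separate set would let a later recognition pass skip them, although the source does not say whether ignored words are remembered across passes.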

In some possible implementations, the target text entered by the user may be a text to be published. For example, it may be a blog post or a novel to be published. To reduce the wrong words in the target text published by the user, it can be judged before publication whether the target text contains wrong words, and the user can be reminded when it does. In this application scenario, the wrong-word identification method provided by the embodiment of the present application may be executed by the web page or application program used for publishing the target text.

Specifically, after a user-triggered publishing instruction for the target text is received, it may be detected whether the target text contains wrong words, that is, whether the number of wrong-word pairs in the wrong-word pair list of the target text is 0. If the number of wrong-word pairs in the list is not 0, reminder information can be displayed. The reminder information may be, for example, a pop-up window containing a reminder message.

Specifically, as shown in fig. 6, the display area may include a target text display area 610, a wrong-word pair list display area 620, a reminder information display area 630, and a publishing control 640. In the embodiment shown in fig. 6, the target text includes 2 wrong words, and the wrong-word pair list includes 2 wrong-word pairs. If it is detected that the user triggers the publishing control 640, it may be determined that the user has triggered a publishing instruction for the target text. Since the target text includes 2 wrong words, the reminder information display area 630 may be used to remind the user that the number of wrong-word pairs in the wrong-word pair list is not 0.

The reminder information display area 630 may include reminder information informing the user that wrong words still exist in the target text. In the embodiment shown in fig. 6, the reminder information is "Wrong words still exist. Publish anyway?". If the user still wants to publish the target text, a confirmation control 631 in the reminder information display area 630 may be triggered; if the user wants to adjust the target text first, a cancel control 633 in the reminder information display area 630 may be triggered.
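The publish-time check can be summarised in a few lines. In this sketch the `confirm` callback stands in for the confirmation and cancel controls of the reminder pop-up; the function name and the wording of the message are assumptions for illustration.

```python
from typing import Callable, List, Tuple


def try_publish(
    pairs: List[Tuple[str, str]], confirm: Callable[[str], bool]
) -> bool:
    """Return True if the target text should be published."""
    if len(pairs) == 0:
        # No wrong-word pairs remain, so publish without a reminder.
        return True
    reminder = f"{len(pairs)} wrong word(s) still exist. Publish anyway?"
    # confirm() returns True for the confirmation control, False for cancel.
    return confirm(reminder)
```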

With the wrong-word identification method provided by the embodiment of the application, the target text can be obtained first, and the wrong words in the target text are displayed in the wrong-word pair list so that the user can conveniently check them. In addition, by triggering the controls, the user can jump to the position of a wrong word in the target text, or modify or ignore a wrong word in the target text. The user can therefore process the wrong words in the target text conveniently and quickly.

The above describes how the wrong-word pair list is displayed in the wrong-word identification method provided by the embodiment of the present application, together with the related operations. The following describes how wrong words are identified from the target text.

Fig. 7 is a schematic flow chart of a wrong-word identification method according to an embodiment of the present application. The method includes:

S701: Acquire a target text.

For a description of obtaining the target text, reference may be made to the foregoing description, and details are not repeated here.

S702: Input the target text into the wrong-word error correction model to obtain an error correction text.

After the target text is obtained, it can be input into the wrong-word error correction model to obtain an error correction text. The error correction text is the correct text obtained after the wrong words in the target text are corrected into correct words. The wrong-word error correction model can be trained on error sentence pairs, each of which comprises an error sentence and a correct sentence. The error sentence is a sentence that contains wrong words, and the correct sentence is the corresponding sentence that contains no wrong words.

In some possible implementations, the wrong-word error correction model may be a transformer-based Bidirectional Encoder Representations from Transformers (BERT) model, or a soft-masked BERT (Softmask-BERT) model built on BERT.

In some possible implementations, the error sentence pairs used for training the wrong-word error correction model may be generated by replacement. Specifically, a plurality of correct sentences may be obtained first, and one or more characters in each correct sentence are then replaced with wrong words to obtain the error sentence corresponding to that correct sentence. In some other possible implementations, error sentence pairs may be collected from multiple texts. For example, multiple texts can be obtained, the error sentences in them can be marked manually, and the wrong words in each error sentence can be corrected into correct words to obtain the correct sentence. If the wrong-word error correction model is a Softmask-BERT model or a BERT model, the error sentence and the correct sentence can each be converted into corresponding vectors before they are input into the model for training.
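The replacement-based construction of error sentence pairs can be sketched as follows. The source does not say how the substituted characters are chosen, so the `confusions` table (a mapping from a character to characters it is commonly mistaken for) is an assumption, as are the function names and the choice to replace exactly one character per sentence.

```python
import random
from typing import Dict, List, Tuple


def make_error_sentence(
    correct_sentence: str, confusions: Dict[str, List[str]], rng: random.Random
) -> str:
    """Replace one character that has an entry in the confusion table."""
    candidates = [i for i, ch in enumerate(correct_sentence) if ch in confusions]
    if not candidates:
        return correct_sentence  # nothing replaceable in this sentence
    i = rng.choice(candidates)
    replacement = rng.choice(confusions[correct_sentence[i]])
    return correct_sentence[:i] + replacement + correct_sentence[i + 1:]


def build_error_sentence_pairs(
    correct_sentences: List[str], confusions: Dict[str, List[str]], seed: int = 0
) -> List[Tuple[str, str]]:
    """Return (error sentence, correct sentence) pairs for training."""
    rng = random.Random(seed)  # seeded so the generated data is reproducible
    pairs = []
    for sentence in correct_sentences:
        error_sentence = make_error_sentence(sentence, confusions, rng)
        if error_sentence != sentence:
            pairs.append((error_sentence, sentence))
    return pairs
```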

S703: Compare the target text with the error correction text to obtain a first wrong-word pair list.

After the error correction text is obtained, the target text and the error correction text may be compared to obtain a first wrong-word pair list. The first wrong-word pair list comprises one or more wrong words and the correct word corresponding to each wrong word.
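One way to carry out this comparison is a sequence diff between the two texts. The sketch below uses Python's standard `difflib.SequenceMatcher` on token lists; the source does not name a diff algorithm, and the token lists are assumed to come from a word segmenter that is outside the scope of the example.

```python
import difflib
from typing import List, Tuple


def diff_word_pairs(
    target_tokens: List[str], corrected_tokens: List[str]
) -> List[Tuple[str, str]]:
    """Return (wrong word, correct word) pairs where the two texts differ."""
    matcher = difflib.SequenceMatcher(
        None, target_tokens, corrected_tokens, autojunk=False
    )
    pairs = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            continue
        # "replace", "delete" and "insert" all describe a difference; for
        # "delete" the correct side is empty, for "insert" the wrong side is.
        wrong = " ".join(target_tokens[i1:i2])
        correct = " ".join(corrected_tokens[j1:j2])
        pairs.append((wrong, correct))
    return pairs
```

For space-delimited sample text, `diff_word_pairs("the usar logs in".split(), "the user logs in".split())` yields the single pair `("usar", "user")`.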

In the embodiment of the present application, the wrong words included in the first wrong-word pair list are non-entity words. An entity word is a noun or pronoun having a specific meaning in the target text, and may include, for example, the name of a person or a place, a term commonly used in a particular field, or a noun coined by the author of the target text and used consistently throughout it. Conversely, non-entity words are all words other than entity words.

Specifically, in the process of comparing the target text with the error correction text, if the wrong word in an obtained wrong-word pair is an entity word, the pair is not added to the first wrong-word pair list. For example, assume that a first wrong-word pair is obtained by comparing the target text with the error correction text, and that it includes a first wrong word and a first correct word. It can then be judged whether the first wrong word is an entity word. If the first wrong word is an entity word, the first wrong-word pair is not added to the first wrong-word pair list. If the first wrong word is a non-entity word, the first wrong-word pair may be added to the first wrong-word pair list.

Even if a word appears to contain a wrong character, the word may be an indivisible noun with an actual meaning. To avoid recognizing a character in such a word as a wrong word, it may be determined whether the recognized wrong word is an entity word.

For example, a word in the target text may be a wrong word relative to the correct word "clothing". However, suppose the target text is a novel that contains a character named "Zhangfu". Since the character "zhang" may indicate that "zhang fu" is part of a person's name, "zhang fu" may be determined to be an entity word, and the wrong-word pair "zhang fu → clothes" is therefore not added to the first wrong-word pair list.

In the embodiment of the application, whether the first wrong word is an entity word can be judged by a named entity recognition technique, or by querying other texts or web pages.

Besides the entity-word check, the embodiment of the present application may also use other conditions to determine whether a wrong-word pair can be added to the first wrong-word pair list. These are explained below, taking the first wrong-word pair as an example.

In a first possible implementation, it may be determined whether the number of times the first wrong word appears in the target text is greater than a first threshold. If the number of times the first wrong word appears in the target text is greater than the first threshold, the first wrong-word pair is not added to the first wrong-word pair list.

If a word contains a wrong character but appears many times in the target text, the user may have written it that way on purpose. Therefore, to avoid recognizing the characters in such words as wrong words, the number of times the first wrong word appears in the target text may be counted and compared with the first threshold. If the count is greater than the first threshold, the first wrong-word pair is not added to the first wrong-word pair list.
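The entity-word check and the occurrence-count check described above can be combined into one filtering pass, sketched below. The set of entity words is assumed to have been produced beforehand (for example by named entity recognition), and the default threshold of 3 is a placeholder, since the source does not give a value for the first threshold.

```python
from typing import List, Set, Tuple


def filter_first_list(
    target_text: str,
    candidate_pairs: List[Tuple[str, str]],
    entity_words: Set[str],
    first_threshold: int = 3,  # hypothetical value
) -> List[Tuple[str, str]]:
    """Keep only the pairs that may enter the first wrong-word pair list."""
    kept = []
    for wrong, correct in candidate_pairs:
        if wrong in entity_words:
            continue  # entity words are handled by the second list instead
        if target_text.count(wrong) > first_threshold:
            continue  # used so often that it is probably intentional
        kept.append((wrong, correct))
    return kept
```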

In a second possible implementation, it may be determined whether the first wrong word and the first correct word are synonyms or antonyms. If the first wrong word is a synonym or an antonym of the first correct word, the first wrong-word pair is not added to the first wrong-word pair list.

In some possible implementations, the first wrong word may be a word that makes the target text grammatically problematic. For example, it may be an added word or a deleted word in the target text. In that case, a grammar error correction model can be used to determine whether the first wrong word and its corresponding first correct word are synonyms or antonyms.

If the first wrong word is a synonym of the first correct word, replacing the first wrong word with the first correct word would not greatly change the meaning of the target text, and the user may have deliberately chosen the first wrong word over the first correct word for some reason. Therefore, to reduce the probability of false alarms, the first wrong-word pair is not added to the first wrong-word pair list, which avoids modifying the first wrong word.

If the first wrong word is an antonym of the first correct word, replacing the first wrong word with the first correct word would reverse the semantics of the target text and might change the meaning the text expresses. Therefore, to avoid erroneous modifications, the first wrong-word pair is not added to the first wrong-word pair list, which avoids modifying the first wrong word.

In a third possible implementation, it may be determined whether the first correct word is a semantically emphasized form of the first wrong word. If the first correct word semantically emphasizes the first wrong word, the first wrong-word pair is not added to the first wrong-word pair list.

In some possible implementations, the first correct word corresponding to the first wrong word may serve to emphasize the first wrong word. Modifying the first wrong word into the first correct word would strengthen the expression of the target text without changing the meaning it actually expresses. Therefore, to reduce the probability of false alarms, the first wrong-word pair is not added to the first wrong-word pair list, which avoids modifying the first wrong word.

S704: Identify a plurality of entity words from the target text, and determine a second wrong-word pair list according to the similarity of any two entity words in the plurality of entity words.

In step S703, the non-entity words are selected from the wrong words included in the target text, and the wrong-word pairs containing those non-entity words are added to the first wrong-word pair list. In an actual application scenario, entity words may also be wrong. For example, one entity word may be written as another entity word when the target text is written: when inputting a place name, "Wuhan" may be entered as "Wuhu". Therefore, to find the wrong entity words in the target text, a second wrong-word pair list may be determined from the target text according to the similarity of the entity words. The second wrong-word pair list comprises at least one second wrong-word pair, and each second wrong-word pair comprises a second wrong word and a second correct word. The second wrong word and the second correct word are both entity words.

The following describes, taking a target text that includes a first entity word and a second entity word as an example, how to judge whether the first entity word or the second entity word is a second wrong word. The first entity word and the second entity word are any two entity words in the target text. When the target text contains more than two entity words, any two of them may be taken as the first entity word and the second entity word for this judgment.

To determine whether either the first entity word or the second entity word is a wrong word, a first count and a second count may be obtained first, and the similarity between the first entity word and the second entity word may be computed. The first count is the number of times the first entity word occurs in the target text, and the second count is the number of times the second entity word occurs in the target text. The similarity represents how alike the two entity words are, and thus reflects the probability that the user typed the first entity word as the second entity word, or the second as the first. The similarity may be, for example, the number of letters in the common character string shared by the pinyin string of the first entity word and the pinyin string of the second entity word, or the edit distance between the two pinyin strings. For example, if the first entity word is "Wuhan" (武汉) and the second entity word is "Wuhu" (芜湖), the pinyin string of the first entity word is "wuhan" and that of the second entity word is "wuhu"; the common character string is "wuh", which has 3 letters, and the edit distance between the two pinyin strings is 2.
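The two example similarity measures can be sketched in Python as follows. This is a minimal sketch under stated assumptions: the common character string is interpreted here as the longest common contiguous substring of the two pinyin strings, the edit distance is taken to be the Levenshtein distance, and the pinyin strings are assumed to have been produced beforehand by some pinyin conversion step. The function names `longest_common_substring_len` and `edit_distance` are illustrative and do not come from the disclosure.

```python
def longest_common_substring_len(a: str, b: str) -> int:
    """Length of the longest contiguous string shared by a and b (assumed measure)."""
    best = 0
    prev = [0] * (len(b) + 1)
    for ch_a in a:
        curr = [0] * (len(b) + 1)
        for j, ch_b in enumerate(b, start=1):
            if ch_a == ch_b:
                # Extend the common run that ended at the previous characters.
                curr[j] = prev[j - 1] + 1
                best = max(best, curr[j])
        prev = curr
    return best


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: insert, delete, and substitute each cost 1."""
    prev = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        curr = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            curr.append(min(prev[j] + 1,          # delete ch_a
                            curr[j - 1] + 1,      # insert ch_b
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


print(longest_common_substring_len("wuhan", "wuhu"))  # 3
print(edit_distance("wuhan", "wuhu"))                 # 2
```

For the pinyin strings "wuhan" and "wuhu", the sketch returns 3 and 2 respectively, in line with the example in the text. A larger common-substring length indicates higher similarity, whereas a larger edit distance indicates lower similarity, so any threshold comparison would need to be oriented accordingly.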

After the first count, the second count, and the similarity between the first entity word and the second entity word are obtained, the first count may be compared with the second count, and the similarity may be compared with a second threshold. If the similarity is greater than the second threshold, the two entity words are highly similar, and a user input error is possible. If, in addition, the first count is greater than the second count, the first entity word occurs in the target text more often than the second entity word. It may then be considered that the user intended to enter the first entity word but mistakenly entered the second entity word. On this basis, the first entity word and the second entity word may be determined as a second wrong-word pair, in which the first entity word is the second correct word and the second entity word is the second wrong word.

If the first count is smaller than the second count, the second entity word occurs in the target text more often than the first entity word. It may then be considered that the user intended to enter the second entity word but mistakenly entered the first entity word. On this basis, the first entity word and the second entity word may be determined as a second wrong-word pair, in which the second entity word is the second correct word and the first entity word is the second wrong word.
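The judgment described in the two preceding paragraphs can be sketched as a single function. This is an illustrative sketch only: the function name `judge_entity_pair` and its parameters are hypothetical, the similarity is assumed to be a score for which larger values mean more similar words, and the behavior for equal counts (returning no pair) is an assumption, since the text does not address that case.

```python
from typing import Optional, Tuple


def judge_entity_pair(first: str, second: str,
                      first_count: int, second_count: int,
                      similarity: float,
                      second_threshold: float) -> Optional[Tuple[str, str]]:
    """Return (wrong_word, correct_word), or None if no pair is determined."""
    if similarity <= second_threshold:
        # The two entity words are not similar enough to suspect an input error.
        return None
    if first_count > second_count:
        # The first word dominates, so the rarer second word is treated as wrong.
        return (second, first)
    if first_count < second_count:
        # The second word dominates, so the rarer first word is treated as wrong.
        return (first, second)
    # Equal counts: not covered by the text; this sketch reports no pair.
    return None


print(judge_entity_pair("Wuhan", "Wuhu", 5, 1, similarity=3, second_threshold=2))
```

With "Wuhan" occurring five times and "Wuhu" once, the sketch treats "Wuhu" as the wrong word and "Wuhan" as the correct word.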

S705: Obtain the wrong-word pair list of the target text according to the first wrong-word pair list and the second wrong-word pair list.

After the first wrong-word pair list and the second wrong-word pair list are obtained, the wrong-word pair list of the target text may be obtained from them and then displayed using the method described in the embodiment corresponding to fig. 1.
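One possible way to combine the two lists is sketched below. The text does not specify how the lists are combined, so concatenation with removal of duplicate pairs is an assumption, and the name `merge_pair_lists` is hypothetical.

```python
from typing import List, Tuple

Pair = Tuple[str, str]  # (wrong word, correct word)


def merge_pair_lists(first_list: List[Pair], second_list: List[Pair]) -> List[Pair]:
    """Concatenate both lists, keeping the first occurrence of each pair."""
    merged: List[Pair] = []
    seen = set()
    for pair in [*first_list, *second_list]:
        if pair not in seen:
            seen.add(pair)
            merged.append(pair)
    return merged


print(merge_pair_lists([("their", "there")], [("Wuhu", "Wuhan")]))
```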

Fig. 8 is a schematic structural diagram of a wrong-word recognition apparatus according to an embodiment of the present application. This embodiment is applicable to scenarios in which wrong words are recognized from a target text. The wrong-word recognition apparatus includes an obtaining module 810 and a display module 820.

Specifically, the obtaining module 810 is configured to receive and display a target text input by a user in a first display area of a page.

The display module 820 is configured to display, in a second display area of the page, a wrong-word pair list corresponding to the target text. The wrong-word pair list includes one or more wrong-word pairs; each wrong-word pair includes a wrong word in the target text and a correct word for correcting the wrong word; the wrong words include entity words and/or non-entity words; and the entity words and the non-entity words are identified from the target text in different ways.

The wrong-word recognition apparatus provided by this embodiment of the application can execute the wrong-word recognition method provided by any embodiment of the application, and has the functional units and beneficial effects corresponding to that method.

Fig. 9 is a schematic structural diagram of a wrong-word recognition apparatus according to an embodiment of the present application. This embodiment is applicable to scenarios in which wrong words are recognized from a target text. The wrong-word recognition apparatus includes an obtaining module 910, an error correction module 920, a first comparison module 930, a second comparison module 940, and a determining module 950.

Specifically, the obtaining module 910 is configured to obtain a target text.

The error correction module 920 is configured to input the target text into a wrong-word correction model to obtain an error-corrected text, where the error-corrected text is the correct text obtained after the wrong words in the target text are corrected. The wrong-word correction model is trained on wrong-sentence pairs; each wrong-sentence pair includes a wrong sentence, which contains a wrong word, and a correct sentence, which contains no wrong word.

The first comparison module 930 is configured to compare the target text with the error-corrected text to obtain a first wrong-word pair list, where the first wrong-word pair list includes a first wrong word in the target text and a first correct word corresponding to the first wrong word in the error-corrected text, and the first wrong word is a non-entity word.

The second comparison module 940 is configured to identify a plurality of entity words from the target text and to determine a second wrong-word pair list according to the similarity between any two of the entity words, where the second wrong-word pair list includes a second wrong word and a second correct word, both of which are entity words.

The determining module 950 is configured to obtain the wrong-word pair list of the target text according to the first wrong-word pair list and the second wrong-word pair list.

The wrong-word recognition apparatus provided by this embodiment of the application can execute the wrong-word recognition method provided by any embodiment of the application, and has the functional units and beneficial effects corresponding to that method.

Referring now to fig. 10, a schematic diagram of an electronic device (e.g., a terminal device or server running a software program) 1000 suitable for implementing embodiments of the present disclosure is shown. The terminal device in the embodiments of the present disclosure may include, but is not limited to, a mobile terminal such as a mobile phone, a notebook computer, a digital broadcast receiver, a PDA (personal digital assistant), a PAD (tablet computer), a PMP (portable multimedia player), a vehicle terminal (e.g., a car navigation terminal), and the like, and a stationary terminal such as a digital TV, a desktop computer, and the like. The electronic device shown in fig. 10 is only an example, and should not bring any limitation to the functions and the scope of use of the embodiments of the present disclosure.

As shown in fig. 10, the electronic device 1000 may include a processing device (e.g., a central processing unit, a graphics processor, etc.) 1001 that may perform various appropriate actions and processes according to a program stored in a Read Only Memory (ROM) 1002 or a program loaded from a storage device 1008 into a Random Access Memory (RAM) 1003. The RAM 1003 also stores various programs and data necessary for the operation of the electronic device 1000. The processing device 1001, the ROM 1002, and the RAM 1003 are connected to each other by a bus 1004. An input/output (I/O) interface 1005 is also connected to the bus 1004.

Generally, the following devices may be connected to the I/O interface 1005: input devices 1006 including, for example, a touch screen, touch pad, keyboard, mouse, camera, microphone, accelerometer, gyroscope, etc.; an output device 1007 including, for example, a Liquid Crystal Display (LCD), a speaker, a vibrator, and the like; storage devices 1008 including, for example, magnetic tape, hard disk, and the like; and a communication device 1009. The communication device 1009 may allow the electronic device 1000 to communicate with other devices wirelessly or by wire to exchange data. While fig. 10 illustrates an electronic device 1000 having various means, it is to be understood that not all illustrated means are required to be implemented or provided. More or fewer devices may alternatively be implemented or provided.

In particular, according to an embodiment of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program carried on a non-transitory computer readable medium, the computer program containing program code for performing the methods illustrated in fig. 1 and/or fig. 7. In such an embodiment, the computer program may be downloaded and installed from a network through the communication means 1009, or installed from the storage means 1008, or installed from the ROM 1002. The computer program, when executed by the processing device 1001, performs the above-described functions defined in the methods of the embodiments of the present disclosure.

The electronic device provided by this embodiment of the present disclosure and the wrong-word recognition method provided by the foregoing embodiments belong to the same inventive concept. Technical details not described in detail in this embodiment can be found in the foregoing embodiments, and this embodiment has the same beneficial effects as the foregoing embodiments. Embodiments of the present disclosure also provide a computer storage medium on which a computer program is stored; when executed by a processor, the program implements the wrong-word recognition method provided by the foregoing embodiments.

It should be noted that the computer readable medium in the present disclosure can be a computer readable signal medium or a computer readable storage medium or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present disclosure, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In contrast, in the present disclosure, a computer readable signal medium may comprise a propagated data signal with computer readable program code embodied therein, either in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: electrical wires, optical cables, RF (radio frequency), etc., or any suitable combination of the foregoing.

In some embodiments, the client and the server may communicate using any currently known or future-developed network protocol, such as HTTP (HyperText Transfer Protocol), and may be interconnected with digital data communication in any form or medium (e.g., a communication network). Examples of communication networks include a local area network ("LAN"), a wide area network ("WAN"), an internetwork (e.g., the Internet), and peer-to-peer networks (e.g., ad hoc peer-to-peer networks), as well as any currently known or future-developed network.

The computer readable medium may be embodied in the electronic device; or may exist separately without being assembled into the electronic device.

The computer readable medium carries one or more programs which, when executed by the electronic device, cause the electronic device to:

receive a target text input by a user and display the target text in a first display area of a page; and display, in a second display area of the page, a wrong-word pair list corresponding to the target text, where the wrong-word pair list includes one or more wrong-word pairs, each wrong-word pair includes a wrong word in the target text and a correct word for correcting the wrong word, the wrong words include entity words and/or non-entity words, and the entity words and the non-entity words are identified from the target text in different ways.

Or, cause the electronic device to:

obtain a target text; input the target text into a wrong-word correction model to obtain an error-corrected text, where the error-corrected text is the correct text obtained after the wrong words in the target text are corrected, the wrong-word correction model is trained on wrong-sentence pairs, each wrong-sentence pair includes a wrong sentence and a correct sentence, the wrong sentence is a sentence containing a wrong word, and the correct sentence is a sentence containing no wrong word;

compare the target text with the error-corrected text to obtain a first wrong-word pair list, where the first wrong-word pair list includes a first wrong word in the target text and a first correct word corresponding to the first wrong word in the error-corrected text, and the first wrong word is a non-entity word; identify a plurality of entity words from the target text, and determine a second wrong-word pair list according to the similarity between any two of the entity words, where the second wrong-word pair list includes a second wrong word and a second correct word, both of which are entity words; and obtain the wrong-word pair list of the target text according to the first wrong-word pair list and the second wrong-word pair list.

Computer program code for performing the operations of the present disclosure may be written in one or more programming languages or a combination thereof, including, but not limited to, object-oriented programming languages such as Java, Smalltalk, and C++, and conventional procedural programming languages such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any type of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider).

The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.

The units described in the embodiments of the present disclosure may be implemented in software or in hardware. In some cases, the name of a unit does not constitute a limitation on the unit itself.

The functions described herein above may be performed, at least in part, by one or more hardware logic components. For example, without limitation, exemplary types of hardware logic components that may be used include: Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), systems on a chip (SOCs), Complex Programmable Logic Devices (CPLDs), and the like.

In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.

According to one or more embodiments of the present disclosure, [example one] provides a wrong-word recognition method, including:

receiving a target text input by a user and displaying the target text in a first display area of a page;

displaying, in a second display area of the page, a wrong-word pair list corresponding to the target text, where the wrong-word pair list includes one or more wrong-word pairs, each wrong-word pair includes a wrong word in the target text and a correct word for correcting the wrong word, the wrong words include entity words and/or non-entity words, and the entity words and the non-entity words are identified from the target text in different ways.

According to one or more embodiments of the present disclosure, [example two] provides a wrong-word recognition method. Optionally, the method further includes:

in response to a wrong-word modification instruction triggered by the user, determining a target wrong word corresponding to the wrong-word modification instruction;

modifying the target wrong word in the target text into the corresponding target correct word.

According to one or more embodiments of the present disclosure, [example three] provides a wrong-word recognition method. Optionally, before the target wrong word in the target text is modified into the corresponding target correct word, the method further includes:

in response to a wrong-word modification instruction triggered by the user in the second display area, jumping to the display position of the target wrong word in the target text and highlighting the target wrong word; or

in response to a jump instruction triggered by the user clicking a wrong word, jumping to the display position of the target wrong word in the target text and highlighting the target wrong word.

According to one or more embodiments of the present disclosure, [example four] provides a wrong-word recognition method. Optionally, the method further includes:

in response to a one-key wrong-word modification instruction triggered by the user, modifying the one or more wrong words included in the wrong-word pair list into the corresponding correct words.
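A one-key modification of this kind might be sketched as a single-pass replacement over the target text. This is only an illustration under assumptions: the function name `apply_all_corrections` is hypothetical, and every occurrence of a wrong word is replaced by plain string matching, which the disclosure does not prescribe. A single regular-expression pass is used so that one replacement cannot be altered by a later one.

```python
import re
from typing import List, Tuple


def apply_all_corrections(text: str, pairs: List[Tuple[str, str]]) -> str:
    """Replace every wrong word in text with its correct word in one pass."""
    mapping = dict(pairs)  # wrong word -> correct word
    if not mapping:
        return text
    # Longer wrong words first, so a short word cannot pre-empt a longer match.
    alternatives = sorted(mapping, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(word) for word in alternatives))
    return pattern.sub(lambda match: mapping[match.group(0)], text)


print(apply_all_corrections("I went their, to Wuhu.",
                            [("their", "there"), ("Wuhu", "Wuhan")]))
```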

According to one or more embodiments of the present disclosure, [example five] provides a wrong-word recognition method. Optionally, the target text includes N paragraphs and, correspondingly, the second display area includes N display sub-areas, where N is an integer greater than 1;

the displaying of the wrong-word pair list of the target text in the second display area of the page includes:

displaying, in each display sub-area of the second display area, the wrong-word pair list of the paragraph corresponding to that display sub-area.
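The per-paragraph grouping can be sketched as follows. This is a sketch under assumptions: paragraphs are assumed to be separated by newline characters, a pair is assigned to every paragraph in which its wrong word appears as a substring, and the name `pairs_by_paragraph` is hypothetical.

```python
from typing import List, Tuple

Pair = Tuple[str, str]  # (wrong word, correct word)


def pairs_by_paragraph(text: str, pairs: List[Pair]) -> List[List[Pair]]:
    """Return one wrong-word pair list per non-empty paragraph of text."""
    paragraphs = [p for p in text.split("\n") if p.strip()]
    return [[(wrong, correct) for wrong, correct in pairs if wrong in paragraph]
            for paragraph in paragraphs]


text = "I went their.\nWe visited Wuhu."
print(pairs_by_paragraph(text, [("their", "there"), ("Wuhu", "Wuhan")]))
```

Each inner list would then be shown in the display sub-area corresponding to its paragraph.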

According to one or more embodiments of the present disclosure, [example six] provides a wrong-word recognition method. Optionally, the second display area includes a first sub-display area and a second sub-display area, where the first sub-display area is used to display the wrong-word pairs that include entity words, and the second sub-display area is used to display the wrong-word pairs that include non-entity words.

According to one or more embodiments of the present disclosure, [example seven] provides a wrong-word recognition method. Optionally, the wrong words include a first wrong word, and the method further includes:

highlighting the first wrong word in the first display area;

in response to a display instruction triggered by the user for the first wrong word, displaying the first correct word corresponding to the first wrong word;

in response to a modification operation triggered by the user, replacing the first wrong word in the target text with the first correct word;

deleting the first wrong word and the first correct word from the wrong-word pair list.

According to one or more embodiments of the present disclosure, [example eight] provides a wrong-word recognition method. Optionally, the method further includes:

displaying, in the second display area, the number of times each wrong word occurs in the target text.
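The occurrence counts to be displayed could be obtained as sketched below. The name `wrong_word_frequencies` is hypothetical, and counting non-overlapping substring occurrences with `str.count` is an assumption about how occurrences are tallied.

```python
from typing import Dict, List, Tuple


def wrong_word_frequencies(text: str, pairs: List[Tuple[str, str]]) -> Dict[str, int]:
    """Map each wrong word to its number of non-overlapping occurrences in text."""
    return {wrong: text.count(wrong) for wrong, _correct in pairs}


print(wrong_word_frequencies("Wuhu, Wuhu and Wuhan", [("Wuhu", "Wuhan")]))
```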

According to one or more embodiments of the present disclosure, [example nine] provides a wrong-word recognition method, including:

obtaining a target text;

inputting the target text into a wrong-word correction model to obtain an error-corrected text, where the error-corrected text is the correct text obtained after the wrong words in the target text are corrected, the wrong-word correction model is trained on wrong-sentence pairs, each wrong-sentence pair includes a wrong sentence and a correct sentence, the wrong sentence is a sentence containing a wrong word, and the correct sentence is a sentence containing no wrong word;

comparing the target text with the error-corrected text to obtain a first wrong-word pair list, where the first wrong-word pair list includes a first wrong word in the target text and a first correct word corresponding to the first wrong word in the error-corrected text, and the first wrong word is a non-entity word;

identifying a plurality of entity words from the target text, and determining a second wrong-word pair list according to the similarity between any two of the entity words, where the second wrong-word pair list includes a second wrong word and a second correct word, both of which are entity words;

obtaining the wrong-word pair list of the target text according to the first wrong-word pair list and the second wrong-word pair list.

According to one or more embodiments of the present disclosure, [example ten] provides a wrong-word recognition method. Optionally, the comparing of the target text with the error-corrected text to obtain a first wrong-word pair list includes:

comparing the target text with the error-corrected text to obtain a first wrong-word pair, where the first wrong-word pair includes a first wrong word and a first correct word;

in response to the first wrong word being a non-entity word, adding the first wrong-word pair to the first wrong-word pair list.
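The comparison and filtering steps can be sketched with Python's standard `difflib` module. This is a sketch under assumptions, not the disclosed implementation: both texts are assumed to be already segmented into word tokens, the `is_entity` predicate stands in for whatever entity recognition step is used, and the name `first_pair_list` is hypothetical. Replaced tokens are joined without spaces, which suits Chinese word tokens.

```python
from difflib import SequenceMatcher
from typing import Callable, List, Sequence, Tuple


def first_pair_list(target_tokens: Sequence[str],
                    corrected_tokens: Sequence[str],
                    is_entity: Callable[[str], bool]) -> List[Tuple[str, str]]:
    """Collect (wrong, correct) pairs where the two token sequences differ."""
    matcher = SequenceMatcher(a=target_tokens, b=corrected_tokens, autojunk=False)
    pairs: List[Tuple[str, str]] = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "replace":
            continue  # only substitutions yield a wrong/correct word pair
        wrong = "".join(target_tokens[i1:i2])
        correct = "".join(corrected_tokens[j1:j2])
        if not is_entity(wrong):
            # Entity words are handled separately by the similarity-based step.
            pairs.append((wrong, correct))
    return pairs


entities = {"Wuhu", "Wuhan"}  # hypothetical entity lookup for the demo
print(first_pair_list(["I", "went", "their", "yesterday"],
                      ["I", "went", "there", "yesterday"],
                      lambda word: word in entities))
```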

According to one or more embodiments of the present disclosure, [example eleven] provides a wrong-word recognition method. Optionally, the method further includes:

processing the wrong sentence and the correct sentence into corresponding vectors respectively;

inputting the vector corresponding to the wrong sentence and the vector corresponding to the correct sentence into the wrong-word correction model for training, where the wrong-word correction model is a Bidirectional Encoder Representations from Transformers (BERT) model or a BERT-based Soft-Masked BERT model.

According to one or more embodiments of the present disclosure, [example twelve] provides a wrong-word recognition method. Optionally, the target text includes a first entity word and a second entity word, and the determining of a second wrong-word pair list according to the similarity between any two of the entity words includes:

determining the similarity between the first entity word and the second entity word;

in response to the similarity being greater than or equal to a second threshold and a first count being greater than a second count, determining the first entity word and the second entity word as a second wrong-word pair, and adding the second wrong-word pair to the second wrong-word pair list, where the first count is the number of times the first entity word occurs in the target text, the second count is the number of times the second entity word occurs in the target text, the first entity word is the correct word of the second wrong-word pair, and the second entity word is the wrong word of the second wrong-word pair.

According to one or more embodiments of the present disclosure, [example thirteen] provides a wrong-word recognition method. Optionally, the similarity is expressed as the number of letters in the common character string shared by the pinyin string of the first entity word and the pinyin string of the second entity word.

According to one or more embodiments of the present disclosure, [example fourteen] provides a wrong-word recognition method. Optionally, the similarity is expressed as the edit distance between the pinyin string of the first entity word and the pinyin string of the second entity word.

According to one or more embodiments of the present disclosure, [example fifteen] provides a wrong-word recognition apparatus, including:

an obtaining module, configured to receive a target text input by a user and display the target text in a first display area of a page;

a display module, configured to display, in a second display area of the page, a wrong-word pair list corresponding to the target text, where the wrong-word pair list includes one or more wrong-word pairs, each wrong-word pair includes a wrong word in the target text and a correct word for correcting the wrong word, the wrong words include entity words and/or non-entity words, and the entity words and the non-entity words are identified from the target text in different ways.

According to one or more embodiments of the present disclosure, [Example Sixteen] there is provided a wrong-word recognition apparatus including:

an acquisition module configured to acquire a target text;

an error correction module configured to input the target text into an error correction model to obtain an error-corrected text, wherein the error-corrected text is the correct text obtained after the wrong words in the target text have been corrected, the error correction model is trained on wrong-sentence pairs, each wrong-sentence pair comprises a wrong sentence and a correct sentence, the wrong sentence is a sentence that contains a wrong word, and the correct sentence is a sentence that contains no wrong word;

a first comparison module configured to compare the target text with the error-corrected text to obtain a first wrong-word pair list, wherein the first wrong-word pair list comprises a first wrong word in the target text and a first correct word in the error-corrected text that corresponds to the first wrong word, and the first wrong word is a non-entity word;

a second comparison module configured to identify a plurality of entity words from the target text and to determine a second wrong-word pair list according to the similarity between any two of the plurality of entity words, wherein the second wrong-word pair list comprises a second wrong word and a second correct word, both of which are entity words;

and a determining module configured to obtain the wrong-word pair list of the target text according to the first wrong-word pair list and the second wrong-word pair list.
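The data flow between the two comparison modules and the determining module can be sketched as follows. This is only a minimal illustration under stated assumptions, not the implementation described in the application: the error correction model and the entity recognizer are not reproduced, so the corrected text and the entity list are passed in as ready-made inputs; the function names, the whitespace tokenization, the 0.8 similarity threshold, and the rule that the less frequent of two similar entity words is the wrong one are all hypothetical choices made for the example. String similarity is computed with `difflib.SequenceMatcher` from the Python standard library as a stand-in for whatever similarity measure an embodiment uses.

```python
import difflib
from collections import Counter
from itertools import combinations


def first_pair_list(target_tokens, corrected_tokens, entities):
    """First comparison module (sketch): diff the target text against the
    error-corrected text and keep replaced spans whose wrong word is not an entity."""
    entity_set = set(entities)
    matcher = difflib.SequenceMatcher(a=target_tokens, b=corrected_tokens, autojunk=False)
    pairs = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "replace":
            continue
        wrong = " ".join(target_tokens[i1:i2])
        correct = " ".join(corrected_tokens[j1:j2])
        if wrong not in entity_set:  # first wrong words are non-entity words
            pairs.append((wrong, correct))
    return pairs


def second_pair_list(entities, threshold=0.8):
    """Second comparison module (sketch): compare every two distinct entity words.
    Assumption: of two similar entity words, the rarer one is the wrong word."""
    counts = Counter(entities)
    pairs = []
    for a, b in combinations(sorted(counts), 2):
        similarity = difflib.SequenceMatcher(a=a, b=b, autojunk=False).ratio()
        if similarity >= threshold and counts[a] != counts[b]:
            wrong, correct = (a, b) if counts[a] < counts[b] else (b, a)
            pairs.append((wrong, correct))
    return pairs


def recognize_wrong_words(target_tokens, corrected_tokens, entities, threshold=0.8):
    """Determining module (sketch): merge both lists, dropping duplicates
    while preserving order."""
    merged = []
    for pair in (first_pair_list(target_tokens, corrected_tokens, entities)
                 + second_pair_list(entities, threshold)):
        if pair not in merged:
            merged.append(pair)
    return merged


if __name__ == "__main__":
    # Hypothetical inputs: the corrected tokens would come from the error
    # correction model, the entity list from an entity recognition step.
    target = ["we", "recieve", "the", "report"]
    corrected = ["we", "receive", "the", "report"]
    entities = ["Acme Corporation", "Acme Corperation", "Acme Corporation"]
    print(recognize_wrong_words(target, corrected, entities))
```

Keeping the two lists separate until the final merge mirrors the module split in this example: non-entity words are found by comparison against the model output, while entity words are found by comparing the entity words with one another.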

According to one or more embodiments of the present disclosure, [Example Seventeen] there is provided an electronic device comprising: one or more processors; and a memory for storing one or more programs, wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the wrong-word recognition method according to any one of the embodiments of the present application.

According to one or more embodiments of the present disclosure, [Example Eighteen] there is provided a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the wrong-word recognition method according to any one of the embodiments of the present application.

The foregoing description is merely an illustration of the preferred embodiments of the disclosure and of the technical principles employed. Those skilled in the art will appreciate that the scope of the disclosure is not limited to technical solutions formed by the particular combination of features described above, but also covers other technical solutions formed by any combination of those features or their equivalents without departing from the spirit of the disclosure. For example, a technical solution may be formed by replacing the above features with features of similar function disclosed in (but not limited to) this disclosure.

Further, while operations are depicted in a particular order, this should not be understood as requiring that such operations be performed in the particular order shown or in sequential order. Under certain circumstances, multitasking and parallel processing may be advantageous. Likewise, while several specific implementation details are included in the above discussion, these should not be construed as limitations on the scope of the disclosure. Certain features that are described in the context of separate embodiments can also be implemented in combination in a single embodiment. Conversely, various features that are described in the context of a single embodiment can also be implemented in multiple embodiments separately or in any suitable subcombination.

Although the subject matter has been described in language specific to structural features and/or methodological acts, it is to be understood that the subject matter defined in the appended claims is not necessarily limited to the specific features or acts described above. Rather, the specific features and acts described above are disclosed as example forms of implementing the claims.
