Automatic information error correction and calibration system and method

Document No.: 1953435. Publication date: 2021-12-10.

Abstract: This technology, an automatic information error correction and calibration system and method (一种信息自动纠错与校准系统及方法), was designed by 骆飞, 刘成书 and 骆闻心 on 2020-06-09. The invention relates to the technical field of information error correction and calibration, and in particular to an automatic information error correction and calibration system and method. The system comprises an automatic data input module, a data preprocessing module, a suspected error data preliminary screening module, a data calibration module and a data output module. The automatic data input module is connected with the data preprocessing module, the data preprocessing module is connected with the suspected error data preliminary screening module, the suspected error data preliminary screening module is connected with the data calibration module, and the data calibration module is connected with the data output module. The system and method generate error-corrected and calibrated article content programmatically, without manual intervention, and with high working efficiency.

1. An automatic information error correction and calibration system, comprising an automatic data input module (1), a data preprocessing module (2), a suspected error data preliminary screening module (3), a data calibration module (4) and a data output module (5), wherein the automatic data input module (1) is connected with the data preprocessing module (2), the data preprocessing module (2) is connected with the suspected error data preliminary screening module (3), the suspected error data preliminary screening module (3) is connected with the data calibration module (4), and the data calibration module (4) is connected with the data output module (5).

2. The automatic information error correction and calibration system as claimed in claim 1, wherein a display module (11) is arranged in the automatic data input module (1) and supports manual input through a visual interface.

3. The automatic information error correction and calibration system as claimed in claim 1, wherein a text reading module (12) is arranged in the automatic data input module (1) and supports automatic reading of information from text files.

4. The automatic information error correction and calibration system as claimed in claim 1, wherein a database reading module (13) is arranged in the automatic data input module (1) and supports automatic reading of information from various databases.

5. The automatic information error correction and calibration system as claimed in claim 1, wherein a code conversion module (14) is arranged in the automatic data input module (1) and supports conversion among multiple encodings, including UTF-8, Unicode and GBK.

6. An automatic information error correction and calibration method, characterized in that the data preprocessing module (2) performs the following steps: a) establishing a reference word list; b) preprocessing the reference words, unifying them into UTF-8 encoded text, and mapping each to its pinyin initial sequence; c) preprocessing the input information, converting it into UTF-8 encoded text, and mapping it to its pinyin initial sequence to form the target data.

7. The automatic information error correction and calibration method according to claim 6, characterized in that the suspected error data preliminary screening module (3) performs the following steps:

a) traversing the target data with a sliding-window approach, using the initial sequence of the reference word as the window;

b) calculating the edit distance between the target data in the window and the reference word sequence;

c) filtering out interference in the target data by ignoring windows whose edit distance is large, so as to reduce the workload of pattern matching and data calibration; meanwhile, dynamically adjusting the sliding stride of the window according to the edit distance, so that unnecessary calculation is reduced and efficiency is improved;

d) performing pattern matching on the edit-distance sequence: the more the target data in the window differs from the reference word, the larger the edit distance, and vice versa; when the window is at its closest to the reference word and sliding further increases the edit distance, the window data is flagged as suspicious data.

8. The automatic information error correction and calibration method according to claim 6, characterized in that the data calibration module (4) performs the following steps:

a) for suspected error data, judging the error type by referring to the reference word and its pronunciation: if the pinyin initial sequences are identical but the original texts differ, the error is a homophone error; if the edit distance between the initial sequences is between 1 and 4, the error is generally an omitted or wrong character;

b) locating the exact error position according to the error type and the pattern matching: searching forward for the position where the edit distance starts to increase, and finding the exact error start position for each pattern and error type;

c) eliminating misjudgments by referring to common expressions; the method works well on long words, whereas short words vary more and are easily misjudged; the word banks therefore need targeted additions and removals to avoid information redundancy caused by misjudgment;

d) expanding the reference word bank and the excluded word bank through iterative training and manual labeling; by applying both word banks to different target data and continually proofreading the results manually, effectiveness improves steadily until manual intervention is largely unnecessary.

9. The automatic information error correction and calibration method according to claim 6, characterized in that the data output module (5) performs the following steps: a) marking the erroneous information; b) marking the correct information; c) computing the error rate of the information; d) automatically correcting the erroneous information to trusted information and storing it in a database of the specified type.

Technical Field

The invention relates to the technical field of information error correction and calibration, in particular to an automatic information error correction and calibration system and method.

Background

Information refers to sounds, messages, and the objects transmitted and processed by communication systems; more generally, it covers everything communicated in human society. By obtaining and identifying different information from nature and society, people distinguish different things and thereby understand and change the world. In all communication and control systems, information is a universal form of connection.

In the prior art, information is reviewed, corrected and calibrated manually, item by item, so efficiency is low and accuracy is unstable.

It is therefore desirable to provide an automatic information error correction and calibration system and method that solves these problems.

Disclosure of Invention

To address the problems in the prior art, the invention provides an automatic information error correction and calibration system and method that generates error-corrected and calibrated article content programmatically, without manual intervention, and with high working efficiency.

The technical solution adopted by the invention is as follows: the system comprises an automatic data input module, a data preprocessing module, a suspected error data preliminary screening module, a data calibration module and a data output module, wherein the automatic data input module is connected with the data preprocessing module, the data preprocessing module is connected with the suspected error data preliminary screening module, the suspected error data preliminary screening module is connected with the data calibration module, and the data calibration module is connected with the data output module.

In a further arrangement, a display module is arranged in the automatic data input module and supports manual input through a visual interface.

In a further arrangement, a text reading module is arranged in the automatic data input module and supports automatic reading of information from text files.

In a further arrangement, a database reading module is arranged in the automatic data input module and supports automatic reading of information from various databases.

In a further arrangement, a code conversion module is arranged in the automatic data input module and supports conversion among multiple encodings, including UTF-8, Unicode and GBK.

In a further arrangement, the data preprocessing module performs the following steps: a) establishing a reference word list; b) preprocessing the reference words, unifying them into UTF-8 encoded text, and mapping each to its pinyin initial sequence; c) preprocessing the input information, converting it into UTF-8 encoded text, and mapping it to its pinyin initial sequence to form the target data.

In a further arrangement, the suspected error data preliminary screening module performs the following steps:

a) traversing the target data with a sliding-window approach, using the initial sequence of the reference word as the window;

b) calculating the edit distance between the target data in the window and the reference word sequence;

c) filtering out interference in the target data by ignoring windows whose edit distance is large, so as to reduce the workload of pattern matching and data calibration; meanwhile, dynamically adjusting the sliding stride of the window according to the edit distance, so that unnecessary calculation is reduced and efficiency is improved;

d) performing pattern matching on the edit-distance sequence: the more the target data in the window differs from the reference word, the larger the edit distance, and vice versa; when the window is at its closest to the reference word and sliding further increases the edit distance, the window data is flagged as suspicious data.

In a further arrangement, the data calibration module performs the following steps:

a) for suspected error data, judging the error type by referring to the reference word and its pronunciation: if the pinyin initial sequences are identical but the original texts differ, the error is a homophone error; if the edit distance between the initial sequences is between 1 and 4, the error is generally an omitted or wrong character;

b) locating the exact error position according to the error type and the pattern matching: searching forward for the position where the edit distance starts to increase, and finding the exact error start position for each pattern and error type;

c) eliminating misjudgments by referring to common expressions; the method works well on long words, whereas short words vary more and are easily misjudged; the word banks therefore need targeted additions and removals to avoid information redundancy caused by misjudgment;

d) expanding the reference word bank and the excluded word bank through iterative training and manual labeling; by applying both word banks to different target data and continually proofreading the results manually, effectiveness improves steadily until manual intervention is largely unnecessary.

In a further arrangement, the data output module performs the following steps: a) marking the erroneous information; b) marking the correct information; c) computing the error rate of the information; d) automatically correcting the erroneous information to trusted information and storing it in a database of the specified type.

The beneficial effects of the invention are as follows:

In the system and method for automatic information error correction and calibration, information is modeled and, through big-data algorithms and analysis of information features, a visual platform is established for operating on the information. Correct information is generated automatically after error correction and calibration, so error-corrected and calibrated article content is produced programmatically, without manual intervention, and with high working efficiency.

Drawings

The invention is further illustrated with reference to the following figures and examples.

FIG. 1 is a system connection diagram of an embodiment of the automatic information error correction and calibration system and method according to the present invention;

FIG. 2 is a system connection diagram of the automatic data input module shown in FIG. 1.

In the figures: 1. automatic data input module; 11. display module; 12. text reading module; 13. database reading module; 14. code conversion module; 2. data preprocessing module; 3. suspected error data preliminary screening module; 4. data calibration module; 5. data output module.

Detailed Description

To make the technical means, inventive features, objectives and effects of the invention easy to understand, the invention is further described below with reference to specific embodiments.

As shown in FIGS. 1-2, the automatic information error correction and calibration system and method according to the present invention includes an automatic data input module 1, a data preprocessing module 2, a suspected error data preliminary screening module 3, a data calibration module 4, and a data output module 5, where the automatic data input module 1 is connected to the data preprocessing module 2, the data preprocessing module 2 is connected to the suspected error data preliminary screening module 3, the suspected error data preliminary screening module 3 is connected to the data calibration module 4, and the data calibration module 4 is connected to the data output module 5.

Further, a display module 11 is arranged in the automatic data input module 1 and supports manual input through a visual interface.

Further, a text reading module 12 is arranged in the automatic data input module 1 and supports automatic reading of information from text files.

Further, a database reading module 13 is arranged in the automatic data input module 1 and supports automatic reading of information from various databases.

Further, a code conversion module 14 is arranged in the automatic data input module 1 and supports conversion among multiple encodings, including UTF-8, Unicode and GBK.
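As an illustration only, the code conversion performed by module 14 might be sketched in Python as follows. The function name `convert_to_utf8` is hypothetical, and the sketch assumes the source encoding is known in advance and that "Unicode" refers to UTF-16; the source does not say how the encoding is determined.

```python
def convert_to_utf8(raw: bytes, source_encoding: str) -> bytes:
    """Decode bytes from a known source encoding and re-encode them as UTF-8.

    source_encoding is a Python codec name such as "gbk", "utf-16" or "utf-8".
    (Assumption: the caller knows the encoding; no detection is attempted.)
    """
    text = raw.decode(source_encoding)  # bytes -> str
    return text.encode("utf-8")         # str -> UTF-8 bytes


gbk_bytes = "信息校准".encode("gbk")
print(convert_to_utf8(gbk_bytes, "gbk").decode("utf-8"))  # prints 信息校准
```

Requiring an explicit source encoding avoids guessing: byte sequences that are valid in one encoding can often be decoded, wrongly, in another.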

Further, the data preprocessing module 2 performs the following steps: a) establishing a reference word list; b) preprocessing the reference words, unifying them into UTF-8 encoded text, and mapping each to its pinyin initial sequence; c) preprocessing the input information, converting it into UTF-8 encoded text, and mapping it to its pinyin initial sequence to form the target data.
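The preprocessing steps above can be sketched roughly as follows. This is a minimal sketch under stated assumptions: the table `PINYIN_INITIAL` is a toy lookup covering a handful of characters, whereas a practical system would need a full pinyin dictionary; the names `initials` and `preprocess` and the `?` placeholder for unknown characters are illustrative and do not come from the source.

```python
# Toy character-to-initial table (illustrative assumption, not from the source).
PINYIN_INITIAL = {
    "信": "x", "息": "x", "心": "x", "新": "x",
    "校": "j", "准": "z", "纠": "j", "错": "c",
}


def initials(text: str) -> str:
    """Map each character to its pinyin initial, one output character per input character."""
    out = []
    for ch in text:
        if ch in PINYIN_INITIAL:
            out.append(PINYIN_INITIAL[ch])
        elif ch.isascii() and ch.isalnum():
            out.append(ch.lower())   # keep ASCII letters and digits, lowercased
        else:
            out.append("?")          # unknown character: placeholder keeps positions aligned
    return "".join(out)


def preprocess(raw: bytes, encoding: str = "utf-8"):
    """Decode the input and return (text, pinyin initial sequence)."""
    text = raw.decode(encoding)
    return text, initials(text)


print(preprocess("信息纠错".encode("utf-8")))  # prints ('信息纠错', 'xxjc')
```

Emitting exactly one output character per input character keeps positions in the initial sequence aligned with positions in the original text, which makes it straightforward to map a match in the initial sequence back to the text.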

Further, the suspected error data preliminary screening module 3 performs the following steps:

a) traversing the target data with a sliding-window approach, using the initial sequence of the reference word as the window;

b) calculating the edit distance between the target data in the window and the reference word sequence;

c) filtering out interference in the target data by ignoring windows whose edit distance is large, so as to reduce the workload of pattern matching and data calibration; meanwhile, dynamically adjusting the sliding stride of the window according to the edit distance, so that unnecessary calculation is reduced and efficiency is improved;

d) performing pattern matching on the edit-distance sequence: the more the target data in the window differs from the reference word, the larger the edit distance, and vice versa; when the window is at its closest to the reference word and sliding further increases the edit distance, the window data is flagged as suspicious data.
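Steps a) to d) might be sketched as below, operating on pinyin initial sequences. The source does not specify the distance threshold or the stride rule, so both are assumptions here: `max_dist` is a hypothetical cutoff, and the stride uses the fact that shifting a window by one position changes its edit distance by at most 2, so a window at distance d cannot fall to `max_dist` in fewer than ceil((d - max_dist) / 2) shifts.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance, computed one row at a time."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # delete ca
                           cur[j - 1] + 1,              # insert cb
                           prev[j - 1] + (ca != cb)))   # substitute or match
        prev = cur
    return prev[-1]


def screen(target: str, ref: str, max_dist: int = 1):
    """Return (start, distance) for windows that are local minima within max_dist."""
    w = len(ref)
    dists = {}
    pos = 0
    while pos + w <= len(target):
        d = edit_distance(target[pos:pos + w], ref)
        dists[pos] = d
        # Assumed stride rule: skip only windows that cannot be within max_dist.
        pos += max(1, (d - max_dist + 1) // 2)
    inf = float("inf")
    # A skipped neighbour is known to exceed max_dist, so it counts as "larger".
    return [(p, d) for p, d in sorted(dists.items())
            if d <= max_dist
            and d < dists.get(p - 1, inf)
            and d <= dists.get(p + 1, inf)]


print(screen("abxxjcdexxjzfg", "xxjc"))  # prints [(2, 0), (8, 1)]
```

In this example the window at position 2 matches the reference initials exactly, and the window at position 8 differs by one substitution. A distance of 0 on initials is still passed on, since the underlying text may differ (the homophone case handled at the calibration stage).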

Further, the data calibration module 4 performs the following steps:

a) for suspected error data, judging the error type by referring to the reference word and its pronunciation: if the pinyin initial sequences are identical but the original texts differ, the error is a homophone error; if the edit distance between the initial sequences is between 1 and 4, the error is generally an omitted or wrong character;

b) locating the exact error position according to the error type and the pattern matching: searching forward for the position where the edit distance starts to increase, and finding the exact error start position for each pattern and error type;

c) eliminating misjudgments by referring to common expressions; the method works well on long words, whereas short words vary more and are easily misjudged; the word banks therefore need targeted additions and removals to avoid information redundancy caused by misjudgment;

d) expanding the reference word bank and the excluded word bank through iterative training and manual labeling; by applying both word banks to different target data and continually proofreading the results manually, effectiveness improves steadily until manual intervention is largely unnecessary.
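The error-type rule in step a) and the exclusion check in step c) could look roughly like this. The function name `classify`, the label strings, and the representation of the excluded word bank as a set are illustrative assumptions; only the two rules (identical initials with different text, and an edit distance between 1 and 4) come from the text above. Error localization (step b) and word-bank training (step d) are not covered.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance, computed one row at a time."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def classify(found_text, found_initials, ref_text, ref_initials, excluded=frozenset()):
    """Label a suspected window against one reference word (labels are hypothetical)."""
    if found_text in excluded:
        return "excluded"            # known common expression, not an error
    if found_text == ref_text:
        return "correct"
    d = edit_distance(found_initials, ref_initials)
    if d == 0:
        return "homophone"           # same initials, different characters
    if 1 <= d <= 4:
        return "omitted_or_wrong"    # omitted or wrong characters
    return "unrelated"


print(classify("心息", "xx", "信息", "xx"))           # prints homophone
print(classify("信息纠", "xxj", "信息纠错", "xxjc"))  # prints omitted_or_wrong
```

Checking the excluded word bank first means a legitimate phrase that happens to resemble a reference word is never reported, which matches the stated purpose of avoiding misjudgment on short words.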

Further, the data output module 5 performs the following steps: a) marking the erroneous information; b) marking the correct information; c) computing the error rate of the information; d) automatically correcting the erroneous information to trusted information and storing it in a database of the specified type.
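A minimal sketch of the output steps follows, assuming each record is a `(found_text, reference_text, label)` tuple and using an in-memory SQLite database as a stand-in for "a database of the specified type". The record layout, the label strings, the table name `corrections`, and the function names are all hypothetical.

```python
import sqlite3

OK_LABELS = ("correct", "excluded")  # hypothetical labels for non-error records


def summarize(records):
    """Return (rows, error_rate); each row is (found, corrected, label)."""
    rows = []
    errors = 0
    for found, ref, label in records:
        if label in OK_LABELS:
            rows.append((found, found, label))   # correct: keep as found
        else:
            errors += 1
            rows.append((found, ref, label))     # error: replace with the reference word
    rate = errors / len(records) if records else 0.0
    return rows, rate


def store(rows, conn):
    """Persist the corrected rows (the table layout is an assumption)."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS corrections (found TEXT, corrected TEXT, label TEXT)"
    )
    conn.executemany("INSERT INTO corrections VALUES (?, ?, ?)", rows)
    conn.commit()


records = [
    ("信息", "信息", "correct"),
    ("心息", "信息", "homophone"),
    ("校准", "校准", "correct"),
    ("信息纠", "信息纠错", "omitted_or_wrong"),
]
rows, rate = summarize(records)
conn = sqlite3.connect(":memory:")
store(rows, conn)
print(rate)  # prints 0.5
```

Storing the found text alongside the corrected text keeps every automatic correction auditable, which supports the manual proofreading loop described for the calibration module.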

Compared with the related art, the automatic information error correction and calibration system and method provided by the invention have the following beneficial effects:

Information is modeled and, through big-data algorithms and analysis of information features, a visual platform is established for operating on the information. Correct information is generated automatically after error correction and calibration, so error-corrected and calibrated article content is produced programmatically, without manual intervention, and with high working efficiency.

The circuits, electrical components and modules involved in the present application are all prior art and can be fully implemented by those skilled in the art, so they are not described further here; the invention is also open to such modifications as would be obvious to one skilled in the art.

In the description of the present invention, it is to be noted that, unless otherwise explicitly specified or limited, the terms "disposed," "mounted," "connected," and "fixed" are to be construed broadly, e.g., as meaning either fixedly connected, detachably connected, or integrally connected; can be mechanically or electrically connected; they may be connected directly or indirectly through intervening media, or they may be interconnected between two elements. The specific meaning of the above terms in the present invention can be understood by those of ordinary skill in the art through specific situations.

The foregoing illustrates and describes the principles, general features, and advantages of the present invention. It will be understood by those skilled in the art that the present invention is not limited to the embodiments described above, and the embodiments and descriptions given above are only illustrative of the principles of the present invention, and various changes and modifications may be made without departing from the spirit and scope of the invention, which fall within the scope of the claims. The scope of the invention is defined by the appended claims and equivalents thereof.
