Text recognition method based on natural language semantic analysis

Document No.: 169366. Publication date: 2021-10-29.

Reading note: This technology, a text recognition method based on natural language semantic analysis, was designed by 刘如君刘志杰陈乔尚雪松 on 2021-06-07. The invention provides a text recognition method based on natural language semantic analysis, comprising: establishing a mapping set of the correspondence between natural language semantics and standard language semantics; acquiring text information, extracting the language information in the text, and identifying the standard language semantics and non-standard language semantics in the language information through the mapping set; and inputting the non-standard language semantic information into a non-standard language semantic analysis system for analysis and judgment to complete text recognition. The method combines the analysis of language semantics in a natural environment with simultaneous text recognition, so that the acquired text can be recognized quickly, accurately and across multiple types at once while natural language semantic analysis is performed on it.

1. A text recognition method based on natural language semantic analysis is characterized by comprising the following steps:

establishing a mapping set of the corresponding relation between the natural language semantics and the standard language semantics;

acquiring text information, extracting language information in the text, and identifying standard language semantics and non-standard language semantics in the language information through a corresponding relation mapping set;

and inputting the non-standard language semantic information into a non-standard language semantic analysis system for analysis and judgment to complete text recognition.

2. The method according to claim 1, wherein the establishing a mapping set of correspondence between natural language semantics and standard language semantics comprises:

step 1: acquiring natural language information;

step 2: extracting semantic features of the natural language information, and identifying the region range of the corresponding relation mapping set in which the semantic information of the natural language information is located;

step 3: comparing the text in the recognized text region with the text in a word bank to obtain initial text information;

step 4: analyzing the text information based on the characteristics of the initial text information;

step 5: judging the accuracy and the integrity of the text information characteristics according to the analysis result;

step 6: correcting the text information according to the natural language sequence;

step 7: segmenting the text characters, and identifying the text characters;

and step 8: inputting the recognized text characters into the system mapping set to obtain a complete and accurate corresponding relation mapping set.

3. The text recognition method based on natural language semantic analysis according to claim 2, wherein the step 4: analyzing the text information based on the characteristics of the initial text information, comprising:

identifying a distribution texture of the text in the natural language; carrying out texture contrast analysis on the distribution texture and the background texture; when the texture contrast is larger than a set limit value, recognizing the texture contrast as a text feature, and extracting the text feature; and when the texture contrast is not greater than the set limit value, identifying the texture as non-text.

4. The method according to claim 2, wherein the characteristics of the natural language information include: detecting and positioning texts in a scene with interference noise in a natural environment; and identifying low-quality and seriously interfered texts in the text region, and further interpreting information contained in the natural language or video data according to the identification result of the text region.

5. The text recognition method based on natural language semantic analysis according to claim 1, wherein the acquiring text information, extracting language information in the text, and identifying standard language semantics and non-standard language semantics in the language information through the corresponding relation mapping set comprises the following steps:

step S1, collecting natural language information in real time;

step S2, judging the information type of the collected natural language information;

step S3, judging whether the text information belongs to proprietary information or general information according to whether the text information contains keywords from the keyword library: text information that contains keywords from the keyword library belongs to the proprietary information, and text information that contains none of them belongs to the general information; if the text information belongs to the general information, going to step S4; if it belongs to the proprietary information, going to step S5;

step S4, performing language semantic recognition on the text information judged to belong to the general information to form first language semantic recognition, and turning to step S6;

step S5, converting the text information judged to belong to the proprietary information into standard pinyin information, performing language semantic recognition on the standard pinyin information to form second language semantic recognition, and turning to step S6;

and step S6, finishing the first language semantic recognition and/or the second language semantic recognition, and generating a semantic recognition word bank.

6. The method for text recognition based on natural language semantic analysis according to claim 5, wherein the step S5 includes: step S51, converting the text information into initial pinyin information; step S52, carrying out fuzzy matching on the initial pinyin information to obtain the standard pinyin information; step S53, performing language semantic identification on the standard pinyin information to form the second language semantic identification, and turning to step S6.

7. The method of claim 6, wherein the step S52 of performing fuzzy matching on the initial pinyin information to obtain the standard pinyin information includes: the fuzzy matching applies homophonic consonant correction and/or front and back vowel correction; after correction, the corrected information is input into the standard natural language semantic analysis, and if the corrected information still contains unrecognizable content, the correction is repeated cyclically until all of the corrected information is recognized by the standard natural language semantic analysis.

8. The method according to claim 1, wherein the inputting of the non-standard language semantic information into the non-standard language semantic analysis system for analysis and judgment to complete text recognition comprises: performing language semantic recognition on the text information to form a language semantic recognition state distinction; in the first language semantic recognition state, performing language semantic recognition on the text information judged to belong to the general field to form the first language semantic recognition; in the second conversion state, converting the text information judged to belong to the vertical field into standard pinyin information; in the second language semantic recognition state, performing language semantic recognition on the standard pinyin information to form the second language semantic recognition; executing a command execution operation for the first language semantic recognition and the second language semantic recognition; and inputting the non-standard language semantic information into the non-standard language semantic analysis system for analysis and judgment, converting the text information judged to belong to the proprietary information into standard pinyin information, and completing text recognition.

9. The text recognition method based on natural language semantic analysis according to claim 8, wherein performing language semantic recognition on the text information to form language semantic recognition state distinction comprises: text distortion state distinguishing, text expansion state distinguishing, text proportion state distinguishing and/or text fuzzy state distinguishing; calculating the minimum collection number of state discrimination:

wherein Q_min is the minimum acquisition number for state discrimination, ω is the acquisition error rate, n is the number of state discriminations, and P is the discrimination probability; the minimum acquisition number Q_min for state discrimination is calculated, and when Q_min is larger than the reference acquisition number set by the system, the language semantic recognition state discrimination is formed, the state discrimination process being as follows: the text distortion state discrimination performs state recognition and discrimination between the text distortion state and the text standard state, distinguishes the text parts whose torsion degree is larger than the set torsion degree range, and inputs those text parts into the text expansion state discrimination; the text expansion state discrimination distinguishes, for the text parts whose torsion degree is larger than the set torsion degree range, between the expansion state and the reverse expansion state, and the text parts distinguished in the reverse state are brought into the range where the torsion degree is not larger than the set torsion degree, after which language semantic recognition is performed on the text information; the text proportion state discrimination performs state recognition and discrimination between the text after it has been enlarged or reduced by a set proportion and the text standard state; the text fuzzy state discrimination performs state discrimination on the overall characteristics of the blurred font strokes of the text, and performs state discrimination on missing text through the semantic association of the preceding and following text.

10. The method of claim 8, wherein converting the text information determined to belong to the proprietary information into the standard pinyin information comprises: dividing the text information of the proprietary information into independent characters, and connecting the independent characters according to the interval weight respectively; calculating the interval weight value of connecting two characters in the natural language:

wherein W is the interval weight connecting two characters; p(p) is the gray value of the natural language, f(p) is the corresponding texture feature, x(p) is the spatial position of point p, and x(q) is the spatial position of point q; ‖·‖₂ denotes the two-norm of a vector; δ_p is the standard deviation of the gray-scale Gaussian function, δ_f is the standard deviation of the grammatical Gaussian function, δ_x is the standard deviation of the spatial distance Gaussian function, and r is the effective distance between two characters; the standard deviations δ_p, δ_f and δ_x adjust, respectively, the gray-level difference, the grammatical difference and the spatial position difference between character points, and the interval weight W connecting two characters is adjusted through the exponent; when the interval weight connecting two characters is larger than the set interval weight, the connection between the two characters in the text is judged not to conform to the standard connection semantics and to be non-standard language semantics; the non-standard language semantic information is input into the non-standard language semantic analysis system, where the gray-level difference between character points is adjusted through δ_p, the grammatical difference between characters through δ_f, and the spatial position difference through δ_x, until the interval weight W between the two connected characters is no longer larger than the set interval weight, at which point the connection between the two characters in the text is judged to have found a conforming standard connection semantics, and text recognition is finally completed.

Technical Field

The invention relates to the field of text recognition, and in particular to a text recognition method based on natural language semantic analysis.

Background

At present, with the development of natural language processing technology, language recognition and semantic analysis based on general information are steadily improving. In some specialized fields, however, recognition accuracy and understanding accuracy remain very low, and the correspondence between natural language semantics and standard language semantics still needs to be resolved. In addition, the recognition rate is low when the characters contained in natural language or video in a natural environment are unclear or damaged. How to extract the language information in a text and identify the standard and non-standard language semantics in it is a technology yet to be perfected, and the analysis and judgment of non-standard language semantic information is not yet mature. Therefore, a text recognition method based on natural language semantic analysis is needed to at least partially solve these problems in the prior art.

Disclosure of Invention

The invention provides a text recognition method based on natural language semantic analysis, which is used for solving the problem of text recognition in a natural environment. A text recognition method based on natural language semantic analysis comprises the following steps:

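establishing a mapping set of the corresponding relation between the natural language semantics and the standard language semantics;

acquiring text information, extracting language information in the text, and identifying standard language semantics and non-standard language semantics in the language information through the corresponding relation mapping set;

and inputting the non-standard language semantic information into a non-standard language semantic analysis system for analysis and judgment to complete text recognition.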

Preferably, the establishing of the mapping set of correspondence between natural language semantics and standard language semantics includes:

step 1: acquiring natural language information;

step 2: extracting semantic features of the natural language information, and identifying the region range of the corresponding relation mapping set in which the semantic information of the natural language information is located;

step 3: comparing the text in the recognized text region with the text in a word bank to obtain initial text information;

step 4: analyzing the text information based on the characteristics of the initial text information;

step 5: judging the accuracy and the integrity of the text information characteristics according to the analysis result;

step 6: correcting the text information according to the natural language sequence;

step 7: segmenting the text characters, and identifying the text characters;

and step 8: inputting the recognized text characters into the system mapping set to obtain a complete and accurate corresponding relation mapping set.

Preferably, the step 4: analyzing the text information based on the characteristics of the initial text information, comprising: identifying a distribution texture of the text in the natural language; carrying out texture contrast analysis on the distribution texture and the background texture; when the texture contrast is larger than a set limit value, recognizing the texture contrast as a text feature, and extracting the text feature; and when the texture contrast is not greater than the set limit value, identifying the texture as non-text.

Preferably, the characteristics of the natural language information include: detecting and positioning texts in a scene with interference noise in a natural environment; and identifying low-quality and seriously interfered texts in the text region, and further interpreting information contained in the natural language or video data according to the identification result of the text region.

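Preferably, the acquiring text information, extracting language information in the text, and identifying standard language semantics and non-standard language semantics in the language information through the corresponding relation mapping set comprises the following steps:

step S1, collecting natural language information in real time;

step S2, judging the information type of the collected natural language information;

step S3, judging whether the text information belongs to proprietary information or general information according to whether the text information contains keywords from the keyword library: text information that contains keywords from the keyword library belongs to the proprietary information, and text information that contains none of them belongs to the general information; if the text information belongs to the general information, going to step S4; if it belongs to the proprietary information, going to step S5;

step S4, performing language semantic recognition on the text information judged to belong to the general information to form first language semantic recognition, and going to step S6;

step S5, converting the text information judged to belong to the proprietary information into standard pinyin information, performing language semantic recognition on the standard pinyin information to form second language semantic recognition, and going to step S6;

and step S6, finishing the first language semantic recognition and/or the second language semantic recognition, and generating a semantic recognition word bank.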
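The keyword test of step S3 and the branch to step S4 or S5 can be illustrated with a short Python sketch. It assumes the keyword library is a plain set of strings and that "contains a keyword" means substring containment; the names `classify_information`, `route` and the sample keywords are hypothetical and do not come from the method itself.

```python
from typing import Set, Tuple


def classify_information(text: str, keyword_library: Set[str]) -> str:
    """Return 'proprietary' if the text contains any library keyword, else 'general'.

    Assumption: containment is tested by plain substring search.
    Empty keywords are ignored so they cannot match every text.
    """
    has_keyword = any(k and k in text for k in keyword_library)
    return "proprietary" if has_keyword else "general"


def route(text: str, keyword_library: Set[str]) -> Tuple[str, str]:
    """Return the information kind and the next step label (S4 or S5)."""
    kind = classify_information(text, keyword_library)
    next_step = "S5" if kind == "proprietary" else "S4"
    return kind, next_step


# Hypothetical keyword library for a specialized field.
library = {"stent", "catheter"}
proprietary_result = route("insert the catheter slowly", library)
general_result = route("good morning", library)
```

In this sketch, general text is sent straight to semantic recognition (S4), while proprietary text is sent to the pinyin conversion branch (S5); both branches then meet at step S6.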

Preferably, step S5 includes: step S51, converting the text information into initial pinyin information; step S52, carrying out fuzzy matching on the initial pinyin information to obtain the standard pinyin information; step S53, performing language semantic identification on the standard pinyin information to form the second language semantic identification, and turning to step S6.

Preferably, in step S52, performing fuzzy matching on the initial pinyin information to obtain the standard pinyin information includes: the fuzzy matching applies homophonic consonant correction and/or front and back vowel correction; after correction, the corrected information is input into the standard natural language semantic analysis, and if the corrected information still contains unrecognizable content, the correction is repeated cyclically until all of the corrected information is recognized by the standard natural language semantic analysis.
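A minimal Python sketch of the cyclic fuzzy matching in step S52 follows. The text only names consonant correction and front and back vowel correction, so the concrete swap tables below (z/zh, c/ch, s/sh, n/l, an/ang, en/eng, in/ing) are assumptions, as are the function names and the `max_rounds` guard, which is added only so that the loop always terminates.

```python
from typing import List, Set

# Assumed confusion pairs; the source does not list them.
CONSONANT_SWAPS = [("zh", "z"), ("ch", "c"), ("sh", "s"), ("n", "l")]
FINAL_SWAPS = [("ang", "an"), ("eng", "en"), ("ing", "in")]


def candidates(syllable: str) -> List[str]:
    """Return variants of a syllable with one consonant or final swapped."""
    out = []
    for a, b in CONSONANT_SWAPS:
        if syllable.startswith(a):
            out.append(b + syllable[len(a):])
        elif syllable.startswith(b):
            out.append(a + syllable[len(b):])
    for a, b in FINAL_SWAPS:
        if syllable.endswith(a):
            out.append(syllable[:-len(a)] + b)
        elif syllable.endswith(b):
            out.append(syllable[:-len(b)] + a)
    return out


def fuzzy_match(syllables: List[str], standard: Set[str], max_rounds: int = 3) -> List[str]:
    """Correct syllables round by round until all are in the standard set."""
    current = list(syllables)
    for _ in range(max_rounds):
        unresolved = [i for i, s in enumerate(current) if s not in standard]
        if not unresolved:
            break
        for i in unresolved:
            options = candidates(current[i])
            hit = next((c for c in options if c in standard), None)
            if hit is not None:
                current[i] = hit          # resolved in this round
            elif options:
                current[i] = options[0]   # keep one swap, retry next round
    return current


corrected = fuzzy_match(["sen", "zi"], {"sheng", "zhi"})
```

Here "zi" is resolved to "zhi" in the first round, while "sen" needs two rounds (first the consonant, then the final) before it matches "sheng", which is the cyclic behaviour described above.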

Preferably, the inputting of the non-standard language semantic information into the non-standard language semantic analysis system for analysis and judgment to complete text recognition includes: performing language semantic recognition on the text information to form a language semantic recognition state distinction; in the first language semantic recognition state, performing language semantic recognition on the text information judged to belong to the general field to form the first language semantic recognition; in the second conversion state, converting the text information judged to belong to the vertical field into standard pinyin information; in the second language semantic recognition state, performing language semantic recognition on the standard pinyin information to form the second language semantic recognition; executing a command execution operation for the first language semantic recognition and the second language semantic recognition; and inputting the non-standard language semantic information into the non-standard language semantic analysis system for analysis and judgment, converting the text information judged to belong to the proprietary information into standard pinyin information, and completing text recognition.

Preferably, the performing language semantic recognition on the text information to form the language semantic recognition state distinction includes: text distortion state discrimination, text expansion state discrimination, text proportion state discrimination and/or text fuzzy state discrimination; calculating the minimum acquisition number for state discrimination:

wherein Q_min is the minimum acquisition number for state discrimination, ω is the acquisition error rate, n is the number of state discriminations, and P is the discrimination probability; the minimum acquisition number Q_min for state discrimination is calculated, and when Q_min is larger than the reference acquisition number set by the system, the language semantic recognition state discrimination is formed, the state discrimination process being as follows: the text distortion state discrimination performs state recognition and discrimination between the text distortion state and the text standard state, distinguishes the text parts whose torsion degree is larger than the set torsion degree range, and inputs those text parts into the text expansion state discrimination; the text expansion state discrimination distinguishes, for the text parts whose torsion degree is larger than the set torsion degree range, between the expansion state and the reverse expansion state, and the text parts distinguished in the reverse state are brought into the range where the torsion degree is not larger than the set torsion degree, after which language semantic recognition is performed on the text information; the text proportion state discrimination performs state recognition and discrimination between the text after it has been enlarged or reduced by a set proportion and the text standard state; the text fuzzy state discrimination performs state discrimination on the overall characteristics of the blurred font strokes of the text, and performs state discrimination on missing text through the semantic association of the preceding and following text.

Preferably, converting the text information determined to belong to the proprietary information into standard pinyin information includes: dividing the text information of the proprietary information into independent characters, and connecting the independent characters according to the interval weight respectively; calculating the interval weight value of connecting two characters in the natural language:

wherein W is the interval weight connecting two characters; p(p) is the gray value of the natural language, f(p) is the corresponding texture feature, x(p) is the spatial position of point p, and x(q) is the spatial position of point q; ‖·‖₂ denotes the two-norm of a vector; δ_p is the standard deviation of the gray-scale Gaussian function, δ_f is the standard deviation of the grammatical Gaussian function, δ_x is the standard deviation of the spatial distance Gaussian function, and r is the effective distance between two characters; the standard deviations δ_p, δ_f and δ_x adjust, respectively, the gray-level difference, the grammatical difference and the spatial position difference between character points, and the interval weight W connecting two characters is adjusted through the exponent; when the interval weight connecting two characters is larger than the set interval weight, the connection between the two characters in the text is judged not to conform to the standard connection semantics and to be non-standard language semantics; the non-standard language semantic information is input into the non-standard language semantic analysis system, where the gray-level difference between character points is adjusted through δ_p, the grammatical difference between characters through δ_f, and the spatial position difference through δ_x, until the interval weight W between the two connected characters is no longer larger than the set interval weight, at which point the connection between the two characters in the text is judged to have found a conforming standard connection semantics, and text recognition is finally completed.
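The formula for W is not reproduced in this text, so it cannot be restated here. Purely as an illustration of how three Gaussian terms with standard deviations δ_p, δ_f and δ_x and an effective distance r can be combined through an exponent, the Python sketch below uses one common Gaussian-kernel form. This form, the function name `interval_weight` and all argument names are assumptions and should not be read as the formula of the method.

```python
import math
from typing import Sequence


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared two-norm of the difference between two vectors."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def interval_weight(gray_p: float, gray_q: float,
                    feat_p: Sequence[float], feat_q: Sequence[float],
                    pos_p: Sequence[float], pos_q: Sequence[float],
                    delta_p: float, delta_f: float, delta_x: float,
                    r: float) -> float:
    """Assumed form: product of three Gaussians, zero beyond distance r."""
    spatial_sq = sq_dist(pos_p, pos_q)
    if math.sqrt(spatial_sq) > r:
        return 0.0  # the two points are farther apart than the effective distance
    exponent = -((gray_p - gray_q) ** 2 / delta_p ** 2
                 + sq_dist(feat_p, feat_q) / delta_f ** 2
                 + spatial_sq / delta_x ** 2)
    return math.exp(exponent)


w_same = interval_weight(0.5, 0.5, [1.0], [1.0], (0, 0), (0, 0), 1.0, 1.0, 1.0, 5.0)
w_far = interval_weight(0.5, 0.5, [1.0], [1.0], (0, 0), (10, 0), 1.0, 1.0, 1.0, 5.0)
w_gray = interval_weight(0.5, 0.9, [1.0], [1.0], (0, 0), (1, 0), 1.0, 1.0, 1.0, 5.0)
```

In this assumed form, a larger δ makes the weight less sensitive to the corresponding difference, which is the kind of adjustment the text attributes to δ_p, δ_f and δ_x.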

The beneficial effects of the above technical scheme are:

According to the method, a mapping set of the correspondence between natural language semantics and standard language semantics is established, so that the natural language information to be identified can be acquired. Text information is acquired, the language information in the text is extracted, and the standard language semantics and non-standard language semantics in the language information are identified through the mapping set. The method can extract the features of the natural language information, identify the region range of the text information and, according to the identified region range, identify the standard and non-standard language semantics through the mapping set. The text in the region range is compared with the text in the word bank to obtain initial text information; the text information is judged according to the characteristics of the initial text information and corrected; and the text characters are segmented and identified. The non-standard language semantic information is input into the non-standard language semantic analysis system for analysis and judgment, completing text recognition and yielding complete and accurate collected text information.

The technical solution of the present invention is further described in detail by the accompanying drawings and embodiments.

Drawings

The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and together with the description serve to explain the principles of the invention and not to limit the invention. In the drawings:

Fig. 1 is a flowchart of a text recognition method based on natural language semantic analysis according to an embodiment of the present invention.

Detailed Description

The preferred embodiments of the present invention will be described in conjunction with the accompanying drawings, and it will be understood that they are described herein for the purpose of illustration and explanation and not limitation.

Referring to fig. 1, an embodiment of the present invention provides a text recognition method based on natural language semantic analysis, including:

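establishing a mapping set of the corresponding relation between the natural language semantics and the standard language semantics;

acquiring text information, extracting language information in the text, and identifying standard language semantics and non-standard language semantics in the language information through the corresponding relation mapping set;

and inputting the non-standard language semantic information into a non-standard language semantic analysis system for analysis and judgment to complete text recognition.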

The working principle of the technical scheme is as follows: a mapping set of the correspondence between natural language semantics and standard language semantics is established, and the natural language information to be identified is acquired. Text information is acquired, the language information in the text is extracted, and the standard language semantics and non-standard language semantics in the language information are identified through the mapping set. The features of the natural language information are extracted, the region range of the text information is identified and, according to the identified region range, the standard and non-standard language semantics are identified through the mapping set. The text in the region range is compared with the text in the word bank to obtain initial text information; the text information is judged according to the characteristics of the initial text information and corrected; and the text characters are segmented and identified. The non-standard language semantic information is input into the non-standard language semantic analysis system for analysis and judgment, completing text recognition and yielding complete and accurate collected text information.
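The split into standard and non-standard language semantics through the mapping set can be sketched in Python as a dictionary lookup. This is only a minimal illustration: representing the mapping set as a `dict`, normalizing by lowercasing, and the sample entries and labels such as `LIGHT_ON` are all assumptions.

```python
from typing import Dict, List, Tuple

# Hypothetical mapping set: natural-language expression -> standard semantics.
MAPPING_SET: Dict[str, str] = {
    "turn on the light": "LIGHT_ON",
    "switch the light on": "LIGHT_ON",
    "turn off the light": "LIGHT_OFF",
}


def split_by_mapping_set(
    units: List[str], mapping_set: Dict[str, str]
) -> Tuple[List[Tuple[str, str]], List[str]]:
    """Separate language units into standard (mapped) and non-standard (unmapped)."""
    standard: List[Tuple[str, str]] = []
    non_standard: List[str] = []
    for unit in units:
        key = unit.strip().lower()  # assumed normalization
        if key in mapping_set:
            standard.append((unit, mapping_set[key]))
        else:
            non_standard.append(unit)  # to be handled by the non-standard analysis
    return standard, non_standard


standard_units, non_standard_units = split_by_mapping_set(
    ["Turn on the light", "lite on plz"], MAPPING_SET
)
```

Only the units left in `non_standard_units` would then be passed to the non-standard language semantic analysis system.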

In one embodiment, the establishing a mapping set of correspondence between natural language semantics and standard language semantics includes:

step 1: acquiring natural language information;

step 2: extracting semantic features of the natural language information, and identifying the region range of the corresponding relation mapping set in which the semantic information of the natural language information is located;

step 3: comparing the text in the recognized text region with the text in a word bank to obtain initial text information;

step 4: analyzing the text information based on the characteristics of the initial text information;

step 5: judging the accuracy and the integrity of the text information characteristics according to the analysis result;

step 6: correcting the text information according to the natural language sequence;

step 7: segmenting the text characters, and identifying the text characters;

and step 8: inputting the recognized text characters into the system mapping set to obtain a complete and accurate corresponding relation mapping set.

The working principle of the above technical scheme is as follows: the mapping set of the correspondence between natural language semantics and standard language semantics is established by acquiring natural language information; extracting semantic features of the natural language information and identifying the region range of the mapping set in which its semantic information is located; comparing the text in the recognized text region with the text in a word bank to obtain initial text information; analyzing the text information based on the characteristics of the initial text information; judging the accuracy and the integrity of the text information characteristics according to the analysis result; correcting the text information according to the natural language sequence; segmenting the text into characters and recognizing the text characters; and inputting the recognized text characters into the system mapping set.

The beneficial effects of the above technical scheme are as follows: because the text is located, compared with the word bank, analyzed, checked for accuracy and integrity, corrected, segmented and recognized before it is entered into the system mapping set, the resulting mapping set of the correspondence between natural language semantics and standard language semantics is complete and accurate.

In one embodiment, step 4, analyzing the text information based on the characteristics of the initial text information, comprises: identifying the distribution texture of the text in the natural language information; performing texture contrast analysis between the distribution texture and the background texture; when the texture contrast is greater than a set limit value, recognizing the distribution texture as a text feature and extracting the text feature; and when the texture contrast is not greater than the set limit value, identifying the texture as non-text.

The working principle of the above technical scheme is as follows: the distribution texture of the text is identified and compared with the background texture; a texture whose contrast with the background is greater than the set limit value is recognized as a text feature and extracted, and a texture whose contrast is not greater than the set limit value is identified as non-text.

The beneficial effects of the above technical scheme are as follows: comparing the texture contrast with a set limit value separates text features from non-text texture, so that only text features are extracted when the text information is analyzed.
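The threshold rule in this embodiment can be sketched in a few lines of Python. The description does not say how texture contrast is measured, so the measure used here (the absolute difference between the gray-level standard deviations of the candidate region and the background) is purely an assumption, as are the function names and sample values.

```python
from statistics import pstdev
from typing import Sequence


def texture_contrast(region: Sequence[float], background: Sequence[float]) -> float:
    # Assumed contrast measure: difference of gray-level standard deviations.
    return abs(pstdev(region) - pstdev(background))


def is_text_feature(region: Sequence[float], background: Sequence[float], limit: float) -> bool:
    # Rule from the embodiment: strictly greater than the limit means text;
    # a contrast equal to or below the limit is treated as non-text.
    return texture_contrast(region, background) > limit


# Hypothetical gray values: alternating strokes versus a nearly flat background.
stroke_like = [0, 255, 0, 255, 0, 255]
flat_background = [120, 122, 121, 119, 120, 121]
```

Note that the comparison is strict, matching the wording "not greater than the set limit value" for the non-text case.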

In one embodiment, the characteristics of the natural language information are used for: detecting and locating text in scenes with interference noise in a natural environment; recognizing low-quality and severely interfered text in the text region; and further interpreting the information contained in the natural language or video data according to the recognition result of the text region.

The working principle of the above technical scheme is as follows: text is detected and located in scenes with interference noise in a natural environment; low-quality and severely interfered text in the text region is recognized; and the information contained in the natural language or video data is further interpreted according to the recognition result of the text region.

The beneficial effects of the above technical scheme are as follows: by using the characteristics of the natural language information, text can be detected and located in noisy natural scenes, low-quality and severely interfered text in the text region can be recognized, and the information contained in the natural language or video data can be further interpreted according to the recognition result.

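In one embodiment, acquiring text information, extracting language information in the text, and identifying standard language semantics and non-standard language semantics in the language information through the mapping set includes the following steps:

step S1, collecting natural language information in real time;

step S2, judging the information type of the collected natural language information;

step S3, judging whether the text information belongs to proprietary information or general information according to whether it contains keywords from the keyword library: text information containing such keywords is proprietary information, and text information containing none is general information; if it is general information, turning to step S4; if it is proprietary information, turning to step S5;

step S4, performing language semantic recognition on the text information judged to belong to the general information to form first language semantic recognition, and turning to step S6;

step S5, converting the text information judged to belong to the proprietary information into standard pinyin information, performing language semantic recognition on the standard pinyin information to form second language semantic recognition, and turning to step S6;

step S6, finishing the first language semantic recognition and/or the second language semantic recognition, and generating a semantic recognition word bank.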

The working principle of the above technical scheme is as follows: natural language information is collected in real time (step S1) and its information type is judged (step S2). In step S3 the text information is judged to be proprietary information or general information according to whether it contains keywords from the keyword library: text information containing such keywords is proprietary information, and text information containing none is general information. General information goes to step S4, where language semantic recognition is performed on it to form the first language semantic recognition. Proprietary information goes to step S5, where it is converted into standard pinyin information and language semantic recognition is performed on the standard pinyin information to form the second language semantic recognition. Both branches end at step S6, which finishes the first language semantic recognition and/or the second language semantic recognition and generates a semantic recognition word bank.

The beneficial effects of the above technical scheme are as follows: natural language information is collected in real time and classified by type; the keyword library separates proprietary information from general information; general information is recognized directly to form the first language semantic recognition, while proprietary information is first converted into standard pinyin information and then recognized to form the second language semantic recognition; and the first language semantic recognition and/or the second language semantic recognition are completed and a semantic recognition word bank is generated.
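The branching in step S3 can be sketched as a simple keyword lookup. In the Python sketch below, the keyword library contents, the substring test and the function names `classify_text` and `next_step` are all assumptions for illustration; the description only states that the presence of a library keyword makes the text proprietary information.

```python
from typing import Iterable


def classify_text(text: str, keyword_library: Iterable[str]) -> str:
    """Step S3: 'proprietary' if any library keyword occurs in the text, else 'general'."""
    return "proprietary" if any(kw in text for kw in keyword_library) else "general"


def next_step(text: str, keyword_library: Iterable[str]) -> str:
    """Return the step the text is routed to: S5 for proprietary, S4 for general."""
    return "S5" if classify_text(text, keyword_library) == "proprietary" else "S4"


# Hypothetical keyword library (a medical term and a smart-home term).
KEYWORDS = {"阿司匹林", "空调"}
```

With this library, `next_step("请打开空调", KEYWORDS)` routes the text to step S5, while a sentence containing no library keyword is routed to step S4.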

In one embodiment, the step S5 includes: step S51, converting the text information into initial pinyin information; step S52, carrying out fuzzy matching on the initial pinyin information to obtain the standard pinyin information; step S53, performing language semantic identification on the standard pinyin information to form the second language semantic identification, and turning to step S6.

The working principle of the technical scheme is as follows: converting the text information into initial pinyin information; carrying out fuzzy matching on the initial pinyin information to obtain the standard pinyin information; and performing language semantic recognition on the standard pinyin information to form the second language semantic recognition.

The beneficial effects of the above technical scheme are that: converting the text information into initial pinyin information; carrying out fuzzy matching on the initial pinyin information to obtain the standard pinyin information; and performing language semantic recognition on the standard pinyin information to form the second language semantic recognition.

In one embodiment, step S52, performing fuzzy matching on the initial pinyin information to obtain the standard pinyin information, includes: the fuzzy matching corrects homophonic consonants and/or front and back vowels; the corrected information is then input into standard natural language semantic analysis; and if the corrected information still contains unrecognizable content, the correction is repeated cyclically until all of the corrected information is recognized by the standard natural language semantic analysis.

The working principle of the above technical scheme is as follows: in step S52 the initial pinyin information is corrected by homophonic consonant correction and/or front and back vowel correction, and the corrected information is input into standard natural language semantic analysis; any content that remains unrecognizable is corrected again, and this cycle continues until all of the corrected information is recognized.

The beneficial effects of the above technical scheme are as follows: because the fuzzy matching repeats the homophonic consonant correction and/or the front and back vowel correction until all of the corrected information is recognized by the standard natural language semantic analysis, the initial pinyin information is converted into standard pinyin information.
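A minimal Python sketch of such cyclic fuzzy matching follows. It assumes that "homophonic consonants" means commonly confused initials (z/zh, c/ch, s/sh, n/l) and that "front and back vowels" means front and back nasal finals (an/ang, en/eng, in/ing); the swap tables, the lexicon standing in for "standard natural language semantic analysis", and all function names are hypothetical.

```python
from itertools import product
from typing import List, Optional, Set, Tuple

# Assumed confusion pairs; the description does not list them.
INITIAL_SWAPS = {"zh": "z", "z": "zh", "ch": "c", "c": "ch",
                 "sh": "s", "s": "sh", "n": "l", "l": "n"}
FINAL_SWAPS = {"ang": "an", "an": "ang", "eng": "en", "en": "eng",
               "ing": "in", "in": "ing"}


def split_syllable(syl: str) -> Tuple[str, str]:
    """Split a pinyin syllable into (initial, final)."""
    for ini in ("zh", "ch", "sh"):
        if syl.startswith(ini):
            return ini, syl[2:]
    if syl and syl[0] in "bpmfdtnlgkhjqxrzcsyw":
        return syl[0], syl[1:]
    return "", syl


def variants(syl: str) -> Set[str]:
    """All syllables reachable by swapping the initial and/or the final."""
    ini, fin = split_syllable(syl)
    inis = {ini, INITIAL_SWAPS.get(ini, ini)}
    fins = {fin, FINAL_SWAPS.get(fin, fin)}
    return {i + f for i, f in product(inis, fins)}


def fuzzy_match(syllables: List[str], lexicon: Set[str], max_rounds: int = 3) -> Optional[List[str]]:
    """Correct unrecognized syllables round by round; None if some stay unrecognized."""
    current = list(syllables)
    for _ in range(max_rounds):
        unresolved = [i for i, s in enumerate(current) if s not in lexicon]
        if not unresolved:
            return current
        changed = False
        for i in unresolved:
            hits = sorted(variants(current[i]) & lexicon)
            if hits:
                current[i] = hits[0]
                changed = True
        if not changed:
            break
    return current if all(s in lexicon for s in current) else None


LEXICON = {"zhong", "shan", "lan"}  # hypothetical set of recognized syllables
```

The `max_rounds` bound is a design choice added here so that the loop always terminates; the description itself only says that correction repeats until everything is recognized.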

In one embodiment, inputting the non-standard language semantic information into the non-standard language semantic analysis system for analysis and judgment to complete text recognition includes: performing language semantic recognition on the text information to form a language semantic recognition state discrimination; in a first language semantic recognition state, performing language semantic recognition on the text information judged to belong to the general field to form the first language semantic recognition; in a second conversion state, converting the text information judged to belong to the vertical field into standard pinyin information; in a second language semantic recognition state, performing language semantic recognition on the standard pinyin information to form the second language semantic recognition; executing the command operation for the first language semantic recognition and the second language semantic recognition; and inputting the non-standard language semantic information into the non-standard language semantic analysis system for analysis and judgment, converting the text information judged to belong to the proprietary information into standard pinyin information, and completing text recognition.

The working principle of the above technical scheme is as follows: language semantic recognition is performed on the text information to form a language semantic recognition state discrimination. In the first language semantic recognition state, text information judged to belong to the general field is recognized to form the first language semantic recognition. In the second conversion state, text information judged to belong to the vertical field is converted into standard pinyin information; in the second language semantic recognition state, the standard pinyin information is recognized to form the second language semantic recognition. The command operation is then executed for the first language semantic recognition and the second language semantic recognition. The non-standard language semantic information is input into the non-standard language semantic analysis system for analysis and judgment, the text information judged to belong to the proprietary information is converted into standard pinyin information, and text recognition is completed.

The beneficial effects of the above technical scheme are as follows: text recognition is completed by inputting the non-standard language semantic information into the non-standard language semantic analysis system for analysis and judgment. The state discrimination separates general-field text, which is recognized directly to form the first language semantic recognition, from vertical-field text, which is converted into standard pinyin information and then recognized to form the second language semantic recognition; the command operation is executed for both, and the text information judged to belong to the proprietary information is converted into standard pinyin information so that text recognition can be completed.

In one embodiment, performing language semantic recognition on the text information to form the language semantic recognition state discrimination comprises: text distortion state discrimination, text expansion state discrimination, text proportion state discrimination and/or text fuzzy state discrimination; calculating the minimum number of acquisitions for state discrimination:

wherein $Q_{\min}$ is the minimum number of acquisitions for state discrimination, $\omega$ is the acquisition error rate, $n$ is the number of state discriminations, and $P$ is the discrimination probability. When the calculated minimum number of acquisitions $Q_{\min}$ is larger than the reference number of acquisitions set by the system, the language semantic recognition state discrimination is formed, and the state discrimination process is as follows: in text distortion state discrimination, the text distortion state is recognized and distinguished from the text standard state, and text parts whose degree of distortion is larger than the set range are singled out and passed to text expansion state discrimination; in text expansion state discrimination, those text parts are distinguished in the expansion state and in the reverse-expansion state, and once their degree of distortion in the reverse state is no longer larger than the set range, language semantic recognition is performed on the text information; in text proportion state discrimination, the text enlarged or reduced by a set proportion is recognized and distinguished from the text standard state; in text fuzzy state discrimination, text with blurred strokes is recognized and distinguished by the overall characteristics of its strokes, and missing text is recognized and distinguished through the semantic association between the text and its preceding and following context.

The working principle of the above technical scheme is as follows: performing language semantic recognition on the text information to form the language semantic recognition state discrimination comprises text distortion state discrimination, text expansion state discrimination, text proportion state discrimination and/or text fuzzy state discrimination. The minimum number of acquisitions for state discrimination $Q_{\min}$ is calculated, where $\omega$ is the acquisition error rate, $n$ is the number of state discriminations, and $P$ is the discrimination probability. When the calculated $Q_{\min}$ is larger than the reference number of acquisitions set by the system, the language semantic recognition state discrimination is formed, and the state discrimination process is as follows: in text distortion state discrimination, the text distortion state is recognized and distinguished from the text standard state, and text parts whose degree of distortion is larger than the set range are singled out and passed to text expansion state discrimination; in text expansion state discrimination, those text parts are distinguished in the expansion state and in the reverse-expansion state, and once their degree of distortion in the reverse state is no longer larger than the set range, language semantic recognition is performed on the text information; in text proportion state discrimination, the text enlarged or reduced by a set proportion is recognized and distinguished from the text standard state; in text fuzzy state discrimination, text with blurred strokes is recognized and distinguished by the overall characteristics of its strokes, and missing text is recognized and distinguished through the semantic association between the text and its preceding and following context.

On this basis, the character information is converted into standard pinyin information according to the character characteristics, and the standard pinyin information can then be processed, which addresses problems such as unclear or incomplete fonts in natural language and makes text recognition in the natural environment more accurate. The standard pinyin information may be standard Chinese pinyin information, including, for example, initial sub-information and final sub-information; it may also be phonetic transcription or pinyin information of other languages. The proprietary information may belong to the medical field or the smart home control field, and the general information may belong to the more common character field. The judging mechanism may search for the text information in a proprietary information judging database: if a match is found, the text information is judged to belong to proprietary information; otherwise it belongs to general information. Other judgment methods may also be adopted, for example judging whether the information belongs to proprietary information through a preset vertical scene. The pinyin conversion unit converts the character information into initial pinyin information, and the fuzzy matching unit performs fuzzy matching on the initial pinyin information to obtain the standard pinyin information.

The beneficial effects of the above technical scheme are as follows: the language semantic recognition state discrimination, which comprises text distortion state discrimination, text expansion state discrimination, text proportion state discrimination and/or text fuzzy state discrimination, is formed only when the calculated minimum number of acquisitions $Q_{\min}$ is larger than the reference number of acquisitions set by the system, where $\omega$ is the acquisition error rate, $n$ is the number of state discriminations, and $P$ is the discrimination probability. Text distortion state discrimination distinguishes the text distortion state from the text standard state and passes text parts whose degree of distortion is larger than the set range to text expansion state discrimination; text expansion state discrimination distinguishes those parts in the expansion state and in the reverse-expansion state, and once their degree of distortion in the reverse state is no longer larger than the set range, language semantic recognition is performed on the text information; text proportion state discrimination distinguishes text enlarged or reduced by a set proportion from the text standard state; text fuzzy state discrimination recognizes text with blurred strokes by the overall characteristics of its strokes, and recognizes missing text through the semantic association between the text and its preceding and following context. In addition, the character information is converted into standard pinyin information according to the character characteristics, and the standard pinyin information can then be processed, which addresses problems such as unclear or incomplete fonts in natural language and makes text recognition in the natural environment more accurate. The standard pinyin information may be standard Chinese pinyin information, including, for example, initial sub-information and final sub-information; it may also be phonetic transcription or pinyin information of other languages. The proprietary information may belong to the medical field or the smart home control field, and the general information may belong to the more common character field. The judging mechanism may search for the text information in a proprietary information judging database: if a match is found, the text information is judged to belong to proprietary information; otherwise it belongs to general information. Other judgment methods may also be adopted, for example judging whether the information belongs to proprietary information through a preset vertical scene. The pinyin conversion unit converts the character information into initial pinyin information, and the fuzzy matching unit performs fuzzy matching on the initial pinyin information to obtain the standard pinyin information.

In one embodiment, converting the text information judged to belong to the proprietary information into standard pinyin information includes: dividing the text information of the proprietary information into independent characters, and connecting the independent characters according to their interval weights; calculating the interval weight connecting two characters in the natural language:

wherein $W$ is the interval weight connecting two characters; $p(p)$ is the gray value of the natural language information at point $p$, $f(p)$ is the corresponding texture feature, $x(p)$ is the spatial position of point $p$, and $x(q)$ is the spatial position of point $q$; $\lVert \cdot \rVert_2$ denotes the two-norm of a vector; $\delta_p$ is the standard deviation of the gray-level Gaussian function, $\delta_f$ is the standard deviation of the texture Gaussian function, $\delta_x$ is the standard deviation of the spatial-distance Gaussian function, and $r$ is the effective distance between two characters. Through $\delta_p$, $\delta_f$ and $\delta_x$, the interval weight $W$ adjusts, by exponential adjustment, the gray-level difference, the texture difference and the spatial position difference between character points, and connects the two characters together. When the interval weight between two connected characters is greater than the set interval weight, the connection between the two characters in the text is judged not to conform to the standard connection semantics and is treated as non-standard language semantics. The non-standard language semantic information is input into the non-standard language semantic analysis system, where $\delta_p$ adjusts the gray-level difference between character points, $\delta_f$ adjusts the texture difference between characters, and $\delta_x$ adjusts the spatial position difference, until the interval weight $W$ between the two connected characters is not greater than the set interval weight; the connection between the two characters is then judged to conform to the standard connection semantics, and text recognition is finally completed.

The working principle of the above technical scheme is as follows: converting the text information judged to belong to the proprietary information into standard pinyin information includes dividing the text information of the proprietary information into independent characters, connecting the independent characters according to their interval weights, and calculating the interval weight $W$ connecting two characters in the natural language. Here $p(p)$ is the gray value of the natural language information at point $p$, $f(p)$ is the corresponding texture feature, $x(p)$ and $x(q)$ are the spatial positions of points $p$ and $q$, $\lVert \cdot \rVert_2$ denotes the two-norm of a vector, $\delta_p$, $\delta_f$ and $\delta_x$ are the standard deviations of the gray-level, texture and spatial-distance Gaussian functions respectively, and $r$ is the effective distance between two characters. Through $\delta_p$, $\delta_f$ and $\delta_x$, the interval weight $W$ adjusts, by exponential adjustment, the gray-level difference, the texture difference and the spatial position difference between character points, and connects the two characters together. When the interval weight connecting two characters is greater than the set interval weight, the connection between the two characters in the text is judged not to conform to the standard connection semantics and is treated as non-standard language semantics. The non-standard language semantic information is input into the non-standard language semantic analysis system, where $\delta_p$ adjusts the gray-level difference between character points, $\delta_f$ adjusts the texture difference between characters, and $\delta_x$ adjusts the spatial position difference, until the interval weight $W$ between the two connected characters is not greater than the set interval weight; the connection is then judged to conform to the standard connection semantics, and text recognition is finally completed.

The beneficial effects of the above technical scheme are as follows: the invention converts the text information judged to belong to the proprietary information into standard pinyin information by dividing it into independent characters and connecting them according to the interval weight $W$, which accounts for the gray-level difference, the texture difference and the spatial position difference between character points through the standard deviations $\delta_p$, $\delta_f$ and $\delta_x$ and the effective distance $r$. A connection whose interval weight is greater than the set interval weight is identified as non-standard language semantics and is input into the non-standard language semantic analysis system, where $\delta_p$, $\delta_f$ and $\delta_x$ are adjusted until $W$ is not greater than the set interval weight; the connection between the two characters then conforms to the standard connection semantics, and text recognition is finally completed.
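The formula for the interval weight is not reproduced in this text, so the following Python sketch should be read only as one possible form that uses the same symbols: a product of three Gaussian-style exponential terms over the gray-level, texture and spatial differences, cut off beyond the effective distance $r$. This product form, the function names and the sample values are assumptions and are not taken from the description; only the threshold rule in `is_nonstandard` follows the embodiment directly.

```python
import math
from typing import Sequence


def _sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared two-norm of the difference between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def interval_weight(gray_p: float, gray_q: float,
                    tex_p: Sequence[float], tex_q: Sequence[float],
                    pos_p: Sequence[float], pos_q: Sequence[float],
                    delta_p: float, delta_f: float, delta_x: float,
                    r: float) -> float:
    """Assumed form of W: product of exponential terms, zero beyond distance r."""
    dist_sq = _sq_dist(pos_p, pos_q)
    if math.sqrt(dist_sq) >= r:
        return 0.0  # assumption: characters farther apart than r are not connected
    gray_term = math.exp(-((gray_p - gray_q) ** 2) / delta_p ** 2)
    tex_term = math.exp(-_sq_dist(tex_p, tex_q) / delta_f ** 2)
    pos_term = math.exp(-dist_sq / delta_x ** 2)
    return gray_term * tex_term * pos_term


def is_nonstandard(w: float, limit: float) -> bool:
    # Rule from the embodiment: a weight above the set value marks the
    # connection between two characters as non-standard language semantics.
    return w > limit
```

Under this assumed form, changing $\delta_p$, $\delta_f$ or $\delta_x$ rescales the corresponding difference term, which is how the sketch mirrors the adjustment described in the embodiment.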

It will be apparent to those skilled in the art that various changes and modifications may be made in the present invention without departing from the spirit and scope of the invention. Thus, if such modifications and variations of the present invention fall within the scope of the claims of the present invention and their equivalents, the present invention is also intended to include such modifications and variations.
