Semantic recognition method and system based on double-layer data format

Document No.: 169368  Publication date: 2021-10-29

Reading note: This technology, "Semantic recognition method and system based on double-layer data format", was designed by 倪亚晖武斌赵锦春林雪 on 2021-08-26. Main content: The invention discloses a semantic recognition method and system based on a double-layer data format. A first standard document is format-converted according to a first format conversion instruction to obtain a first conversion result; the first-layer format data is parsed through a first semantic parsing instruction to obtain a first parsing result; preliminary paragraph division is performed on the second-layer format data to obtain a first division result; part-of-speech-based combined division is performed on the first division result through a first part-of-speech analysis instruction to obtain a second division result; word frequency combination clustering is performed on the first division result through a first word frequency clustering instruction to obtain a third division result; and the second division result, the third division result and the first parsing result are input into a neural network data model to obtain a first semantic recognition result. This solves the technical problem in the prior art that text cannot be parsed intelligently according to its own characteristics, so that parsing is neither intelligent nor accurate enough.

1. A semantic recognition method based on a two-layer data format, wherein the method comprises:

obtaining a first format conversion instruction, and performing format conversion on a first standard document according to the first format conversion instruction to obtain a first conversion result, wherein the first conversion result comprises first layer format data and second layer format data, and the first layer format data is different from the second layer format data;

obtaining a first semantic parsing instruction, and performing data parsing on the first layer format data through the first semantic parsing instruction to obtain a first parsing result;

obtaining a first paragraph dividing instruction, and performing preliminary paragraph division on the second-layer format data according to the first paragraph dividing instruction to obtain a first division result;

obtaining a first part-of-speech analysis instruction, and performing part-of-speech-based combined division on the first division result through the first part-of-speech analysis instruction to obtain a second division result;

obtaining a first word frequency clustering instruction, and performing word frequency combination clustering on the first division result through the first word frequency clustering instruction to obtain a third division result;

and inputting the second division result, the third division result and the first parsing result into a neural network data model to obtain a first semantic recognition result.

2. The method of claim 1, wherein the method further comprises:

obtaining a standard literature data set through big data, and carrying out format feature screening based on the standard literature data set to obtain a first subject format, a first title format and a first text format;

and carrying out format division on the second-layer format data through the first subject format, the first title format and the first text format to obtain the first division result.

3. The method of claim 1, wherein the inputting the second division result, the third division result and the first parsing result into a neural network data model to obtain a first semantic recognition result further comprises:

when the second division result is inconsistent with the third division result, obtaining a first semantic analysis instruction;

performing semantic analysis on the second division result and the third division result through the first semantic analysis instruction to obtain a first paraphrase result and a second paraphrase result;

when the first paraphrase result and the second paraphrase result are different, obtaining a first keyword and position information of the first keyword;

inputting the first keyword and the position information into a first position deviation analysis model to obtain a first identification result;

and obtaining the first semantic recognition result through the first identification result and the first parsing result.

4. The method of claim 3, wherein the method further comprises:

obtaining a first sentence according to the first keyword;

obtaining a first sentence analysis instruction, and performing part-of-speech tagging on the first sentence through the first sentence analysis instruction to obtain a first part-of-speech tagging result;

obtaining a first judgment instruction, and judging, through the first judgment instruction, whether a verb tagging result in the first part-of-speech tagging result is unique;

when the verb tagging result is not unique, verifying the non-unique verb tagging result, and after the verb tagging result is verified to be correct, performing predetermined relation extraction on the verb tagging result to obtain a first extraction result;

and obtaining the first semantic recognition result according to the first extraction result.

5. The method of claim 4, wherein the obtaining a first judgment instruction, and judging, through the first judgment instruction, whether a verb tagging result in the first part-of-speech tagging result is unique further comprises:

when the verb tagging result is judged to be unique, performing predetermined relation extraction on the verb tagging result to obtain a second extraction result;

obtaining a first full-text comparison instruction, and performing a bias check on the first keyword according to the first full-text comparison instruction to obtain a first check result;

and substituting the position information and the first check result into the second extraction result to obtain the first semantic recognition result.

6. The method of claim 2, wherein the obtaining a standard literature data set through big data, and carrying out format feature screening based on the standard literature data set to obtain a first subject format, a first title format and a first text format further comprises:

obtaining a first scaling instruction, and carrying out standardized scaling on the standard literature data set according to the first scaling instruction to obtain a second standard literature data set;

and performing feature extraction through spatial position features and spatial quantity features to obtain a first extraction result, wherein the first extraction result comprises the first subject format, the first title format and the first text format.

7. The method of claim 1, wherein the inputting the second division result, the third division result and the first parsing result into a neural network data model to obtain a first semantic recognition result further comprises:

constructing the neural network data model, wherein the neural network data model is obtained by training on a plurality of groups of training data, and each group of the plurality of groups of training data comprises: the second division result, the third division result, the first parsing result, and identification information identifying a semantic parsing result;

and inputting the second division result, the third division result and the first parsing result into the neural network data model to obtain the first semantic recognition result.

8. A semantic recognition system based on a two-layer data format, wherein the system comprises:

a first obtaining unit, configured to obtain a first format conversion instruction, perform format conversion on a first standard document according to the first format conversion instruction, and obtain a first conversion result, where the first conversion result includes first layer format data and second layer format data, and the first layer format data is different from the second layer format data;

a second obtaining unit, configured to obtain a first semantic parsing instruction, and perform data parsing on the first layer format data through the first semantic parsing instruction to obtain a first parsing result;

a third obtaining unit, configured to obtain a first paragraph dividing instruction, and perform preliminary paragraph division on the second layer format data according to the first paragraph dividing instruction to obtain a first division result;

a fourth obtaining unit, configured to obtain a first part-of-speech analysis instruction, and perform part-of-speech-based combined division on the first division result through the first part-of-speech analysis instruction to obtain a second division result;

a fifth obtaining unit, configured to obtain a first word frequency clustering instruction, and perform word frequency combination clustering on the first division result through the first word frequency clustering instruction to obtain a third division result;

a sixth obtaining unit, configured to input the second division result, the third division result and the first parsing result into a neural network data model to obtain a first semantic recognition result.

9. A semantic recognition system based on a two-layer data format comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor executes the program to implement the steps of the method of one of claims 1 to 7.

Technical Field

The invention relates to the field of semantic analysis algorithms, and in particular to a semantic recognition method and system based on a double-layer data format.

Background

Semantic analysis is a branch of artificial intelligence and one of the core tasks of natural language processing. It draws on several disciplines, including linguistics, computational linguistics, machine learning and cognitive linguistics, and it helps drive the rapid development of other natural language processing tasks.

Semantic analysis refers to using various methods to learn and understand the semantic content expressed by a piece of text; any form of language understanding falls within its scope. It focuses on building effective models and systems that automatically analyze the semantics of each language unit, so that the true meaning of the whole text is understood, for example by obtaining or distinguishing word senses and resolving synonymy and ambiguity during text processing.

However, in the course of implementing the technical solution of the embodiments of the present application, the inventors found that the above technology has at least the following technical problem:

In the prior art, text cannot be parsed intelligently according to its own characteristics, so the parsing is neither intelligent nor accurate enough.

Disclosure of Invention

In view of the foregoing problems, the embodiments of the present application provide a semantic recognition method and system based on a dual-layer data format.

In a first aspect, the present application provides a semantic recognition method based on a double-layer data format, where the method includes: obtaining a first format conversion instruction, and performing format conversion on a first standard document according to the first format conversion instruction to obtain a first conversion result, wherein the first conversion result comprises first layer format data and second layer format data, and the first layer format data is different from the second layer format data; obtaining a first semantic parsing instruction, and performing data parsing on the first layer format data through the first semantic parsing instruction to obtain a first parsing result; obtaining a first paragraph dividing instruction, and performing preliminary paragraph division on the second-layer format data according to the first paragraph dividing instruction to obtain a first division result; obtaining a first part-of-speech analysis instruction, and performing part-of-speech-based combined division on the first division result through the first part-of-speech analysis instruction to obtain a second division result; obtaining a first word frequency clustering instruction, and performing word frequency combination clustering on the first division result through the first word frequency clustering instruction to obtain a third division result; and inputting the second division result, the third division result and the first parsing result into a neural network data model to obtain a first semantic recognition result.

In a second aspect, the present application further provides a semantic recognition system based on a double-layer data format, where the system includes: a first obtaining unit, configured to obtain a first format conversion instruction, perform format conversion on a first standard document according to the first format conversion instruction, and obtain a first conversion result, where the first conversion result includes first layer format data and second layer format data, and the first layer format data is different from the second layer format data; a second obtaining unit, configured to obtain a first semantic parsing instruction, and perform data parsing on the first layer format data through the first semantic parsing instruction to obtain a first parsing result; a third obtaining unit, configured to obtain a first paragraph dividing instruction, and perform preliminary paragraph division on the second layer format data according to the first paragraph dividing instruction to obtain a first division result; a fourth obtaining unit, configured to obtain a first part-of-speech analysis instruction, and perform part-of-speech-based combined division on the first division result through the first part-of-speech analysis instruction to obtain a second division result; a fifth obtaining unit, configured to obtain a first word frequency clustering instruction, and perform word frequency combination clustering on the first division result through the first word frequency clustering instruction to obtain a third division result; and a sixth obtaining unit, configured to input the second division result, the third division result and the first parsing result into a neural network data model to obtain a first semantic recognition result.

In a third aspect, the present invention provides a semantic recognition system based on a dual-layer data format, including a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the processor implements the steps of the method according to the first aspect when executing the program.

One or more technical solutions provided in the embodiments of the present application have at least the following technical effects or advantages:

A first format conversion instruction is obtained, and format conversion is carried out on a first standard document according to the first format conversion instruction to obtain a first conversion result, wherein the first conversion result comprises first layer format data and second layer format data, and the first layer format data is different from the second layer format data; a first semantic parsing instruction is obtained, and data parsing is performed on the first layer format data through the first semantic parsing instruction to obtain a first parsing result; a first paragraph dividing instruction is obtained, and preliminary paragraph division is performed on the second-layer format data according to the first paragraph dividing instruction to obtain a first division result; a first part-of-speech analysis instruction is obtained, and part-of-speech-based combined division is performed on the first division result through the first part-of-speech analysis instruction to obtain a second division result; a first word frequency clustering instruction is obtained, and word frequency combination clustering is performed on the first division result through the first word frequency clustering instruction to obtain a third division result; and the second division result, the third division result and the first parsing result are input into a neural network data model to obtain a first semantic recognition result. Because the text is intelligently parsed and checked through double-layer semantic parsing and division in this way, the technical effect is achieved of parsing and checking the text in a double-layer format according to its characteristics and obtaining a more intelligent and accurate parsing result.

The foregoing is only an overview of the technical solutions of the present application. So that the technical means of the application can be understood more clearly and implemented according to the description, and so that the above and other objects, features and advantages are more readily apparent, specific embodiments of the application are set out below.

Drawings

FIG. 1 is a schematic flowchart of a semantic recognition method based on a two-layer data format according to an embodiment of the present application;

FIG. 2 is a schematic flowchart of obtaining a first division result in a semantic recognition method based on a two-layer data format according to an embodiment of the present application;

FIG. 3 is a schematic flow chart illustrating a first semantic recognition result obtained by the semantic recognition method based on the two-layer data format according to the embodiment of the present application;

FIG. 4 is a schematic flowchart of verb analysis based on a semantic recognition method with a two-layer data format according to an embodiment of the present application;

FIG. 5 is a schematic flowchart illustrating further analysis of verbs in a semantic recognition method based on a two-layer data format according to an embodiment of the present disclosure;

FIG. 6 is a schematic flowchart of a standard document set processing of a semantic recognition method based on a two-layer data format according to an embodiment of the present disclosure;

FIG. 7 is a schematic flowchart of a model construction of a semantic recognition system based on a two-layer data format according to an embodiment of the present application;

FIG. 8 is a schematic structural diagram of a semantic recognition system based on a two-layer data format according to an embodiment of the present application;

FIG. 9 is a schematic structural diagram of an exemplary electronic device according to an embodiment of the present application.

Description of reference numerals: a first obtaining unit 11, a second obtaining unit 12, a third obtaining unit 13, a fourth obtaining unit 14, a fifth obtaining unit 15, a sixth obtaining unit 16, an electronic device 50, a processor 51, a memory 52, an input device 53, an output device 54.

Detailed Description

The embodiments of the present application provide a semantic recognition method and system based on a double-layer data format. They solve the technical problem in the prior art that text cannot be parsed intelligently according to its own characteristics, so that parsing is neither intelligent nor accurate enough, and they achieve the technical effect of parsing and checking the text in a double-layer format according to its characteristics to obtain a more intelligent and accurate parsing result. Embodiments of the present application are described below with reference to the accompanying drawings. As those skilled in the art will appreciate, as technology develops and new scenarios emerge, the technical solutions provided in the embodiments of the present application are also applicable to similar technical problems.

The terms "first," "second," and the like in the description and in the claims of the present application and in the above-described drawings are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the terms so used are interchangeable under appropriate circumstances and are merely descriptive of the various embodiments of the application and how objects of the same nature can be distinguished. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of elements is not necessarily limited to those elements, but may include other elements not expressly listed or inherent to such process, method, article, or apparatus.

Summary of the application

In view of the above technical problems, the technical solution provided by the present application has the following general idea:

The embodiment of the application provides a semantic recognition method based on a double-layer data format, wherein the method comprises the following steps: obtaining a first format conversion instruction, and performing format conversion on a first standard document according to the first format conversion instruction to obtain a first conversion result, wherein the first conversion result comprises first layer format data and second layer format data, and the first layer format data is different from the second layer format data; obtaining a first semantic parsing instruction, and performing data parsing on the first layer format data through the first semantic parsing instruction to obtain a first parsing result; obtaining a first paragraph dividing instruction, and performing preliminary paragraph division on the second-layer format data according to the first paragraph dividing instruction to obtain a first division result; obtaining a first part-of-speech analysis instruction, and performing part-of-speech-based combined division on the first division result through the first part-of-speech analysis instruction to obtain a second division result; obtaining a first word frequency clustering instruction, and performing word frequency combination clustering on the first division result through the first word frequency clustering instruction to obtain a third division result; and inputting the second division result, the third division result and the first parsing result into a neural network data model to obtain a first semantic recognition result.

Having thus described the general principles of the present application, various non-limiting embodiments thereof will now be described in detail with reference to the accompanying drawings.

Example one

As shown in fig. 1, an embodiment of the present application provides a semantic identification method based on a dual-layer data format, where the method includes:

Step S100: obtaining a first format conversion instruction, and performing format conversion on a first standard document according to the first format conversion instruction to obtain a first conversion result, wherein the first conversion result comprises first layer format data and second layer format data, and the first layer format data is different from the second layer format data;

Specifically, the first format conversion instruction is an instruction for converting the format of the document to be analyzed. It causes the standard document to be converted into at least two formats, namely the first layer format data and the second layer format data, which differ from each other. This preliminary double-layer format conversion of the document to be processed lays the foundation for subsequent text analysis.
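As a non-limiting illustration of step S100, the conversion can be pictured as producing two differently formatted views of the same document. The `Block` and `ConversionResult` types, their field names, and the choice of plain text for the first layer and positioned layout blocks for the second layer are assumptions made for this sketch; the application only requires that the two layers differ in format.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Block:
    """One positioned piece of page content (hypothetical layout unit)."""
    text: str
    x: float
    y: float
    font_size: float


@dataclass
class ConversionResult:
    first_layer: str            # plain-text view (assumed format)
    second_layer: List[Block]   # layout-preserving view (assumed format)


def convert_document(blocks: List[Block]) -> ConversionResult:
    """Produce two differently formatted layers from the same document."""
    # Reading order: top-to-bottom, then left-to-right.
    ordered = sorted(blocks, key=lambda b: (b.y, b.x))
    text_layer = "\n".join(b.text for b in ordered)
    return ConversionResult(first_layer=text_layer, second_layer=ordered)


doc = [
    Block("Body text of the standard.", x=50, y=120, font_size=10),
    Block("1 Scope", x=50, y=80, font_size=14),
]
result = convert_document(doc)
print(result.first_layer)
```

Keeping both layers derived from one ordered list means later steps can compare a reading of the text layer against a reading of the layout layer for the same content.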

Step S200: obtaining a first semantic parsing instruction, and performing data parsing on the first layer format data through the first semantic parsing instruction to obtain a first parsing result;

Specifically, the first semantic parsing instruction is an instruction for parsing text. Semantic parsing is performed on the first layer format data and is an intelligent reading process: a semantic dictionary covering formulas, pictures and text is constructed from the parsed text information, and preliminary semantic parsing of the first layer format data is carried out based on the text content to obtain the first parsing result.
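The semantic dictionary described in step S200 can be sketched as a mapping from content kind (formula, picture, text) to the lines of the first layer that belong to it. The regular expressions and the function name `build_semantic_dictionary` below are hypothetical and deliberately simplistic; the application does not specify how each kind is detected.

```python
import re
from collections import defaultdict
from typing import Dict, List

# Hypothetical surface patterns; a real parser would need far richer rules.
FIGURE_RE = re.compile(r"^(Figure|Fig\.)\s*\d+", re.IGNORECASE)
FORMULA_RE = re.compile(r"^[^=]+=[^=]+$")


def build_semantic_dictionary(text_layer: str) -> Dict[str, List[str]]:
    """Group the non-empty lines of the first-layer text by content kind."""
    dictionary: Dict[str, List[str]] = defaultdict(list)
    for line in text_layer.splitlines():
        line = line.strip()
        if not line:
            continue
        if FIGURE_RE.match(line):
            dictionary["picture"].append(line)
        elif FORMULA_RE.match(line):
            dictionary["formula"].append(line)
        else:
            dictionary["text"].append(line)
    return dict(dictionary)


sample = "The load is computed as follows.\nF = m * a\nFigure 1 Test setup"
parsed = build_semantic_dictionary(sample)
print(parsed)
```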

Step S300: obtaining a first paragraph dividing instruction, and performing preliminary paragraph division on the second-layer format data according to the first paragraph dividing instruction to obtain a first division result;

Specifically, the first paragraph dividing instruction is an instruction for preliminarily dividing the second-layer format data into paragraphs. The second-layer format data is divided according to the paragraph features of the PDF text and the result of capturing those features. Paragraph division also includes separating out formulas, pictures, labels, annotations, titles and indexes, and the first division result is obtained from this division. This preliminary analysis makes the paragraph division of the second-layer format data accurate and lays a solid foundation for obtaining a more accurate semantic parsing result later.
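A minimal sketch of the preliminary division in step S300 follows, assuming each second-layer block carries its text and font size. The rules, the thresholds and the names `classify_block` and `divide_paragraphs` are illustrative assumptions, not the feature capture used by the application.

```python
import re
from typing import List, Tuple


def classify_block(text: str, font_size: float, body_size: float = 10.0) -> str:
    """Assign a layout block to one coarse category using simple rules."""
    stripped = text.strip()
    if re.match(r"^(Figure|Fig\.)\s*\d+", stripped, re.IGNORECASE):
        return "picture"
    if re.match(r"^NOTE\b", stripped):
        return "annotation"
    if re.search(r"\.{3,}\s*\d+$", stripped):
        return "index"        # dotted leader followed by a page number
    if "=" in stripped and len(stripped.split()) <= 8:
        return "formula"
    if font_size > body_size * 1.2:
        return "title"
    return "paragraph"


def divide_paragraphs(blocks: List[Tuple[str, float]]) -> List[Tuple[str, str]]:
    """Return (category, text) pairs for a list of (text, font_size) blocks."""
    return [(classify_block(text, size), text) for text, size in blocks]


blocks = [
    ("1 Scope", 14.0),
    ("This standard specifies test methods.", 10.0),
    ("NOTE Values are informative.", 9.0),
    ("F = m * a", 10.0),
    ("Figure 1 Test setup", 9.0),
    ("Annex A ..... 12", 10.0),
]
first_division = divide_paragraphs(blocks)
for kind, text in first_division:
    print(f"{kind:<10} {text}")
```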

Step S400: obtaining a first part-of-speech analysis instruction, and performing part-of-speech-based combined division on the first division result through the first part-of-speech analysis instruction to obtain a second division result;

Specifically, part-of-speech analysis is the process of combining and clustering characters according to how they can be combined and according to what the article states. First, the part of speech of each word appearing in the article is determined; in general, parts of speech include nouns, verbs, adjectives, quantifiers, pronouns, adverbs, prepositions, conjunctions, interjections, and the like. After combined division according to part-of-speech combinations, a further division may be made according to the role each word plays in the paragraph, such as subject, predicate verb, object or predicative, which yields the second division result. Dividing the second layer format data by part of speech makes the parts of speech clear for subsequent analysis of the article, so that a more accurate analysis result can be obtained.
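The two stages of step S400, tagging each word and then grouping words by their role in the sentence, can be sketched as follows. The toy `LEXICON` stands in for a real part-of-speech tagger, and the rule "everything before the first verb is the subject, everything after it is the object" is a simplifying assumption for illustration only.

```python
from typing import Dict, List, Tuple

# Toy lexicon standing in for a real part-of-speech tagger (assumption).
LEXICON: Dict[str, str] = {
    "the": "DET", "sensor": "NOUN", "measures": "VERB",
    "ambient": "ADJ", "temperature": "NOUN",
}


def tag(tokens: List[str]) -> List[Tuple[str, str]]:
    """Attach a part-of-speech label to each token ('X' if unknown)."""
    return [(t, LEXICON.get(t.lower(), "X")) for t in tokens]


def group_by_role(tagged: List[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Split tagged tokens into subject, predicate verb and object parts."""
    roles: Dict[str, List[str]] = {"subject": [], "predicate_verb": [], "object": []}
    seen_verb = False
    for token, pos in tagged:
        if pos == "VERB" and not seen_verb:
            roles["predicate_verb"].append(token)
            seen_verb = True
        elif not seen_verb:
            roles["subject"].append(token)
        else:
            roles["object"].append(token)
    return roles


tokens = "The sensor measures ambient temperature".split()
roles = group_by_role(tag(tokens))
print(roles)
```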

Step S500: obtaining a first word frequency clustering instruction, and performing word frequency combination clustering on the first division result through the first word frequency clustering instruction to obtain a third division result;

Specifically, the first word frequency clustering instruction is an instruction for controlling the combination of characters into high-frequency words. Using big data retrieval, adjacent characters are combined on the basis of hot words, high-frequency words, homophonic wordplay and meme-style expressions, drawing on usage outside the article itself rather than on the article's content. Through the first word frequency clustering instruction, word frequency combination clustering is performed on the first division result to obtain the third division result.
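One way to picture the word frequency combination of step S500 is a greedy longest-match merge of adjacent tokens against an external frequency table. The table `PHRASE_FREQ`, the threshold `min_count` and the function `merge_by_frequency` are hypothetical; the application does not state how frequencies are obtained or thresholded.

```python
from typing import Dict, List

# Hypothetical phrase frequencies, e.g. gathered from a large external corpus.
PHRASE_FREQ: Dict[str, int] = {
    "machine learning": 950,
    "neural network": 870,
    "learning model": 40,
}


def merge_by_frequency(tokens: List[str], freq: Dict[str, int],
                       min_count: int = 100, max_len: int = 3) -> List[str]:
    """Greedily merge adjacent tokens that form a sufficiently frequent phrase."""
    merged: List[str] = []
    i = 0
    while i < len(tokens):
        # Try the longest candidate phrase first, down to two tokens.
        for n in range(min(max_len, len(tokens) - i), 1, -1):
            phrase = " ".join(tokens[i:i + n])
            if freq.get(phrase.lower(), 0) >= min_count:
                merged.append(phrase)
                i += n
                break
        else:
            # No frequent phrase starts here: keep the single token.
            merged.append(tokens[i])
            i += 1
    return merged


tokens = "a machine learning model uses a neural network".split()
third_division = merge_by_frequency(tokens, PHRASE_FREQ)
print(third_division)
```

Note that "learning model" is not merged in this example: its assumed count is below the threshold, and "machine learning" has already consumed the shared token.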

Step S600: and inputting the second division result, the third division result and the first analysis result into a neural network data model to obtain a first semantic recognition result.

Specifically, the neural network data model is a neural network model in machine learning. By building a rich vocabulary resource library and comparing the semantics expressed under the second division result and the third division result with the substance of the article, the meaning of each paragraph is analyzed in depth. A final semantic analysis probability is obtained from the comparison, and the final semantic recognition result, namely the first semantic recognition result, is selected based on that probability. By choosing among the semantics produced by the different combinations, and by comparing the paraphrase result of the text with the paraphrase result in PDF format, the final semantic analysis becomes more accurate. This achieves the technical effect of parsing and checking the text in a double-layer format according to its characteristics and obtaining a more intelligent and accurate parsing result.
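The probability-based selection in step S600 can be sketched with a single linear layer followed by a softmax. This is a stand-in only: the application does not disclose the network architecture, and the feature vectors, the hand-set `weights` and the function `select_interpretation` are assumptions for illustration (a trained model would learn the weights from the training data described in the claims).

```python
import math
from typing import List, Sequence, Tuple


def softmax(scores: Sequence[float]) -> List[float]:
    """Convert raw scores into probabilities that sum to 1."""
    m = max(scores)                       # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def select_interpretation(candidates: Sequence[str],
                          features: Sequence[Sequence[float]],
                          weights: Sequence[float]) -> Tuple[str, List[float]]:
    """Score each candidate reading and return the most probable one."""
    scores = [sum(w * f for w, f in zip(weights, fv)) for fv in features]
    probs = softmax(scores)
    best = max(range(len(candidates)), key=probs.__getitem__)
    return candidates[best], probs


candidates = ["reading from part-of-speech division",
              "reading from word-frequency division"]
# Hypothetical features per candidate:
# [agreement with the first parsing result, external phrase frequency].
features = [[1.0, 0.2], [0.4, 0.9]]
weights = [1.5, 0.5]                      # placeholder values, not trained
choice, probs = select_interpretation(candidates, features, weights)
print(choice, [round(p, 3) for p in probs])
```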

Further, as shown in fig. 2, step S300 in the embodiment of the present application further includes:

Step S310: obtaining a standard literature data set through big data, and carrying out format feature screening based on the standard literature data set to obtain a first subject format, a first title format and a first text format;

Step S320: and carrying out format division on the second-layer format data through the first subject format, the first title format and the first text format to obtain the first division result.

Specifically, the standard literature data set is a data set of documents of the same kind, by the same author, or on the same subject as the document to be analyzed, and the documents in it are in the second layer format. Format features are extracted from the data set: after the documents in the standard literature data set are scaled to the same size, the subject format, title format and text format are screened and extracted, giving the first subject format, the first title format and the first text format. During preliminary paragraph division, the second-layer format data is scaled to the same size ratio and then divided according to the features of the first subject format, the first title format and the first text format to obtain the first division result.
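The scale-then-screen procedure of steps S310 and S320 can be sketched as below. Font size is used as the only format feature, and the heuristics in `screen_formats` (text is the median size, subject is the largest size, title lies in between) are assumptions chosen for brevity; the application itself refers more generally to format features without fixing them.

```python
from dataclasses import dataclass
from statistics import median
from typing import Dict, List, Tuple


@dataclass
class Span:
    text: str
    font_size: float
    page_width: float


def scale_to_width(spans: List[Span], target_width: float = 600.0) -> List[Span]:
    """Rescale font sizes so every page has the same nominal width."""
    return [Span(s.text, s.font_size * target_width / s.page_width, target_width)
            for s in spans]


def screen_formats(spans: List[Span]) -> Dict[str, float]:
    """Derive subject, title and text formats from scaled font sizes."""
    sizes = sorted(s.font_size for s in spans)
    body, top = median(sizes), sizes[-1]
    between = [s for s in sizes if body < s < top]
    return {"subject": top,
            "title": median(between) if between else top,
            "text": body}


def divide_by_format(spans: List[Span], formats: Dict[str, float]) -> List[Tuple[str, str]]:
    """Label each span with the format whose size is nearest to its own."""
    def nearest(size: float) -> str:
        return min(formats, key=lambda name: abs(formats[name] - size))
    return [(nearest(s.font_size), s.text) for s in spans]


corpus = [
    Span("Standard for X", 20, 600), Span("1 Scope", 14, 600),
    Span("Body a", 10, 600), Span("Body b", 10, 600),
    Span("Standard for Y", 10, 300), Span("2 Terms", 7, 300),
    Span("Body c", 5, 300), Span("Body d", 5, 300), Span("Body e", 5, 300),
]
formats = screen_formats(scale_to_width(corpus))

new_doc = [Span("Standard for Z", 40, 1200), Span("3 Symbols", 28, 1200),
           Span("Body f", 20, 1200)]
division = divide_by_format(scale_to_width(new_doc), formats)
print(formats, division)
```

Scaling first is what lets the half-width and double-width documents in the example share one set of format thresholds.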

Further, as shown in FIG. 3, the step S600 of inputting the second division result, the third division result and the first parsing result into a neural network data model to obtain a first semantic recognition result further includes:

Step S610: when the second division result is inconsistent with the third division result, obtaining a first semantic analysis instruction;

Step S620: performing semantic analysis on the second division result and the third division result through the first semantic analysis instruction to obtain a first paraphrase result and a second paraphrase result;

Step S630: when the first paraphrase result and the second paraphrase result are different, obtaining a first keyword and position information of the first keyword;

Step S640: inputting the first keyword and the position information into a first position deviation analysis model to obtain a first identification result;

Step S650: and obtaining the first semantic recognition result through the first identification result and the first parsing result.

Specifically, when the third division result is inconsistent with the second division result, the article may contain an ambiguity. A first semantic analysis instruction is then obtained, and the paragraph paraphrases under the second division result and the third division result are analyzed through it to obtain the first paraphrase result and the second paraphrase result, which are then compared. When the first paraphrase result differs from the second paraphrase result, the paragraph paraphrases under the two division results diverge. In that case, the keyword information causing the divergence and the position information of the keyword are obtained and, according to the part-of-speech combination that causes the divergence, input into the first position deviation analysis model. The first position deviation analysis model performs deviation analysis on the positions and combinations of keywords, based on the keyword position information combined with the main idea of each paragraph of the article. A first identification result is obtained from this model, and the first semantic recognition result is obtained from the first identification result and the first analysis result. Deviation analysis of the combination and position of the keywords makes the position distribution analysis of the ambiguity-causing vocabulary more accurate, which in turn yields more accurate article paraphrases.
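
How the first keyword and its position might be located can be shown with a small sketch. The application does not specify this procedure; the sketch assumes that both division results are segmentations of the same token sequence and treats the token immediately after each cut made by only one of the two divisions as a candidate keyword. The function names are hypothetical.

```python
from typing import List, Set, Tuple


def boundaries(division: List[List[str]]) -> Set[int]:
    """Token offsets at which a division cuts the text (final end excluded)."""
    cuts, pos = set(), 0
    for segment in division[:-1]:
        pos += len(segment)
        cuts.add(pos)
    return cuts


def divergent_keywords(div_a: List[List[str]],
                       div_b: List[List[str]]) -> List[Tuple[str, int]]:
    """Return (token, position) pairs where the two divisions disagree."""
    tokens = [t for seg in div_a for t in seg]
    if tokens != [t for seg in div_b for t in seg]:
        raise ValueError("both divisions must cover the same token sequence")
    # A cut present in only one division marks a point of disagreement.
    differing = boundaries(div_a) ^ boundaries(div_b)
    return [(tokens[i], i) for i in sorted(differing) if i < len(tokens)]


second_division = [["the", "model", "reads"], ["text", "and", "splits", "it"]]
third_division = [["the", "model", "reads", "text"], ["and", "splits", "it"]]
keywords = divergent_keywords(second_division, third_division)
```

The resulting pairs correspond to the keyword and position information that step S640 passes to the first position deviation analysis model; the model itself is not sketched here.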

Further, as shown in fig. 4, step S630 in this embodiment of the present application further includes:

step S631: obtaining a first sentence according to the first keyword;

step S632: obtaining a first statement analysis instruction, and performing part-of-speech tagging on the first statement through the first statement analysis instruction to obtain a first part-of-speech tagging result;

step S633: obtaining a first judgment instruction, and judging whether a verb labeling result in the first part-of-speech tagging result is unique or not through the first judgment instruction;

step S634: when the verb labeling result is not unique, verifying the non-unique verb labeling result, and after the verb labeling result is verified to be correct, extracting the predetermined relation of the verb labeling result to obtain a first extraction result;

step S635: and obtaining the first semantic recognition result according to the first extraction result.

Specifically, when the first paraphrase result and the second paraphrase result differ, the first sentence, that is, the sentence containing the first keyword, is obtained and analyzed in depth. The verbs in the first sentence are judged first: the number of verbs is obtained, and it is judged whether that number is unique, that is, whether the sentence contains exactly one verb. When the number is not unique, the first sentence has at least two verbs; the verb information is verified and, once it is verified as correct, the predetermined relation of the verb labeling result is extracted to obtain a first extraction result. The predetermined relation extraction extracts the subject, predicate and object of the first sentence: for each labeled verb, its subject and the object of its action are obtained, so that one subject-predicate-object relation is extracted per verb. This simplifies the analysis logic of the sentence while retaining the main meaning of the paragraph, and the first semantic recognition result is obtained from it. Simplifying ambiguous sentences through verb relation extraction achieves the technical effect of obtaining the first semantic recognition result more accurately.
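
The verb uniqueness judgment and the predetermined relation extraction can be sketched as follows. This is a simplified illustration, not the extraction procedure of this application: it assumes the first sentence is already part-of-speech tagged with tags such as "NOUN" and "VERB", and it takes the nearest nominal before and after each verb as that verb's subject and object. The tag names and the function `extract_relations` are assumptions made for the example.

```python
from typing import List, Optional, Tuple

Tagged = List[Tuple[str, str]]          # (word, part-of-speech tag)
NOMINAL = {"NOUN", "PROPN", "PRON"}     # assumed tags that can act as subject/object


def extract_relations(tagged: Tagged):
    """Return (verb_is_unique, [(subject, verb, object), ...])."""
    verb_positions = [i for i, (_, tag) in enumerate(tagged) if tag == "VERB"]
    triples: List[Tuple[Optional[str], str, Optional[str]]] = []
    for i in verb_positions:
        # Nearest nominal to the left is taken as the subject.
        subject = next((w for w, t in reversed(tagged[:i]) if t in NOMINAL), None)
        # Nearest nominal to the right is taken as the object of the action.
        obj = next((w for w, t in tagged[i + 1:] if t in NOMINAL), None)
        triples.append((subject, tagged[i][0], obj))
    return len(verb_positions) == 1, triples


first_sentence = [
    ("system", "NOUN"), ("converts", "VERB"), ("document", "NOUN"),
    ("and", "CCONJ"),
    ("model", "NOUN"), ("labels", "VERB"), ("paragraph", "NOUN"),
]
is_unique, first_extraction = extract_relations(first_sentence)
```

The nearest-nominal rule fails for sentences in which two verbs share one subject, so a practical implementation would more likely rely on a dependency parse; the sketch only shows how one relation per verb reduces a multi-verb sentence to its main meaning.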

Further, as shown in fig. 5, step S633 of this embodiment of the present application, obtaining a first judgment instruction and judging through the first judgment instruction whether the verb labeling result in the first part-of-speech tagging result is unique, further includes:

step S6331: when the verb labeling result is judged to be unique, extracting the verb labeling result in a predetermined relation to obtain a second extraction result;

step S6332: obtaining a first full-text comparison instruction, and performing bias check on the first keyword according to the first full-text comparison instruction to obtain a first check result;

step S6333: and substituting the position information and the first check result into the second extraction result to obtain the first semantic recognition result.

Specifically, when the number of verbs is judged to be unique, only one verb exists in the first sentence. The predetermined relation is then extracted for that verb according to its labeling information to obtain the second extraction result; as before, the predetermined relation extraction extracts the subject of the verb and the object of its action. The first full-text comparison instruction is then obtained and used to perform a bias check on the keyword that causes the ambiguity: the probability of each word sense of the first keyword is analyzed according to the full-text information, and the first check result is obtained from these probabilities. The position information of the keyword and the first check result are substituted into the corresponding position in the second extraction result, and the paraphrase of the first sentence is analyzed further to obtain the first semantic recognition result. Combining the predetermined relation extraction of verbs, the bias check and the keyword position enables a deep semantic analysis, which achieves the technical effect of a more accurate semantic analysis result.
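
The probability analysis in the bias check can be illustrated with a short sketch. It assumes that each candidate word sense of the first keyword is described by a set of cue words and estimates a probability for each sense from how often its cue words occur in the full text, with add-one smoothing. The cue sets and the function `sense_probabilities` are hypothetical; the application does not state how the probability is computed.

```python
from collections import Counter
from typing import Dict, Iterable, Set


def sense_probabilities(full_text_tokens: Iterable[str],
                        sense_cues: Dict[str, Set[str]]) -> Dict[str, float]:
    """Estimate P(sense) from cue-word counts in the full text."""
    counts = Counter(token.lower() for token in full_text_tokens)
    # Add-one smoothing so a sense with no cue words still gets a nonzero score.
    scores = {sense: 1 + sum(counts[cue] for cue in cues)
              for sense, cues in sense_cues.items()}
    total = sum(scores.values())
    return {sense: score / total for sense, score in scores.items()}


full_text = "the bank approved the loan and the bank raised the interest rate".split()
cues = {
    "finance": {"loan", "interest", "rate"},   # assumed cue words
    "river": {"water", "shore"},               # assumed cue words
}
first_check = sense_probabilities(full_text, cues)
most_likely_sense = max(first_check, key=first_check.get)
```

Here the most probable sense plays the role of the first check result that is substituted, together with the keyword position, into the second extraction result.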

Further, as shown in fig. 6, step S310 of this embodiment of the present application, obtaining the standard literature data set through big data and performing format feature screening based on the standard literature data set to obtain the first subject format, the first title format and the first text format, further includes:

step S311: obtaining a first zooming instruction, and carrying out standardized zooming on the standard literature data set according to the first zooming instruction to obtain a second standard literature data set;

step S312: and performing feature extraction through space position features and space quantity features to obtain a first space extraction result, wherein the first space extraction result comprises the first subject format, the first title format and the first text format.

Specifically, the process of screening the format features to obtain the first subject format, the first title format and the first text format further includes first scaling the standard literature data set to the same standard and obtaining the second standard literature data set from the standardized scaling result, and then extracting features from the positions of the spaces and the number of the spaces in the second standard literature data set to obtain the first space extraction result. The position feature and the quantity feature of the spaces serve as the features for identifying the subject format, the title format and the text format.
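
The space position and space quantity features of step S312 can be sketched per line of text. The sketch assumes the documents have already been scaled to a common width, uses the number of leading spaces as the position feature and the number of spaces inside the line as the quantity feature, and applies hypothetical thresholds (`subject_indent`, `max_title_spaces`) to tell the three formats apart. None of these thresholds comes from this application.

```python
from typing import Tuple


def space_features(line: str) -> Tuple[int, int]:
    """Return (leading_spaces, inner_spaces) for one line."""
    text = line.rstrip("\n")
    leading = len(text) - len(text.lstrip(" "))   # position feature
    inner = text.strip(" ").count(" ")            # quantity feature
    return leading, inner


def classify_line(line: str, subject_indent: int = 8,
                  max_title_spaces: int = 3) -> str:
    """Label a line as subject, title or text using assumed thresholds."""
    leading, inner = space_features(line)
    if leading >= subject_indent:
        return "subject"   # assumed: a deeply indented (centered) line
    if inner <= max_title_spaces:
        return "title"     # assumed: short lines with few spaces are titles
    return "text"


labels = [classify_line(line) for line in [
    "        Semantic Recognition",
    "1 Scope",
    "  This document specifies the method used here.",
]]
```

In practice the thresholds would be screened from the second standard literature data set rather than fixed by hand.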

Further, as shown in fig. 7, step S600 of this embodiment of the present application, inputting the second division result, the third division result and the first analysis result into a neural network data model to obtain a first semantic recognition result, further includes:

step S660: constructing the neural network data model, wherein the neural network data model is obtained by training a plurality of groups of training data, and each group of the plurality of groups of training data comprises: the second division result, the third division result, the first parsing result, and identification information identifying a semantic parsing result;

step S670: and inputting the second division result, the third division result and the first analysis result into a neural network data model to obtain the first semantic recognition result.

Specifically, the neural network data model is a neural network model in machine learning; it can be continuously learned and adjusted, and is a highly complex nonlinear dynamic learning system. In brief, it is a mathematical model: the neural network data model is trained to a convergence state on a large amount of training data, and the first semantic recognition result is then obtained by analyzing the input data through the trained model.

Furthermore, the training process is a supervised learning process. Each group of supervised data includes the second division result, the third division result, the first analysis result and identification information identifying the semantic analysis result. The second division result, the third division result and the first analysis result are input into the neural network data model, and the model is supervised according to the identification information so that its output data becomes consistent with the supervised data. The model corrects and adjusts itself continuously until the obtained output result is consistent with the identification information; supervised learning on that group of data then ends, and supervised learning on the next group begins. When the neural network data model reaches a convergence state, the supervised learning process ends. Through supervised learning, the model processes the input information more accurately, and a more accurate and reasonable semantic recognition result is obtained.
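
The supervised learning loop described above can be illustrated with a deliberately small stand-in. The sketch below trains a single-neuron perceptron, not the neural network data model of this application, and it assumes that the second division result, the third division result and the first analysis result have each been encoded as one number in a feature vector. The encoding and the function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def predict(weights: Sequence[float], bias: float, x: Sequence[float]) -> int:
    """Single-neuron output: 1 if the weighted sum is positive, else 0."""
    return 1 if sum(w * xi for w, xi in zip(weights, x)) + bias > 0 else 0


def train(samples: List[List[float]], labels: List[int],
          lr: float = 1.0, max_epochs: int = 100) -> Tuple[List[float], float, bool]:
    """Correct the model group by group until an epoch produces no errors."""
    weights, bias = [0.0] * len(samples[0]), 0.0
    for _ in range(max_epochs):
        errors = 0
        for x, y in zip(samples, labels):
            delta = lr * (y - predict(weights, bias, x))
            if delta != 0:
                # Output disagrees with the identification information: adjust.
                errors += 1
                weights = [w + delta * xi for w, xi in zip(weights, x)]
                bias += delta
        if errors == 0:
            return weights, bias, True    # convergence state reached
    return weights, bias, False


# Assumed encoding: [second division, third division, first analysis result].
training_inputs = [[1, 0, 0], [1, 1, 0], [0, 1, 1], [0, 0, 1]]
identification = [1, 1, 0, 0]
weights, bias, converged = train(training_inputs, identification)
outputs = [predict(weights, bias, x) for x in training_inputs]
```

The stopping rule mirrors the description: training ends once every group of supervised data yields an output consistent with its identification information. A multi-layer network would replace the perceptron update with gradient-based training while keeping the same overall loop.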

To sum up, the semantic recognition method and system based on the double-layer data format provided by the embodiment of the application have the following technical effects:

1. A first format conversion instruction is obtained, and format conversion is performed on a first standard document according to the first format conversion instruction to obtain a first conversion result, wherein the first conversion result comprises first layer format data and second layer format data, and the first layer format data is different from the second layer format data; a first semantic parsing instruction is obtained, and data parsing is performed on the first layer format data through the first semantic parsing instruction to obtain a first analysis result; a first paragraph dividing instruction is obtained, and preliminary paragraph division is performed on the second layer format data according to the first paragraph dividing instruction to obtain a first division result; a first part-of-speech analysis instruction is obtained, and part-of-speech-based combined division is performed on the first division result through the first part-of-speech analysis instruction to obtain a second division result; a first word frequency clustering instruction is obtained, and word frequency combination clustering is performed on the first division result through the first word frequency clustering instruction to obtain a third division result; and the second division result, the third division result and the first analysis result are input into a neural network data model to obtain a first semantic recognition result. Because the text is intelligently analyzed and checked through double-layer semantic analysis and division, the technical effect is achieved of analyzing and checking the text in a double-layer format according to the text characteristics, so that a more intelligent and accurate analysis result is obtained.

2. Because deviation analysis is performed on the combination and the position of the keywords, the position distribution analysis result of the vocabulary causing the ambiguity is more accurate, which further achieves the technical effect of more accurate article paraphrases.

Example two

Based on the same inventive concept as the semantic recognition method based on the double-layer data format in the foregoing embodiment, the present invention further provides a semantic recognition system based on the double-layer data format, as shown in fig. 8, the system includes:

a first obtaining unit 11, where the first obtaining unit 11 is configured to obtain a first format conversion instruction, perform format conversion on a first standard document according to the first format conversion instruction, and obtain a first conversion result, where the first conversion result includes first layer format data and second layer format data, and the first layer format data is different from the second layer format data;

a second obtaining unit 12, where the second obtaining unit 12 is configured to obtain a first semantic parsing instruction, and perform data parsing on the first layer format data through the first semantic parsing instruction to obtain a first parsing result;

a third obtaining unit 13, where the third obtaining unit 13 is configured to obtain a first paragraph dividing instruction, and perform preliminary paragraph division on the second layer format data according to the first paragraph dividing instruction to obtain a first division result;

a fourth obtaining unit 14, where the fourth obtaining unit 14 is configured to obtain a first part-of-speech analysis instruction, and perform part-of-speech-based combined division on the first division result through the first part-of-speech analysis instruction to obtain a second division result;

a fifth obtaining unit 15, where the fifth obtaining unit 15 is configured to obtain a first word frequency clustering instruction, and perform word frequency combination clustering on the first division result through the first word frequency clustering instruction to obtain a third division result;

a sixth obtaining unit 16, where the sixth obtaining unit 16 is configured to input the second division result, the third division result, and the first analysis result into a neural network data model to obtain a first semantic recognition result.