Semantic recognition method and system based on double-layer data format

Document No.: 169368    Published: 2021-10-29

About this technology: "Semantic recognition method and system based on double-layer data format" was designed and created by 倪亚晖 (Ni Yahui), 武斌 (Wu Bin), 赵锦春 (Zhao Jinchun) and 林雪 (Lin Xue) on 2021-08-26. Its main content is as follows.

The invention discloses a semantic recognition method and system based on a double-layer data format. A first standard document is format-converted according to a first format conversion instruction to obtain a first conversion result, and the first-layer format data is parsed through a first semantic parsing instruction to obtain a first parsing result. Preliminary paragraph division is performed on the second-layer format data to obtain a first division result. Part-of-speech-based combined division is performed on the first division result through a first part-of-speech analysis instruction to obtain a second division result, and word frequency combination clustering is performed on the first division result through a first word frequency clustering instruction to obtain a third division result. The second division result, the third division result and the first parsing result are input into a neural-network-like data model to obtain a first semantic recognition result. This solves the technical problem in the prior art that text parsing cannot adapt intelligently to the characteristics of the text, so the parsing is not sufficiently intelligent or accurate.

1. A semantic recognition method based on a two-layer data format, wherein the method comprises:

obtaining a first format conversion instruction, and performing format conversion on a first standard document according to the first format conversion instruction to obtain a first conversion result, wherein the first conversion result comprises first layer format data and second layer format data, and the first layer format data is different from the second layer format data;

obtaining a first semantic parsing instruction, and performing data parsing on the first layer format data through the first semantic parsing instruction to obtain a first parsing result;

obtaining a first paragraph dividing instruction, and performing preliminary paragraph division on the second-layer format data according to the first paragraph dividing instruction to obtain a first division result;

obtaining a first part-of-speech analysis instruction, and performing part-of-speech-based combined division on the first division result through the first part-of-speech analysis instruction to obtain a second division result;

obtaining a first word frequency clustering instruction, and performing word frequency combination clustering on the first division result through the first word frequency clustering instruction to obtain a third division result;

and inputting the second division result, the third division result and the first parsing result into a neural network data model to obtain a first semantic recognition result.

2. The method of claim 1, wherein the method further comprises:

obtaining a standard literature data set through big data, and carrying out format feature screening based on the standard literature data set to obtain a first subject format, a first title format and a first text format;

and carrying out format division on the second-layer format data through the first subject format, the first title format and the first text format to obtain the first division result.

3. The method of claim 1, wherein the inputting the second division result, the third division result, and the first parsing result into a neural network-like data model to obtain a first semantic recognition result further comprises:

when the second division result is inconsistent with the third division result, obtaining a first semantic analysis instruction;

performing semantic analysis on the second division result and the third division result through the first semantic analysis instruction to obtain a first paraphrase result and a second paraphrase result;

when the first paraphrase result and the second paraphrase result are different, obtaining a first keyword and position information of the first keyword;

inputting the first keyword and the position information into a first position deviation analysis model to obtain a first identification result;

and obtaining the first semantic recognition result through the first identification result and the first parsing result.

4. The method of claim 3, wherein the method further comprises:

obtaining a first sentence according to the first keyword;

obtaining a first sentence analysis instruction, and performing part-of-speech tagging on the first sentence through the first sentence analysis instruction to obtain a first part-of-speech tagging result;

obtaining a first judgment instruction, and judging through the first judgment instruction whether the verb tagging result in the first part-of-speech tagging result is unique;

when the verb tagging result is not unique, verifying the non-unique verb tagging results, and after the verb tagging result is verified to be correct, extracting the predetermined relation of the verb tagging result to obtain a first extraction result;

and obtaining the first semantic recognition result according to the first extraction result.

5. The method of claim 4, wherein the obtaining a first judgment instruction, and judging through the first judgment instruction whether the verb tagging result in the first part-of-speech tagging result is unique, further comprises:

when the verb tagging result is judged to be unique, extracting the predetermined relation of the verb tagging result to obtain a second extraction result;

obtaining a first full-text comparison instruction, and performing a deviation check on the first keyword according to the first full-text comparison instruction to obtain a first check result;

and substituting the position information and the first check result into the second extraction result to obtain the first semantic recognition result.

6. The method of claim 2, wherein the obtaining of the standard literature data set through big data and the format feature screening based on the standard literature data set to obtain the first subject format, the first title format, and the first text format further comprises:

obtaining a first scaling instruction, and performing standardized scaling on the standard literature data set according to the first scaling instruction to obtain a second standard literature data set;

and performing feature extraction through spatial position features and spatial quantity features to obtain a first extraction result, wherein the first extraction result comprises the first subject format, the first title format and the first text format.

7. The method of claim 1, wherein the inputting the second division result, the third division result, and the first parsing result into a neural network-like data model to obtain a first semantic recognition result further comprises:

constructing the neural network data model, wherein the neural network data model is obtained by training a plurality of groups of training data, and each group of the plurality of groups of training data comprises: the second division result, the third division result, the first parsing result, and identification information identifying a semantic parsing result;

and inputting the second division result, the third division result and the first parsing result into the neural network data model to obtain the first semantic recognition result.

8. A semantic recognition system based on a two-layer data format, wherein the system comprises:

a first obtaining unit, configured to obtain a first format conversion instruction, perform format conversion on a first standard document according to the first format conversion instruction, and obtain a first conversion result, where the first conversion result includes first layer format data and second layer format data, and the first layer format data is different from the second layer format data;

a second obtaining unit, configured to obtain a first semantic parsing instruction, and perform data parsing on the first layer format data through the first semantic parsing instruction to obtain a first parsing result;

a third obtaining unit, configured to obtain a first paragraph dividing instruction, and perform preliminary paragraph division on the second layer format data according to the first paragraph dividing instruction to obtain a first division result;

a fourth obtaining unit, configured to obtain a first part-of-speech analysis instruction, and perform part-of-speech-based combined division on the first division result through the first part-of-speech analysis instruction to obtain a second division result;

a fifth obtaining unit, configured to obtain a first word frequency clustering instruction, and perform word frequency combination clustering on the first division result through the first word frequency clustering instruction to obtain a third division result;

a sixth obtaining unit, configured to input the second division result, the third division result, and the first parsing result into a neural network-like data model, so as to obtain a first semantic recognition result.

9. A semantic recognition system based on a two-layer data format, comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor executes the program to implement the steps of the method of any one of claims 1 to 7.

Technical Field

The invention relates to the field of semantic analysis algorithms, and in particular to a semantic recognition method and system based on a double-layer data format.

Background

Semantic analysis is a branch of artificial intelligence and one of the core tasks of natural language processing. It draws on multiple disciplines, including linguistics, computational linguistics, machine learning and cognitive linguistics, and helps advance other natural language processing tasks.

Semantic analysis means using various methods to learn and understand the semantic content expressed by a piece of text; any form of language understanding falls within its scope. It focuses on building effective models and systems that automatically analyze the semantics of each language unit, so that the true meaning of the whole text is understood, for example by obtaining or distinguishing word senses and thereby resolving synonymy and lexical ambiguity in text processing.

However, in implementing the technical solutions of the embodiments of the present application, the inventors found that the above technology has at least the following technical problem:

In the prior art, text parsing cannot adapt intelligently to the characteristics of the text being parsed, so the parsing is not sufficiently intelligent or accurate.

Disclosure of Invention

The embodiments of the present application provide a semantic recognition method and system based on a double-layer data format. They solve the technical problem in the prior art that text parsing cannot adapt intelligently to the characteristics of the text, so the parsing is not sufficiently intelligent or accurate, and they achieve the technical effect of performing double-layer format parsing and verification on the text according to its characteristics to obtain a more intelligent and accurate parsing result.

In view of the foregoing problems, the embodiments of the present application provide a semantic recognition method and system based on a dual-layer data format.

In a first aspect, the present application provides a semantic recognition method based on a two-layer data format, where the method includes: obtaining a first format conversion instruction, and performing format conversion on a first standard document according to the first format conversion instruction to obtain a first conversion result, wherein the first conversion result comprises first layer format data and second layer format data, and the first layer format data is different from the second layer format data; obtaining a first semantic parsing instruction, and performing data parsing on the first layer format data through the first semantic parsing instruction to obtain a first parsing result; obtaining a first paragraph dividing instruction, and performing preliminary paragraph division on the second-layer format data according to the first paragraph dividing instruction to obtain a first division result; obtaining a first part-of-speech analysis instruction, and performing part-of-speech-based combined division on the first division result through the first part-of-speech analysis instruction to obtain a second division result; obtaining a first word frequency clustering instruction, and performing word frequency combination clustering on the first division result through the first word frequency clustering instruction to obtain a third division result; and inputting the second division result, the third division result and the first parsing result into a neural network data model to obtain a first semantic recognition result.

In a second aspect, the present application further provides a semantic recognition system based on a dual-layer data format, where the system includes: a first obtaining unit, configured to obtain a first format conversion instruction, perform format conversion on a first standard document according to the first format conversion instruction, and obtain a first conversion result, where the first conversion result includes first layer format data and second layer format data, and the first layer format data is different from the second layer format data; a second obtaining unit, configured to obtain a first semantic parsing instruction, and perform data parsing on the first layer format data through the first semantic parsing instruction to obtain a first parsing result; a third obtaining unit, configured to obtain a first paragraph dividing instruction, and perform preliminary paragraph division on the second layer format data according to the first paragraph dividing instruction to obtain a first division result; a fourth obtaining unit, configured to obtain a first part-of-speech analysis instruction, and perform part-of-speech-based combined division on the first division result through the first part-of-speech analysis instruction to obtain a second division result; a fifth obtaining unit, configured to obtain a first word frequency clustering instruction, and perform word frequency combination clustering on the first division result through the first word frequency clustering instruction to obtain a third division result; and a sixth obtaining unit, configured to input the second division result, the third division result, and the first parsing result into a neural network-like data model to obtain a first semantic recognition result.

In a third aspect, the present invention provides a semantic recognition system based on a dual-layer data format, including a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the processor implements the steps of the method according to the first aspect when executing the program.

One or more technical solutions provided in the embodiments of the present application have at least the following technical effects or advantages:

Because a first format conversion instruction is obtained and a first standard document is format-converted according to it into a first conversion result comprising first layer format data and second layer format data that differ from each other; the first layer format data is parsed through a first semantic parsing instruction to obtain a first parsing result; preliminary paragraph division is performed on the second-layer format data according to a first paragraph dividing instruction to obtain a first division result; part-of-speech-based combined division is performed on the first division result through a first part-of-speech analysis instruction to obtain a second division result, and word frequency combination clustering is performed on it through a first word frequency clustering instruction to obtain a third division result; and the second division result, the third division result and the first parsing result are input into a neural network data model to obtain a first semantic recognition result, the text is intelligently parsed and verified through double-layer semantic parsing and division. This achieves the technical effect of performing double-layer format parsing and verification on the text according to its characteristics to obtain a more intelligent and accurate parsing result.

The foregoing is only an overview of the technical solutions of the present application. So that the technical means of the present application can be understood more clearly and implemented according to the content of the description, and so that the above and other objects, features and advantages of the present application become more apparent, specific embodiments of the present application are described in detail below.

Drawings

FIG. 1 is a schematic flowchart of a semantic recognition method based on a two-layer data format according to an embodiment of the present application;

FIG. 2 is a schematic flowchart of obtaining the first division result in a semantic recognition method based on a two-layer data format according to an embodiment of the present application;

FIG. 3 is a schematic flowchart of obtaining the first semantic recognition result in a semantic recognition method based on a two-layer data format according to an embodiment of the present application;

FIG. 4 is a schematic flowchart of verb analysis in a semantic recognition method based on a two-layer data format according to an embodiment of the present application;

FIG. 5 is a schematic flowchart of further verb analysis in a semantic recognition method based on a two-layer data format according to an embodiment of the present application;

FIG. 6 is a schematic flowchart of standard literature data set processing in a semantic recognition method based on a two-layer data format according to an embodiment of the present application;

FIG. 7 is a schematic flowchart of model construction in a semantic recognition system based on a two-layer data format according to an embodiment of the present application;

FIG. 8 is a schematic structural diagram of a semantic recognition system based on a two-layer data format according to an embodiment of the present application;

FIG. 9 is a schematic structural diagram of an exemplary electronic device according to an embodiment of the present application.

Description of reference numerals: a first obtaining unit 11, a second obtaining unit 12, a third obtaining unit 13, a fourth obtaining unit 14, a fifth obtaining unit 15, a sixth obtaining unit 16, an electronic device 50, a processor 51, a memory 52, an input device 53, an output device 54.

Detailed Description

The embodiments of the present application provide a semantic recognition method and system based on a double-layer data format. They solve the technical problem in the prior art that text parsing cannot adapt intelligently to the characteristics of the text, so the parsing is not sufficiently intelligent or accurate, and they achieve the technical effect of performing double-layer format parsing and verification on the text according to its characteristics to obtain a more intelligent and accurate parsing result. Embodiments of the present application are described below with reference to the accompanying drawings. Those skilled in the art will appreciate that, as technology develops and new scenarios emerge, the technical solutions provided in the embodiments of the present application are equally applicable to similar technical problems.

The terms "first," "second," and the like in the description and in the claims of the present application and in the above-described drawings are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the terms so used are interchangeable under appropriate circumstances and are merely descriptive of the various embodiments of the application and how objects of the same nature can be distinguished. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of elements is not necessarily limited to those elements, but may include other elements not expressly listed or inherent to such process, method, article, or apparatus.

Summary of the application

Semantic analysis is a branch of artificial intelligence and one of the core tasks of natural language processing. It draws on multiple disciplines, including linguistics, computational linguistics, machine learning and cognitive linguistics, and helps advance other natural language processing tasks.

Semantic analysis means using various methods to learn and understand the semantic content expressed by a piece of text; any form of language understanding falls within its scope. It focuses on building effective models and systems that automatically analyze the semantics of each language unit, so that the true meaning of the whole text is understood, for example by obtaining or distinguishing word senses and thereby resolving synonymy and lexical ambiguity in text processing. In the prior art, text parsing cannot adapt intelligently to the characteristics of the text being parsed, so the parsing is not sufficiently intelligent or accurate.

In view of the above technical problems, the technical solution provided by the present application has the following general idea:

The embodiments of the present application provide a semantic recognition method based on a double-layer data format, the method comprising the following steps: obtaining a first format conversion instruction, and performing format conversion on a first standard document according to the first format conversion instruction to obtain a first conversion result, wherein the first conversion result comprises first layer format data and second layer format data, and the first layer format data is different from the second layer format data; obtaining a first semantic parsing instruction, and performing data parsing on the first layer format data through the first semantic parsing instruction to obtain a first parsing result; obtaining a first paragraph dividing instruction, and performing preliminary paragraph division on the second-layer format data according to the first paragraph dividing instruction to obtain a first division result; obtaining a first part-of-speech analysis instruction, and performing part-of-speech-based combined division on the first division result through the first part-of-speech analysis instruction to obtain a second division result; obtaining a first word frequency clustering instruction, and performing word frequency combination clustering on the first division result through the first word frequency clustering instruction to obtain a third division result; and inputting the second division result, the third division result and the first parsing result into a neural network data model to obtain a first semantic recognition result.
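The general idea above is essentially a data-flow graph: the first layer feeds a parsing branch, while the second layer feeds a division branch whose output is consumed twice (by the part-of-speech step and by the word frequency step). A minimal Python sketch of that wiring follows. The class name `Pipeline` and its stage names are hypothetical, and each stage is an injected callable because the application does not specify how the individual instructions are implemented.

```python
from dataclasses import dataclass
from typing import Any, Callable, Tuple


@dataclass
class Pipeline:
    """Hypothetical wiring of steps S100-S600; each stage is supplied by the caller."""

    convert: Callable[[Any], Tuple[Any, Any]]  # S100: document -> (first layer, second layer)
    parse: Callable[[Any], Any]                # S200: first layer -> first parsing result
    divide: Callable[[Any], Any]               # S300: second layer -> first division result
    pos_combine: Callable[[Any], Any]          # S400: first division -> second division
    freq_cluster: Callable[[Any], Any]         # S500: first division -> third division
    model: Callable[[Any, Any, Any], Any]      # S600: (second, third, parsing) -> recognition

    def run(self, document: Any) -> Any:
        first_layer, second_layer = self.convert(document)
        parsing = self.parse(first_layer)
        first_division = self.divide(second_layer)
        # S400 and S500 both start from the same first division result.
        second_division = self.pos_combine(first_division)
        third_division = self.freq_cluster(first_division)
        return self.model(second_division, third_division, parsing)


# Toy stages, only to show the data flow.
toy = Pipeline(
    convert=lambda doc: (doc, doc.split()),
    parse=len,
    divide=lambda words: words,
    pos_combine=lambda division: division[:1],
    freq_cluster=lambda division: division[1:],
    model=lambda second, third, parsing: (second, third, parsing),
)
print(toy.run("ab cd"))  # (['ab'], ['cd'], 5)
```

Injecting the stages keeps the sketch neutral about the actual parsing, division and model components, which the application describes only at the level of instructions.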

With the general principles of the present application described above, various non-limiting embodiments of the present application are now described in detail with reference to the accompanying drawings.

Example one

As shown in FIG. 1, an embodiment of the present application provides a semantic recognition method based on a dual-layer data format, where the method includes:

Step S100: obtaining a first format conversion instruction, and performing format conversion on a first standard document according to the first format conversion instruction to obtain a first conversion result, wherein the first conversion result comprises first layer format data and second layer format data, and the first layer format data is different from the second layer format data;

Specifically, the first format conversion instruction is an instruction for converting the format of the document to be analyzed. It controls conversion of the standard document into at least two formats, namely the first layer format data and the second layer format data, which differ from each other. This preliminary double-layer format conversion of the document to be processed lays the foundation for the subsequent text analysis.
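The application does not fix what the two layers are. Because later steps analyze paragraph characteristics "in the PDF text", one plausible reading, assumed here rather than stated, is that the first layer is reading-order plain text and the second layer keeps layout information. The sketch below converts a list of positioned blocks into such a pair; `Block`, `ConversionResult` and `convert` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Block:
    """A positioned text block, as a PDF text extractor might emit (hypothetical schema)."""

    text: str
    font_size: float
    x: float  # left edge
    y: float  # top edge, increasing down the page


@dataclass
class ConversionResult:
    first_layer: str           # assumed: plain text in reading order
    second_layer: List[Block]  # assumed: the same content with layout kept


def convert(blocks: List[Block]) -> ConversionResult:
    """Produce two different representations of the same content."""
    ordered = sorted(blocks, key=lambda b: (b.y, b.x))  # top-to-bottom, then left-to-right
    plain = "\n".join(b.text for b in ordered)
    return ConversionResult(first_layer=plain, second_layer=ordered)


result = convert([Block("body text", 10, 0, 50), Block("Title", 18, 0, 10)])
print(result.first_layer)  # prints "Title" and then "body text" on the next line
```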

Step S200: obtaining a first semantic parsing instruction, and performing data parsing on the first layer format data through the first semantic parsing instruction to obtain a first parsing result;

Specifically, the first semantic parsing instruction is an instruction for parsing text, and semantic parsing is performed on the first layer format data. The semantic parsing process is an intelligent reading process: a semantic dictionary covering formulas, pictures and text is constructed from the parsed text information, and preliminary semantic parsing is performed on the first layer format data based on the text content to obtain the first parsing result.
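The passage leaves the structure of the semantic dictionary open. A minimal sketch, assuming the dictionary simply sorts parsed lines into the three content types the text names (formula, picture, text), could look like the following. The regular expressions and the function name `build_semantic_dictionary` are illustrative assumptions, not rules taken from the application.

```python
import re
from typing import Dict, List

# Illustrative heuristics only: a line containing '=', '^', '\frac' or '∑' is
# treated as a formula, and a line beginning "Fig"/"Figure" plus a number as a
# picture caption. Everything else is ordinary text.
FORMULA = re.compile(r"=|\^|\\frac|∑")
PICTURE = re.compile(r"^(?:Figure|Fig\.?)\s*\d+", re.IGNORECASE)


def build_semantic_dictionary(first_layer: str) -> Dict[str, List[str]]:
    dictionary: Dict[str, List[str]] = {"formula": [], "picture": [], "text": []}
    for raw in first_layer.splitlines():
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if PICTURE.match(line):
            dictionary["picture"].append(line)
        elif FORMULA.search(line):
            dictionary["formula"].append(line)
        else:
            dictionary["text"].append(line)
    return dictionary
```

Checking captions before formulas means a caption such as "Fig. 2 x = 1" is filed as a picture; the reverse priority would be an equally defensible choice.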

Step S300: obtaining a first paragraph dividing instruction, and performing preliminary paragraph division on the second-layer format data according to the first paragraph dividing instruction to obtain a first division result;

Specifically, the first paragraph dividing instruction is an instruction for preliminarily classifying the paragraphs of the second-layer format data. The second-layer format data is preliminarily divided into paragraphs according to the characteristics of paragraphs in the PDF text and the results of capturing those characteristics; the paragraph division also separates out formulas, pictures, labels, annotations, titles and indexes, and the first division result is obtained from this division. This preliminary analysis makes the paragraph division of the second-layer format data accurate and lays a solid foundation for obtaining a more accurate semantic parsing result later.
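As a rough illustration of paragraph division on layout data, the sketch below groups lines by vertical gap and font size and tags titles and figure or table captions. The `Line` schema, the thresholds `gap_ratio` and `title_size`, and the labelling rules are all assumptions. The application also separates formulas, labels, annotations and indexes, which this sketch does not attempt.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Line:
    """One text line of the second-layer data (hypothetical schema)."""

    text: str
    font_size: float
    y: float       # top edge on the page
    height: float  # line height


def divide_paragraphs(lines: List[Line], gap_ratio: float = 0.5,
                      title_size: float = 14.0) -> List[Dict]:
    """Merge a line into the current paragraph when it has the same font size and
    the vertical gap is at most gap_ratio * line height; then label each paragraph."""
    paragraphs: List[Dict] = []
    for line in sorted(lines, key=lambda ln: ln.y):
        if paragraphs:
            prev = paragraphs[-1]["lines"][-1]
            gap = line.y - (prev.y + prev.height)
            if line.font_size == prev.font_size and gap <= gap_ratio * prev.height:
                paragraphs[-1]["lines"].append(line)
                continue
        paragraphs.append({"lines": [line]})
    for para in paragraphs:
        head = para["lines"][0]
        if head.font_size >= title_size:
            para["label"] = "title"
        elif head.text.lower().startswith(("fig", "table")):
            para["label"] = "caption"
        else:
            para["label"] = "body"
        para["text"] = " ".join(ln.text for ln in para["lines"])
    return paragraphs


page = [Line("Intro", 16, 0, 20), Line("First line", 10, 40, 12),
        Line("second line", 10, 53, 12), Line("Figure 1 Flow", 10, 100, 12)]
print([(p["label"], p["text"]) for p in divide_paragraphs(page)])
# [('title', 'Intro'), ('body', 'First line second line'), ('caption', 'Figure 1 Flow')]
```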

Step S400: obtaining a first part-of-speech analysis instruction, and performing part-of-speech-based combined division on the first division result through the first part-of-speech analysis instruction to obtain a second division result;

Specifically, part-of-speech analysis is a process of combining and clustering characters according to information about which characters can be combined and the content stated by the article. First, the part of speech of the characters appearing in the article is determined; the parts of speech generally include nouns, verbs, adjectives, quantifiers, pronouns, adverbs, prepositions, conjunctions, interjections and the like. Further, after combined division according to part-of-speech combinations, a further division may be performed according to the role each word plays in the paragraph, such as the subject, predicate verb, object and predicative parts, to obtain the second division result. Dividing the second-layer format data based on parts of speech makes the parts of speech in the article clear for subsequent analysis, so that a more accurate parsing result can be obtained.
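The two-stage division described above (combine words by part of speech, then assign sentence roles) can be sketched as follows. This is a toy under stated assumptions: tokens are taken as already tagged with Universal Dependencies UPOS tags such as `NOUN`, `ADJ` and `VERB` by some upstream tagger, each maximal run of `ADJ`/`NOUN` tokens is merged into one noun phrase, and roles are assigned purely by position relative to the first verb. The function names are hypothetical.

```python
from typing import List, Tuple

Tagged = List[Tuple[str, str]]


def combine_by_pos(tagged: Tagged) -> Tagged:
    """Merge each maximal run of ADJ/NOUN tokens into one 'NP' chunk; other tokens pass through."""
    chunks: Tagged = []
    run: List[str] = []
    for word, tag in tagged:
        if tag in ("ADJ", "NOUN"):
            run.append(word)
            continue
        if run:
            chunks.append((" ".join(run), "NP"))
            run = []
        chunks.append((word, tag))
    if run:
        chunks.append((" ".join(run), "NP"))
    return chunks


def assign_roles(chunks: Tagged) -> Tagged:
    """NPs before the first VERB become 'subject', that VERB 'predicate', later NPs 'object'."""
    roles: Tagged = []
    seen_verb = False
    for text, tag in chunks:
        if tag == "VERB" and not seen_verb:
            roles.append((text, "predicate"))
            seen_verb = True
        elif tag == "NP":
            roles.append((text, "object" if seen_verb else "subject"))
        else:
            roles.append((text, tag))
    return roles


tagged = [("the", "DET"), ("neural", "ADJ"), ("model", "NOUN"),
          ("selects", "VERB"), ("final", "ADJ"), ("semantics", "NOUN")]
print(assign_roles(combine_by_pos(tagged)))
# [('the', 'DET'), ('neural model', 'subject'), ('selects', 'predicate'), ('final semantics', 'object')]
```

Position-based role assignment is deliberately crude; a dependency parser would be the usual choice when real subject and object relations matter.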

Step S500: obtaining a first word frequency clustering instruction, and performing word frequency combination clustering on the first division result through the first word frequency clustering instruction to obtain a third division result;

Specifically, the first word frequency clustering instruction controls the combination of characters into high-frequency words. Drawing on content outside the article itself through big-data retrieval, adjacent characters are combined according to hot words, high-frequency words, homophone variants and the stems of pernicious words. Word frequency combination clustering is performed on the first division result in this way through the first word frequency clustering instruction to obtain the third division result.
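Combining adjacent characters according to a hot-word list can be illustrated with greedy forward maximum matching, a standard dictionary-based segmentation technique. It is swapped in here as an assumption, since the application does not name its matching rule. The lexicon stands in for the hot words and high-frequency words gathered through big-data retrieval; `combine_by_hot_words` is a hypothetical name.

```python
from typing import Iterable, List, Optional


def combine_by_hot_words(chars: str, lexicon: Iterable[str],
                         max_len: Optional[int] = None) -> List[str]:
    """Forward maximum matching: at each position take the longest lexicon entry
    starting there, or a single character when no entry matches."""
    words = set(lexicon)
    if max_len is None:
        max_len = max((len(w) for w in words), default=1)
    max_len = max(1, max_len)  # always allow the single-character fallback
    out: List[str] = []
    i = 0
    while i < len(chars):
        for size in range(min(max_len, len(chars) - i), 0, -1):
            piece = chars[i:i + size]
            if size == 1 or piece in words:
                out.append(piece)
                i += size
                break
    return out


print(combine_by_hot_words("基于语义识别的神经网络", {"语义", "语义识别", "神经网络"}))
# ['基', '于', '语义识别', '的', '神经网络']
```

Longest match prefers "语义识别" over the shorter "语义", which is usually the desired behaviour for domain terms; a frequency-weighted variant would be another reasonable choice.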

Step S600: inputting the second division result, the third division result and the first parsing result into a neural network data model to obtain a first semantic recognition result.

Specifically, the neural network data model is a neural network model from machine learning. By building a rich lexical resource library and comparing the semantics exhibited by the second division result and the third division result against what the article actually expresses, the model analyzes in depth the meaning expressed by each paragraph; semantic parsing probabilities are obtained from the comparison results, and the final semantic recognition result, namely the first semantic recognition result, is selected based on these probabilities. Because semantics are selected across multiple combinations and the plain-text paraphrase result is compared with the paraphrase result from the PDF format before the final semantic analysis, the semantic parsing result is more accurate, achieving the technical effect of performing double-layer format parsing and verification on the text according to its characteristics to obtain a more intelligent and accurate parsing result.
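Selecting the final interpretation "based on the probability" is commonly done by normalising model scores with a softmax and taking the arg-max. The sketch below shows only that last step under that assumption; the candidate scores are assumed to come from a trained model that is not shown, and `select_interpretation` is a hypothetical name.

```python
import math
from typing import Dict, Tuple


def select_interpretation(scores: Dict[str, float]) -> Tuple[str, Dict[str, float]]:
    """Softmax over candidate scores; return the most probable candidate and all probabilities."""
    if not scores:
        raise ValueError("no candidate interpretations")
    peak = max(scores.values())  # subtract the maximum for numerical stability
    exps = {name: math.exp(s - peak) for name, s in scores.items()}
    total = sum(exps.values())
    probs = {name: e / total for name, e in exps.items()}
    best = max(probs, key=probs.get)
    return best, probs


best, probs = select_interpretation({"reading A": 2.0, "reading B": 0.0})
print(best, round(probs[best], 3))  # reading A 0.881
```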

Further, as shown in FIG. 2, step S300 in the embodiment of the present application further includes:

Step S310: obtaining a standard literature data set through big data, and performing format feature screening based on the standard literature data set to obtain a first subject format, a first title format and a first text format;

Step S320: performing format division on the second-layer format data through the first subject format, the first title format and the first text format to obtain the first division result.

Specifically, the standard literature data set consists of documents of the same type, by the same author, or on the same subject as the document to be analyzed, and these documents are in the same format as the second-layer format data. Format features are extracted from this data set: the documents in it are first scaled to the same size, and features of the subject format, the title format and the text format are then screened and extracted. The first subject format, the first title format and the first text format are obtained from the screening and extraction results. During the preliminary paragraph division of the second-layer format data, that data is scaled to the same proportion, and format division is performed according to the features of the first subject format, the first title format and the first text format to obtain the first division result.
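
One plausible way to perform the format division is nearest-template classification: learn one template per format from the scaled standard documents, then assign each line of the second-layer data to the closest template. This is a sketch under assumptions: the two per-line features (indent ratio, line-length ratio), the mean-based templates and the squared Euclidean distance are hypothetical choices, not taken from the source.

```python
from statistics import mean


def learn_templates(labelled):
    """labelled: (label, feature_vector) pairs from the scaled standard literature data set."""
    grouped = {}
    for label, vec in labelled:
        grouped.setdefault(label, []).append(vec)
    # Each format template is the per-feature mean of its examples.
    return {label: tuple(mean(col) for col in zip(*vecs)) for label, vecs in grouped.items()}


def divide(line_features, templates):
    """Assign each line of the second-layer data to the nearest template."""
    def dist2(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return [min(templates, key=lambda lbl: dist2(vec, templates[lbl])) for vec in line_features]


# Hypothetical features per line: (indent ratio, line-length ratio) after scaling.
templates = learn_templates([
    ("subject", (0.40, 0.20)), ("subject", (0.42, 0.16)),
    ("title", (0.05, 0.30)), ("title", (0.07, 0.34)),
    ("body", (0.02, 0.95)), ("body", (0.04, 1.00)),
])
labels = divide([(0.39, 0.22), (0.03, 0.90), (0.06, 0.28)], templates)
```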

Further, as shown in fig. 3, the step S600 of inputting the second division result, the third division result, and the first analysis result into a neural network data model to obtain a first semantic identification result further includes:

step S610: when the second division result is inconsistent with the third division result, obtaining a first semantic analysis instruction;

step S620: performing semantic analysis on the second division result and the third division result through the first semantic analysis instruction to obtain a first paraphrase result and a second paraphrase result;

step S630: when the first paraphrasing result and the second paraphrasing result are different, obtaining a first keyword and position information of the first keyword;

step S640: inputting the first keyword and the position information into a first position deviation analysis model to obtain a first identification result;

step S650: obtaining the first semantic recognition result from the first identification result and the first analysis result.

Specifically, when the third division result is inconsistent with the second division result, the article may contain an anomaly, so a first semantic analysis instruction is obtained. Through this instruction, the paragraph paraphrases under the second division result and the third division result are analyzed to obtain the first paraphrase result and the second paraphrase result, and it is determined whether the two are the same. When the first paraphrase result differs from the second paraphrase result, the paragraphs are interpreted differently under the two division results. In that case, the keywords that cause the divergence and their position information are obtained and, together with the part-of-speech combination that produced the difference, are input into the first position deviation analysis model. This model performs deviation analysis on the positions and combinations of the keywords, using their position information together with the main ideas of the article's paragraphs, and outputs a first identification result; the first semantic recognition result is then obtained from the first identification result and the first analysis result. Analyzing the combinations and positions of the keywords for deviation determines the positional distribution of the words that cause the divergent interpretation more accurately, which in turn yields more accurate paraphrases of the article.
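
The internals of the first position deviation analysis model are not specified, but its inputs (the divergent keywords and their positions, step S630) can be located by comparing the character spans of the two division results. This minimal sketch assumes both divisions tokenize the same character string; `token_spans` and `divergent_keywords` are hypothetical helper names.

```python
def token_spans(tokens):
    """Return (token, start, end) character spans for a division result."""
    spans, pos = [], 0
    for t in tokens:
        spans.append((t, pos, pos + len(t)))
        pos += len(t)
    return spans


def divergent_keywords(div_a, div_b):
    """Tokens of div_a whose span is not also a token in div_b, with their start offsets."""
    if "".join(div_a) != "".join(div_b):
        raise ValueError("both divisions must cover the same text")
    spans_b = {(s, e) for _, s, e in token_spans(div_b)}
    return [(t, s) for t, s, e in token_spans(div_a) if (s, e) not in spans_b]


second_division = ["语义识别", "的", "方法"]
third_division = ["语义", "识别的", "方法"]
print(divergent_keywords(second_division, third_division))
```

Tokens that agree in both divisions (here "方法") are excluded, so only the spans responsible for the divergent paraphrases are passed on.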

Further, as shown in fig. 4, step S630 in this embodiment of the present application further includes:

step S631: obtaining a first sentence according to the first keyword;

step S632: obtaining a first sentence analysis instruction, and performing part-of-speech tagging on the first sentence through the first sentence analysis instruction to obtain a first part-of-speech tagging result;

step S633: obtaining a first judgment instruction, and judging whether a verb labeling result in the first part-of-speech labeling result is unique or not through the first judgment instruction;

step S634: when the verb labeling result is not unique, verifying the non-unique verb labeling result and, once it is verified to be correct, performing predetermined relation extraction on the verb labeling result to obtain a first extraction result;

step S635: obtaining the first semantic recognition result according to the first extraction result.

Specifically, when the first paraphrase result and the second paraphrase result differ in meaning, the first sentence containing the first keyword is obtained and analyzed in depth. The verbs in the first sentence are examined first: their number is counted and it is judged whether the first sentence contains exactly one verb. If it does not, the first sentence contains at least two verbs; the verb information is then verified and, once verified, predetermined relation extraction is performed on the verb labeling result to obtain the first extraction result. Predetermined relation extraction means extracting the subject, predicate and object of the first sentence: for each labeled verb, its subject and its object are obtained, so that one subject-predicate-object triple is extracted per verb. This simplifies the analysis logic of the sentence while retaining the main meaning of the paragraph, and the first semantic recognition result is obtained from it. Simplifying ambiguous sentences through verb relation extraction in this way allows the first semantic recognition result to be obtained more accurately.
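
The verb-uniqueness check and the per-verb subject-predicate-object extraction can be sketched over a part-of-speech-tagged sentence. This is a heuristic illustration, not the patent's extractor: the two-tag scheme (`"n"` for noun, `"v"` for verb), the nearest-noun rules and the fallback of reusing the previous subject for coordinated verbs are all assumptions, and the verb verification step is not modeled. A real system would take tags from a POS tagger and relations from a dependency parser.

```python
def extract_svo(tagged):
    """tagged: list of (word, tag); returns (verb is unique, list of (subject, verb, object))."""
    verbs = [i for i, (_, tag) in enumerate(tagged) if tag == "v"]
    triples = []
    prev_verb, prev_obj, prev_subj = -1, -1, None
    for k, vi in enumerate(verbs):
        nxt = verbs[k + 1] if k + 1 < len(verbs) else len(tagged)
        # Subject: nearest noun left of the verb but after the previous verb's object;
        # if none is found, the previous subject is reused (coordinated verbs).
        start = max(prev_verb, prev_obj) + 1
        subj = next((tagged[j][0] for j in range(vi - 1, start - 1, -1) if tagged[j][1] == "n"), prev_subj)
        # Object: first noun right of the verb, before the next verb.
        obj_idx = next((j for j in range(vi + 1, nxt) if tagged[j][1] == "n"), -1)
        obj = tagged[obj_idx][0] if obj_idx >= 0 else None
        triples.append((subj, tagged[vi][0], obj))
        prev_verb, prev_obj, prev_subj = vi, obj_idx, subj
    return len(verbs) == 1, triples


sentence = [("system", "n"), ("parses", "v"), ("document", "n"),
            ("and", "c"), ("extracts", "v"), ("keyword", "n")]
unique, triples = extract_svo(sentence)
```

With two verbs the result is not unique, and one triple is extracted for each verb, mirroring step S634.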

Further, as shown in fig. 5, step S633 in this embodiment of the present application (obtaining a first judgment instruction and judging, through the first judgment instruction, whether the verb tagging result in the first part-of-speech tagging result is unique) further includes:

step S6331: when the verb labeling result is judged to be unique, extracting the verb labeling result in a predetermined relation to obtain a second extraction result;

step S6332: obtaining a first full-text comparison instruction, and performing bias verification on the first keyword according to the first full-text comparison instruction to obtain a first verification result;

step S6333: substituting the position information and the first verification result into the second extraction result to obtain the first semantic recognition result.

Specifically, when the number of verbs is judged to be unique, the first sentence contains only one verb. Predetermined relation extraction is then performed on that verb according to its labeling information, which again extracts the subject and the object of the verb, giving the second extraction result. A first full-text comparison instruction is obtained and used to perform bias verification on the keyword that caused the divergent interpretation: the probability of each word sense of the first keyword is estimated from the full-text information, and the first verification result is obtained according to these probabilities. The position information of the keyword and the first verification result are then substituted into the corresponding position in the second extraction result, and the paraphrase of the first sentence is analyzed further to obtain the first semantic recognition result. Combining the predetermined relation extraction of verbs, bias verification and the positions of the keywords in this way enables a deeper semantic analysis and a more accurate semantic analysis result.
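
The full-text bias verification can be approximated by estimating each sense's probability from how often its cue words occur in the full text, then substituting the winning sense and the keyword position into the extracted triple. This is a sketch under a loudly stated assumption: `sense_cues` (a mapping from each candidate sense to indicative words) is hypothetical, as is the `keyword[sense@position]` annotation format; the source does not say how word senses are represented.

```python
from collections import Counter


def sense_probabilities(full_text_tokens, sense_cues):
    """Estimate P(sense) from cue-word counts in the full text (hypothetical cue lists)."""
    counts = Counter(full_text_tokens)
    raw = {sense: sum(counts[c] for c in cues) for sense, cues in sense_cues.items()}
    total = sum(raw.values())
    if total == 0:
        # No evidence in the full text: fall back to a uniform distribution.
        return {sense: 1 / len(sense_cues) for sense in sense_cues}
    return {sense: n / total for sense, n in raw.items()}


def verify_keyword(keyword, position, triple, full_text_tokens, sense_cues):
    """Pick the most probable sense and substitute it, with the position, into the triple."""
    probs = sense_probabilities(full_text_tokens, sense_cues)
    best = max(probs, key=probs.get)
    resolved = tuple(f"{keyword}[{best}@{position}]" if part == keyword else part for part in triple)
    return best, probs[best], resolved


cues = {"finance": {"loan", "money"}, "river": {"water", "shore"}}
tokens = ["the", "bank", "approved", "the", "loan", "money", "money"]
result = verify_keyword("bank", 1, ("bank", "approved", "loan"), tokens, cues)
```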

Further, as shown in fig. 6, step S310 in the embodiment of the present application (obtaining the standard literature data set through big data and performing format feature screening based on it to obtain the first subject format, the first title format and the first text format) further includes:

step S311: obtaining a first scaling instruction, and performing normalized scaling on the standard literature data set according to the first scaling instruction to obtain a second standard literature data set;

step S312: performing feature extraction on the second standard literature data set using space position features and space count features to obtain a first space extraction result, wherein the first space extraction result comprises the first subject format, the first title format and the first text format.

Specifically, screening the format features to obtain the first subject format, the first title format and the first text format further involves first scaling the documents in the standard literature data set to a common standard, which gives the second standard literature data set. Features of the positions of the spaces and of the number of spaces in the second standard literature data set are then extracted to obtain the first space extraction result. These positional and quantitative features of the spaces serve as the features that distinguish the subject format, the title format and the text format.
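
For plain-text lines, the space position and space count features can be sketched as below. The assumptions are explicit: each document is treated as lines of characters with a known page width, "normalized scaling" is approximated by dividing by that width, and the three feature names are hypothetical. The resulting vectors could feed a template classifier for the subject, title and text formats.

```python
def space_features(line, page_width):
    """Position and quantity features of spaces, normalized by the page width."""
    leading = len(line) - len(line.lstrip(" "))
    return {
        "indent_ratio": leading / page_width,         # space position feature (indentation)
        "space_ratio": line.count(" ") / page_width,  # space quantity feature
        "length_ratio": len(line) / page_width,       # how much of the line is filled
    }


def extract_document(lines, page_width):
    """Feature dictionaries for every line of one document."""
    return [space_features(line, page_width) for line in lines]


features = extract_document(["          Semantic Recognition", "Body text of the article."], 40)
```

Dividing by page width makes indentation comparable across documents of different sizes: a 10-space indent on a 40-column page and a 20-space indent on an 80-column page both give an indent ratio of 0.25.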

Further, as shown in fig. 7, the step S600 of inputting the second division result, the third division result, and the first analysis result into a neural network data model to obtain a first semantic identification result further includes:

step S660: constructing the neural network data model, wherein the neural network data model is obtained by training a plurality of groups of training data, and each group of the plurality of groups of training data comprises: the second division result, the third division result, the first parsing result, and identification information identifying a semantic parsing result;

step S670: inputting the second division result, the third division result and the first analysis result into the neural network data model to obtain the first semantic recognition result.

Specifically, the neural network data model is a machine-learning neural network model that can keep learning and adjusting; it is a highly complex nonlinear dynamic learning system. In short, it is a mathematical model: once it has been trained on a large amount of training data until it reaches a convergence state, it analyzes the input data and outputs the first semantic recognition result.

Furthermore, the training process includes supervised learning. Each group of supervision data includes the second division result, the third division result, the first analysis result and identification information identifying the semantic analysis result. The second division result, the third division result and the first analysis result are input into the neural network data model, and the model is trained under supervision of the identification information so that its output agrees with the supervision data. The model keeps correcting and adjusting itself until its output is consistent with the identification information; supervised learning on that group then ends and the next group begins. The supervised learning process ends when the model reaches a convergence state. Supervised learning enables the model to process input information more accurately and to produce more accurate and reasonable semantic recognition results.
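
A minimal supervised training loop with a convergence check can be sketched as follows. This substitutes a standard technique, single-layer logistic regression trained by stochastic gradient descent on log-loss, for the unspecified neural network; it stops when the epoch loss stops changing rather than training each group to exact agreement. The two-number input encoding and the binary label are hypothetical.

```python
import math


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train(samples, lr=0.5, tol=1e-6, max_epochs=10_000):
    """Stochastic gradient descent on log-loss; stops when the epoch loss stops changing."""
    w, b = [0.0] * len(samples[0][0]), 0.0
    prev_loss = float("inf")
    for _ in range(max_epochs):
        loss = 0.0
        for x, y in samples:
            p = sigmoid(b + sum(wi * xi for wi, xi in zip(w, x)))
            loss -= y * math.log(p + 1e-12) + (1 - y) * math.log(1 - p + 1e-12)
            g = p - y  # gradient of the log-loss with respect to the logit
            w = [wi - lr * g * xi for wi, xi in zip(w, x)]
            b -= lr * g
        if abs(prev_loss - loss) < tol:  # treated here as the convergence state
            break
        prev_loss = loss
    return w, b


def predict(w, b, x):
    return 1 if b + sum(wi * xi for wi, xi in zip(w, x)) >= 0 else 0


# Hypothetical encoding: x = [agreement of second division, agreement of third division];
# label 1 means the annotated semantic analysis result matches the second division.
samples = [([1.0, 0.0], 1), ([0.9, 0.2], 1), ([0.1, 0.8], 0), ([0.0, 1.0], 0)]
w, b = train(samples)
```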

To sum up, the semantic recognition method and system based on the double-layer data format provided by the embodiment of the application have the following technical effects:

1. A first format conversion instruction is obtained, and format conversion is performed on a first standard document according to it to obtain a first conversion result, which comprises first-layer format data and second-layer format data that differ from each other; a first semantic parsing instruction is obtained and used to parse the first-layer format data into a first parsing result; a first paragraph dividing instruction is obtained and used to perform preliminary paragraph division on the second-layer format data, giving a first division result; a first part-of-speech analysis instruction is obtained and used to perform part-of-speech-based combined division on the first division result, giving a second division result; a first word frequency clustering instruction is obtained and used to perform word frequency combination clustering on the first division result, giving a third division result; and the second division result, the third division result and the first parsing result are input into a neural network data model to obtain a first semantic recognition result. Through this double-layer semantic analysis and division, the text is intelligently analyzed and verified in two formats according to its characteristics, yielding a more intelligent and accurate analysis result.

2. Because the combinations and positions of the keywords are analyzed for deviation, the positional distribution of the words that cause divergent interpretations is determined more accurately, and the article is therefore paraphrased more accurately.

Example two

Based on the same inventive concept as the semantic recognition method based on the double-layer data format in the foregoing embodiment, the present invention further provides a semantic recognition system based on the double-layer data format. As shown in fig. 8, the system includes:

a first obtaining unit 11, where the first obtaining unit 11 is configured to obtain a first format conversion instruction, perform format conversion on a first standard document according to the first format conversion instruction, and obtain a first conversion result, where the first conversion result includes first layer format data and second layer format data, and the first layer format data is different from the second layer format data;

a second obtaining unit 12, where the second obtaining unit 12 is configured to obtain a first semantic parsing instruction, and perform data parsing on the first layer format data through the first semantic parsing instruction to obtain a first parsing result;

a third obtaining unit 13, where the third obtaining unit 13 is configured to obtain a first paragraph dividing instruction, and perform preliminary paragraph division on the second layer format data according to the first paragraph dividing instruction to obtain a first division result;

a fourth obtaining unit 14, where the fourth obtaining unit 14 is configured to obtain a first part-of-speech parsing instruction, and perform part-of-speech-based combined division on the first division result through the first part-of-speech parsing instruction to obtain a second division result;

a fifth obtaining unit 15, where the fifth obtaining unit 15 is configured to obtain a first word frequency clustering instruction, and perform word frequency combination clustering on the first division result through the first word frequency clustering instruction to obtain a third division result;

a sixth obtaining unit 16, where the sixth obtaining unit 16 is configured to input the second division result, the third division result, and the first parsing result into a neural network data model to obtain a first semantic recognition result.

Further, the system further comprises:

a seventh obtaining unit, configured to obtain a standard literature data set through big data, and perform format feature screening based on the standard literature data set to obtain a first subject format, a first title format, and a first text format;

an eighth obtaining unit, configured to perform format division on the second layer format data according to the first subject format, the first title format, and the first body format, and obtain the first division result.

Further, the system further comprises:

a ninth obtaining unit configured to obtain a first semantic analysis instruction when the second division result and the third division result are inconsistent;

a tenth obtaining unit, configured to perform semantic analysis on the second and third division results through the first semantic analysis instruction to obtain a first paraphrase result and a second paraphrase result;

an eleventh obtaining unit configured to obtain a first keyword and position information of the first keyword when the first paraphrase result and the second paraphrase result are different;

a twelfth obtaining unit, configured to input the first keyword and the position information into a first position deviation analysis model to obtain a first identification result;

a thirteenth obtaining unit, configured to obtain the first semantic recognition result through the first identification result and the first parsing result.

Further, the system further comprises:

a fourteenth obtaining unit, configured to obtain a first sentence according to the first keyword;

a fifteenth obtaining unit, configured to obtain a first sentence analysis instruction, perform part-of-speech tagging on the first sentence through the first sentence analysis instruction, and obtain a first part-of-speech tagging result;

a sixteenth obtaining unit, configured to obtain a first judgment instruction, and judge, by using the first judgment instruction, whether a verb tagging result in the first part-of-speech tagging result is unique;

a seventeenth obtaining unit, configured to verify the verb labeling result when the verb labeling result is not unique, and extract a predetermined relationship from the verb labeling result after the verification is correct to obtain a first extraction result;

an eighteenth obtaining unit, configured to obtain the first semantic identification result according to the first extraction result.

Further, the system further comprises:

a nineteenth obtaining unit, configured to, when the verb labeling result is judged to be unique, perform predetermined relationship extraction on the verb labeling result to obtain a second extraction result;

a twentieth obtaining unit, configured to obtain a first full-text comparison instruction, perform bias verification on the first keyword according to the first full-text comparison instruction, and obtain a first verification result;

a twenty-first obtaining unit, configured to substitute the position information and the first verification result into the second extraction result to obtain the first semantic recognition result.

Further, the system further comprises:

a twenty-second obtaining unit, configured to obtain a first scaling instruction, and perform normalized scaling on the standard literature data set according to the first scaling instruction to obtain a second standard literature data set;

a twenty-third obtaining unit, configured to perform feature extraction on the second standard literature data set using space position features and space count features to obtain a first space extraction result, where the first space extraction result includes the first subject format, the first title format, and the first body format.

Further, the system further comprises:

a first construction unit, configured to construct the neural network-like data model, where the neural network-like data model is obtained through training of multiple sets of training data, and each of the multiple sets of training data includes: the second division result, the third division result, the first analysis result and identification information identifying the semantic analysis result;

a twenty-fourth obtaining unit, configured to input the second division result, the third division result, and the first parsing result into a neural network-like data model, and obtain the first semantic recognition result.

The variations and specific examples of the semantic recognition method based on the double-layer data format described in the first embodiment (fig. 1) also apply to the semantic recognition system based on the double-layer data format in this embodiment. From the foregoing detailed description of the method, a person skilled in the art can clearly understand how the system of this embodiment is implemented, so for brevity the details are not repeated here.

Exemplary electronic device

The electronic apparatus of the embodiment of the present application is described below with reference to fig. 9.

Fig. 9 illustrates a schematic structural diagram of an electronic device according to an embodiment of the present application.

Based on the same inventive concept as the semantic recognition method based on a double-layer data format in the foregoing embodiments, the present invention further provides an electronic device, which is described below with reference to fig. 9. The electronic device may be the mobile device itself or a stand-alone device independent of it, and stores a computer program that, when executed by a processor, carries out the steps of any of the methods described above.

As shown in fig. 9, the electronic device 50 includes one or more processors 51 and a memory 52.

The processor 51 may be a Central Processing Unit (CPU) or other form of processing unit having data processing capabilities and/or instruction execution capabilities, and may control other components in the electronic device 50 to perform desired functions.

The memory 52 may include one or more computer program products that may include various forms of computer-readable storage media, such as volatile memory and/or non-volatile memory. The volatile memory may include, for example, Random Access Memory (RAM), cache memory (cache), and/or the like. The non-volatile memory may include, for example, Read Only Memory (ROM), hard disk, flash memory, etc. One or more computer program instructions may be stored on the computer-readable storage medium and executed by the processor 51 to implement the methods of the various embodiments of the application described above and/or other desired functions.

In one example, the electronic device 50 may further include: an input device 53 and an output device 54, which are interconnected by a bus system and/or other form of connection mechanism (not shown).

The embodiment of the invention provides a semantic identification method based on a double-layer data format, wherein the method comprises the following steps: obtaining a first format conversion instruction, and performing format conversion on a first standard document according to the first format conversion instruction to obtain a first conversion result, wherein the first conversion result comprises first layer format data and second layer format data, and the first layer format data is different from the second layer format data; obtaining a first semantic parsing instruction, and performing data parsing on the first layer format data through the first semantic parsing instruction to obtain a first parsing result; obtaining a first paragraph dividing instruction, and performing preliminary paragraph division on the second-layer format data according to the first paragraph dividing instruction to obtain a first division result; obtaining a first part-of-speech analysis instruction, and performing part-of-speech-based combined division on the first division result through the first part-of-speech analysis instruction to obtain a second division result; obtaining a first word frequency clustering instruction, and performing word frequency combination clustering on the first division result through the first word frequency clustering instruction to obtain a third division result; and inputting the second division result, the third division result and the first analysis result into a neural network data model to obtain a first semantic recognition result. 
This solves the technical problem in the prior art that text cannot be analyzed intelligently according to its own characteristics, which makes the analysis insufficiently intelligent and accurate, and achieves the technical effect of analyzing and verifying the text in two formats according to its characteristics to obtain a more intelligent and accurate analysis result.

Through the above description of the embodiments, those skilled in the art will clearly understand that the present application can be implemented by software plus the necessary general-purpose hardware, and can certainly also be implemented by special-purpose hardware, including application-specific integrated circuits, dedicated CPUs, dedicated memories, dedicated components and the like. In general, any function performed by a computer program can easily be implemented by corresponding hardware, and the specific hardware structures implementing the same function may vary, such as analog circuits, digital circuits or dedicated circuits. For the present application, however, a software implementation is preferable. Based on this understanding, the technical solutions of the present application may be embodied in the form of a software product stored in a readable storage medium, such as a computer floppy disk, USB flash drive, removable hard disk, ROM, RAM, magnetic disk or optical disk, which includes several instructions for causing a computer device to execute the methods of the embodiments of the present application.

The above embodiments may be implemented wholly or partially by software, hardware, firmware, or any combination thereof. When implemented in software, they may be implemented wholly or partially in the form of a computer program product.

The computer program product includes one or more computer instructions. When the computer instructions are loaded and executed on a computer, the processes or functions described in the embodiments of the application are produced in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable device. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another; the storage medium may be magnetic (e.g., floppy disks, hard disks, tapes), optical (e.g., DVDs), or semiconductor (e.g., Solid State Disks (SSDs)), among others.

It should be appreciated that reference throughout this specification to "one embodiment" or "an embodiment" means that a particular feature, structure or characteristic described in connection with the embodiment is included in at least one embodiment of the present application. Thus, the appearances of the phrases "in one embodiment" or "in an embodiment" in various places throughout this specification do not necessarily all refer to the same embodiment. Furthermore, the particular features, structures, or characteristics may be combined in any suitable manner in one or more embodiments. It should also be understood that, in the various embodiments of the present application, the sequence numbers of the above processes do not imply an order of execution; the execution order of each process should be determined by its function and internal logic and should not limit the implementation of the embodiments of the present application in any way.

Additionally, the terms "system" and "network" are often used interchangeably herein. The term "and/or" herein merely describes an association between associated objects and indicates that three relationships may exist; for example, A and/or B may mean that A exists alone, that A and B both exist, or that B exists alone. In addition, the character "/" herein generally indicates an "or" relationship between the objects before and after it.

It should be understood that in the embodiments of the present application, "B corresponding to A" means that B is associated with A and that B can be determined from A. It should also be understood that determining B from A does not mean determining B from A alone; B may also be determined from A and/or other information.

Those of ordinary skill in the art will appreciate that the elements and algorithm steps of the examples described in connection with the embodiments disclosed herein may be implemented in electronic hardware, computer software, or a combination of both, and that the components and steps of each example have been described above generally in terms of function in order to clearly illustrate the interchangeability of hardware and software. Whether such functionality is implemented as hardware or software depends upon the particular application and the design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.

In short, the above description is only a preferred embodiment of the present disclosure, and is not intended to limit the scope of the present disclosure. Any modification, equivalent replacement, improvement and the like made within the spirit and principle of the present application shall be included in the protection scope of the present application.
